The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Architecture Suggestions - WinForms app handling large data

I am planning to develop a Windows Forms application (using VB.NET) which will require handling large amounts of data - 600K records per month, around 30 columns, and around 30 months of data. The data will be stored as SAS datasets, which I will be reading using an ODBC driver provided by SAS.
The application will require large amounts of data manipulation - lot of summaries but also row by row processing in some cases.
I am thinking about how to architect such an application. Should I use an embedded database and let SQL do the job wherever applicable, or are the built-in .NET data controls good enough to do the job?
(For those of you who are familiar with SAS: I will not have access to the power of the SAS system to do the manipulations - I will only have the datasets.)

Any suggestions? Thanks in advance.
Anindya Mozumdar
Thursday, September 25, 2008
I'm not convinced that 18m rows is "large", but that is because I did a carrier billing system for a global telecoms firm (18m rows was a quiet day).

It all depends on the kinds of manipulations you need and the trade-off between required performance and hardware.

If you are doing lots of repeated aggregates, then pre-calculating them and storing them as aggregate tables can make a big difference. For example, store daily summaries for your records, or monthly, whatever to reflect your usage. Then, you can query these rather than the underlying data each time.
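The aggregate-table idea can be sketched in SQL. Here is a minimal illustration using Python's sqlite3 as a stand-in embedded database (the table and column names are invented for the example, not taken from the poster's data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Detail table standing in for the raw monthly records.
cur.execute("CREATE TABLE records (day TEXT, region TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [("2008-09-01", "east", 10.0),
     ("2008-09-01", "west", 5.0),
     ("2008-09-02", "east", 7.5)],
)

# Pre-calculate a daily summary once, instead of re-aggregating
# the full detail table on every query.
cur.execute("""
    CREATE TABLE daily_summary AS
    SELECT day, region, SUM(amount) AS total, COUNT(*) AS n
    FROM records
    GROUP BY day, region
""")

# Reports now hit the (much smaller) summary table.
rows = cur.execute(
    "SELECT day, total FROM daily_summary WHERE region = 'east' ORDER BY day"
).fetchall()
print(rows)  # [('2008-09-01', 10.0), ('2008-09-02', 7.5)]
```

The trade-off is that the summary table must be refreshed (or incrementally updated) whenever new detail rows arrive, but with 600K new rows a month that refresh is cheap compared to re-scanning 18m rows per report.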

Without knowing more about your application and environment, it is hard to be more specific.
Scorpio Dragon
Thursday, September 25, 2008
If you do use a DB, then you've got the added burden of getting the data into it before you can process it.  If you use some sort of SQL Server variant, then I would use SqlBulkCopy, which allows you to quickly and easily load a SQL table from a DataTable or DataReader.  Otherwise you'll have to build your own inserts to get the data into the database.
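SqlBulkCopy itself is a .NET/SQL Server API, but the underlying pattern is general: batch the rows into one bulk operation inside a single transaction instead of issuing one INSERT round-trip per row. A rough sketch of that pattern, using Python's sqlite3 as a stand-in (the table is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, value REAL)")

# Load all rows in one executemany call inside a single transaction --
# the same idea as SqlBulkCopy.WriteToServer, rather than committing
# one INSERT per row.
rows = [(i, i * 1.5) for i in range(10000)]
with conn:
    conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
print(count)  # 10000
```

Per-row inserts with an implicit commit each time are typically orders of magnitude slower than a batched load, which is why a dedicated bulk-load path matters at 600K rows a month.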
Thursday, September 25, 2008
You should have them implement this stuff in SAS and then make it available to you (regardless of the fact you said they won't).

It doesn't really matter how many records are going to be created per month, but how many you're going to have to perform these operations on at a time. If you have to summarize a full year's worth of data in real time, that is going to be a killer.

As was previously suggested, you should pre-calculate as much as possible and store that.

This is the approach you should probably use if SAS really can't be leveraged (which is stupid):

SAS -> Big Ass DB Server You Control which replicates with SAS periodically <-> Big Ass Middle Tier Application Server to perform bulk processing (if required)/calculations not performed in database/cached computationally expensive data <-> WinForm Client(s) to display data

Thursday, September 25, 2008
> Should I use an embedded database and let SQL do the job wherever applicable, or are the inbuilt .NET data controls good enough to do the job.

My experience with win32 (not .NET) is that you shouldn't do a lot of processing in UI controls ... for example, don't do this:

* Put 18m items in a UI control
* Iterate through each item in the control, doing data processing on each item

Instead ...

* Put 18m items in an in-RAM data collection in your application
* Process your data in this collection
* Have the UI control display its data from this collection
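The steps above are essentially the "virtual mode" that WinForms grid and list controls support: the control owns no data and asks your collection for rows on demand. A minimal Python sketch of the pattern (the class names here are invented for illustration, not a real UI framework):

```python
# Keep the data in an ordinary in-RAM collection and let the "control"
# pull rows on demand (the WinForms virtual-mode idea).

class RecordStore:
    """In-RAM collection: all data processing happens here, not in the UI."""
    def __init__(self, rows):
        self.rows = rows

    def summarize(self):
        # Row-by-row processing runs against the collection,
        # never against items stored inside a UI control.
        return sum(r["amount"] for r in self.rows)

class VirtualGrid:
    """Stand-in for a virtual-mode grid: it fetches only visible rows."""
    def __init__(self, store):
        self.store = store

    def render(self, first, count):
        # The control asks the collection for just the rows on screen.
        return [self.store.rows[i]["amount"]
                for i in range(first, first + count)]

store = RecordStore([{"amount": float(i)} for i in range(1_000_000)])
grid = VirtualGrid(store)

total = store.summarize()        # processing done in the collection
visible = grid.render(0, 3)      # only 3 rows touched for display
print(visible)  # [0.0, 1.0, 2.0]
```

Populating a control with millions of items makes every repaint and iteration pay UI-marshalling costs; with the virtual pattern the display cost is proportional to the visible rows, not the data size.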
Christopher Wells
Friday, September 26, 2008

This topic is archived. No further replies will be accepted.
