The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Data dumping

*** This question is posted on other forums too ***

I am developing an office automation software for a department. Initially, I am working on the payroll part. In a table I am storing salary details of every month in month/year-wise format. I want to know that in the long run the size of the table shall increase considerably. Some guys suggested me to create the table year-wise, for example a table of year 2005, a table of year 2006 and so on. But this approach is cumbersome. When I need to fetch records of multiple years, I need to first join the respective tables and also search the tables with their names.

Please suggest a better way where the old records can be dumped in a suitable manner, and when necessary can be used for retrieval.
RK Send private email
Monday, August 13, 2007
What exactly do you mean by "dumping", and why do you care if the tables get big?

The payroll tables for even the world's largest employer (IIRC, WalMart with 500K employees) * 26 pay periods * a few K per pay adds up to well under a GB (unless my math is screwed up this early in the morning).
Greg Send private email
Monday, August 13, 2007
If you really need to, I would look into table partitioning, yes, but it should be built transparent to your client-side app using views or other tools supported by your database engine layer.

Are you sure you need to do this?

Remember the first rule of optimization, DON'T.
Cade Roux Send private email
Monday, August 13, 2007
Whoever suggested creating one table per year is an idiot, don't do it, and take no further advice from them (or take the advice as a caution).

The standard solution to this sort of problem is to have two tables: a current active table where new entries are added, and an historical archive table where old entries go after some period of time. Entries are first entered into the active table and live there for some period (usually a few weeks or months). Every so often a job is run that copies old entries from the active table into the archive table and deletes those entries from the active table.

Usually you will only query against the active table (you make sure that the active retention period is long enough to ensure this), but, even when you need to query against the historical data as well, it is only one more table to query (rather than N, as in the "one table per year" suggestion).
Jeffrey Dutky Send private email
Monday, August 13, 2007
"In a table I am storing salary details of every month in month/year-wise format."

I'm confused.

Let's say I get a pay raise every October. This year, I jump from $50,000 to $55,000. I'm also in an industry that gets 26 paychecks a year. That means in this hypothetical table, I'll have 19 rows for which my salary is $50k? and 7 more rows for which my salary is $55k? Keep in mind that the expression "Salary" has a fairly specific meaning in financial/payroll terms.

If this is indeed how you're structuring your table, then I can see why you and your peers are concerned about having too many rows in the table after a while.

I'd go back and make sure you really understand the requirements and think about possibily normalizing the table or calculating some values on the fly, in the application layer before I worried about creating a table for each year or setting up an archival scheme.

As a general rule of thumb, you should not have to optimize any sort of modern day database (MySQL, SQL Server, Oracle, Postgres, etc) unless you have well over a million records, or you're retrieving a megabyte or more's worth of data per record. Not a hard and fast limit but the idea is to worry about optimization if you're building for WalMart as opposed to the corner Ma and Pa pizza place.
Monday, August 13, 2007

May be you are right and wrong both. I must not use the name SalaryTable. Wrong in the sense, that I am not storing that way. Please look below the structure of the table:

PayMonth  EmpCode  Detail    Amount  RecordDate
--------  -------  ---------- -------  -----------
Jan-2007  101      Dea.Allow. 100      26-Jan-2007
Jan-2007  101      Off.Allow. 200      26-Jan-2007
Jan-2007  101      Ot.Deduc.  50      26-Jan-2007

Jan-2007  102      Dea.Allow. 100      26-Jan-2007
Jan-2007  102      Off.Allow  60      26-Jan-2007     
Jan-2007  102      Ot.Deduc.  30      26-Jan-2007

So the table is storing the Pay Slip columns. In case there is an increment one time a year, then I have created a table called IncrementMaster where Amount and DueDate are the columns. So Increment Name and Amount will also get stored as a row.

What's wrong here? What you suggest?
RK Send private email
Tuesday, August 14, 2007
Any particular reason why each "detail" is a separate entry? I admit it makes sense to do it that way when you don't know how many deductions and allowances an employee might take. However, I also think there's a reasonable limit of 15 allowances and 15 deductions. I admit I may be wrong but I suspect you'll only need one or two MEDICAL_DEDUCTION_TYPE and MEDICAL_DEDUCTION_AMOUNT set of columns. It depends on whether the Human Resources department allows an individual to subscribe to multiple health plans.

And yes, 60 columns in a table is common and acceptable. It's a lot better than having 60 rows for each and every employee.

If the increment table is the salary raise table, I think you'll find it a lot easier actually to have a base pay table. Every time they get a raise, add a new record to the table, flag that one as active, and flag previous records as inactive.

So in my case, it'll look something like this...

emp_code  base  date      active?
101      50000 9/30/2003 N
101      52500 9/29/2004 N
101      55125 9/29/2005 N
101      57882 9/30/2006 Y
Tuesday, August 14, 2007

Keeping 15 fixed columns will facilitate, but it is not scalable. The payroll rules vary from department to department. Some departments in our country give departmental allowance along with standard allowance. I am trying to make a standard product that is scalable according to requirements of any department.

Even if the rows exceed, it is better than columns. Because if I fix the columns and later a deduction increases, I need to modify the table itself. Which will make this field NULL for older records.
RK Send private email
Tuesday, August 14, 2007
"I am developing an office automation software for a department."

So you're not really writing code for your department but you're trying to create a product that you can sell to any company?

That is a substantially different problem in terms of requirements. To put it another way, you're not building a payroll system. You're trying to give other people the ability to build their own payroll systems.

To illustrate the problem, consider that Company A may want to keep payroll records for one year. Company B may want to keep payroll records going back five years.  Company C may want to keep three years worth of deductions and two years of allowances. Company D may want to save seven years of tax deductions, but only two years of medical and dental.

If you want to allow that, then yes, you should just use a single table that captures all of the data much in the same way a spreadsheet works, and then write lots of code to parse the data depending on the Company's rules.

I personally think you will find it a lot easier if you set some rules yourself, such as limiting a Company to no more than five deductions per employee, and you will only save them going back three years.
Tuesday, August 14, 2007
I'm sorry what I wrote in original post.

Actually, when I started designing payroll part, I decided that I can make it flexible so that I can sell it as a packaged product. Though at present I am developing it for a department, but I will keep the payroll part independent and attach other parts of this department in a separate app.
RK Send private email
Tuesday, August 14, 2007

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz