The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

How to calculate running totals in accounts more efficiently

Our current system goes back through a year's worth of transactions and calculates the current, overdue, and really overdue balances, etc.

It runs quite slowly.

We thought about adding triggers to the database to update each client's totals when a transaction/invoice/receipt is entered.

I am not sure if this is the best solution, or whether it will actually generate a lot more traffic for the database, because now it will recalculate client totals with every transaction (several per client per day).

How do other people maintain totals like this?

thanks

alex
bumperbox
Thursday, June 19, 2008
 
 
1)  Profile first but...

2)  ... counter/total caches should work amazingly well for this, assuming you can solve the issues with concurrent update of them (not hard for most cases, and your description leads me to believe you probably don't have them).

And "several extra operations per client per day" does not scare me at all...  Pretend Bob's total sales in the last year, over several thousand transactions, are $402,123.95 .  A new transaction for $23.01 comes in.  Updating the running count of Bob's sales is constant time (no need to retotal the previous total) and so fast it doesn't even bear thinking about.
Patrick McKenzie (Bingo Card Creator) Send private email
Thursday, June 19, 2008
 
 
First, some questions: how many customers are in the database (you said that there are several transactions per day per customer, but you didn't mention the number of customers)? When you say it is "running really slowly," can you quantify that?

I have worked with systems that keep running balances on customer accounts, and it was always done by running through all accounts (or, at least, all accounts billing on the current day) in batch mode, accumulating every transaction for the previous month. This was done on a database of approximately 5-10 million customers (we had several clients for the same system, with different numbers of customers), each of which had at least half a dozen transactions per day, and the entire job ran in only a few hours. The accumulation was done against the balance from the previous month, which limited the total amount of work we had to do.
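In rough SQL the roll-forward looked something like this (only a sketch; the table and column names are illustrative, not the real system's):

-- close the current period: prior closing balance + this period's activity
INSERT INTO account_balances (account_id, period, closing_balance)
SELECT b.account_id,
       :current_period,
       b.closing_balance + COALESCE(t.period_total, 0)
FROM   account_balances b
LEFT JOIN (SELECT account_id, SUM(amount) AS period_total
           FROM   transactions
           WHERE  txn_date >= :period_start
           AND    txn_date <  :period_end
           GROUP BY account_id) t
       ON t.account_id = b.account_id
WHERE  b.period = :prior_period;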
Jeffrey Dutky Send private email
Thursday, June 19, 2008
 
 
"we thought about adding triggers to the database to update each clients totals when a transaction/invoice/receipt is entered. "

Bad idea. You are programming against an event which, if lost, is difficult to make happen again. I would never use triggers for something like this. I've worked with systems that did, and it was very painful.

The batch approach is best and allows you to recalculate at any time. If you store off the results of a batch run along with a run date, you don't have to go back and run through everything again. You only need to run through the records since the last run.
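Something along these lines (sketch only, with made-up table names; exact UPDATE syntax varies by DBMS):

-- fold in only what arrived since the previous run
UPDATE client_totals ct
   SET total = total + (SELECT COALESCE(SUM(t.amount), 0)
                        FROM   transactions t
                        WHERE  t.client_id  = ct.client_id
                        AND    t.entered_at >= :last_run_at
                        AND    t.entered_at <  :this_run_at);

-- remember where this run stopped
UPDATE batch_runs
   SET last_run_at = :this_run_at
 WHERE job_name = 'client_totals';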
dood mcdoogle
Friday, June 20, 2008
 
 
+1 to Patrick and dood.
A small amendment to dood's idea: instead of updating on demand, recalculate at a fixed interval, say once the total is a day/week/month old. That way the workload to get the current total is light even if it hasn't been requested in a while. You can set it up so the accounts' automatic updates are staggered and you don't have to process the whole load at once.
Brian
Friday, June 20, 2008
 
 
I must have missed something, dood. If the whole accounting transaction is wrapped in a database transaction (and I'd argue that it should be), then why would there be a problem with a trigger?
Ron Porter Send private email
Friday, June 20, 2008
 
 
What if there is a bug in the code being triggered? This happens all the time. And I've seen cases where triggers simply don't fire for some unexpected reason. When this occurs, how do you "refire" the trigger to get your code to run? Well, sometimes you can't. So you've "missed the opportunity" and now need to have a manual way of adjusting the numbers. Also, triggers can fail mysteriously without giving you any indication that it has happened. So from a debugging perspective it can be very hard to know when a problem has even occurred.

I was involved with a system that used triggers like this about 10 years ago with a Sybase database. It was an absolute nightmare. The numbers were never right for various reasons and we ended up having to go back and manually adjust them with a batch anyway. Maybe other DBMS's have better trigger implementations. But you could never convince me that using triggers for mission critical calculations like this is a good idea.
dood mcdoogle
Friday, June 20, 2008
 
 
"I was involved with a system that used triggers like this about 10 years ago with a Sybase database. "

Just so we're clear - because you had an unexplainable error with a 10-year-old database product, triggers are to be avoided at all costs.
Defective Monk Send private email
Friday, June 20, 2008
 
 
The "save total" approach is problematic because if multiple transactions try to update a total at the same time by adding on each transaction's amount to the current total then read/write consistency means that each transaction does not see the impact of the others.

To avoid this you have to guarantee that the sessions serialize on access to the totals. You can't have a sequence of operations such as ...

i) Session A reads $100 from total
ii) Session B reads $100 from total
iii) Session A adds $15 to total to make $115, and commits
iv) Session B adds $25 to total to make $125, and commits

The update from session A has been lost.



Anyway, another approach to this would be to scan the full history of the customer each time you need to calculate the total. This is only problematic if your data rows for the customer are scattered around a heap-organized table, so that you need multiple logical reads per row. If you stored your transactions in a hash cluster you would decrease the work to do this enormously (maybe only one logical I/O to retrieve all the transactions), albeit at the expense of increased cost in adding new rows to the table and in calculating non-customer-oriented aggregates. However, since you generally add data only once and then query it multiple times, you get a net benefit.
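In Oracle terms the setup would be roughly this (just a sketch; names and sizing are made up):

-- all of a customer's rows hash to the same block(s), so reading a
-- customer's full history costs very few logical I/Os
CREATE CLUSTER txn_cluster (customer_id NUMBER)
  HASHKEYS 100000 SIZE 1024;

CREATE TABLE transactions (
  customer_id NUMBER,
  txn_date    DATE,
  amount      NUMBER(12,2)
) CLUSTER txn_cluster (customer_id);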
David Aldridge Send private email
Friday, June 20, 2008
 
 
"Just so we're clear - because you had an unexplainable error with a 10 year old database product, triggers are to be avoided at all cost. "

Yes. And also because of the other reasons that I have already stated. Did you not understand those and thus just chose to pick on one aspect of my post? Do I need to explain them to you again or would you like to give it the old college try and read my post again? Just let me know. I'll be here...
dood mcdoogle
Friday, June 20, 2008
 
 
> also because of the other reasons that I have already stated

I saw you state three reasons:

1) "What if there is a bug in the code being triggered?"
2) "triggers simply don't fire for some unexpected reason"
3) "triggers can fail mysteriously without giving you any indication"

In the first case, there's presumably an exception and the transaction is rolled back.

I don't know about the latter two cases, but it seems to me that triggers are implemented by the database because they're theoretically useful *and* supposed to be usable in practice.
Christopher Wells Send private email
Friday, June 20, 2008
 
 
The problem with triggers is that they work well in an ideal world. But the world isn't ideal.

1) Triggers run in response to a specific action (inserting a row into a database for example). If something happens and that trigger doesn't run correctly, then you have to figure out a way to re-trigger that action again. That's the hard part. For example, how do you re-run an insert trigger if the row has already been inserted into the database? You could delete the row and then reinsert it. But who knows what implications are involved with that approach. My goal here is not to try and answer the question of how you re-run an insert trigger. My goal is to show that by nature triggers are "triggered" and often can't be re-triggered again without understanding what impact doing so will have on the system.

2) Saying that you could just use a transaction, so the original insert wouldn't have happened if the trigger fails, isn't acceptable in all cases either. There could be multiple systems at play, and some business cases dictate that the row should be written at all costs and that a failed trigger should not cause the insert/update to fail. Imagine multiple triggers being run and disparate systems that rely on that row being written. Data loss is often not acceptable just because one business process (updating a balance) fails. If I spent 10 minutes purchasing something in my shopping cart and a failing insert trigger causes my order to be lost, I'm going to be upset.

3) Debugging triggers is often quite difficult as is recognizing when a trigger fails. Triggers can sometimes fail silently resulting in data corruption.

4) Given that failures occur in real life, you will usually end up needing some external process to "fix" account balances anyway. Reusing the trigger code is difficult because code that works fine at a trigger point often can't be run again as part of a batch process. This means that you end up with business logic in two places. Given that, it is usually easier to just maintain one set of code and use a batch process.

5) Triggers work for simple systems, but when you need to update multiple data sources or perform a business workflow off of a trigger, the error handling becomes problematic. Imagine having to set up a workflow that can be interrupted and then resumed off of a trigger. Yes, it can be done. But I'd personally rather do that using other mechanisms.

YMMV but I've been burned too many times by systems that have tried to use triggers to update accounts and such. The numbers were always off and we ended up having to write batch processes to clean up the mess anyway. Ultimately, we decided that using a single batch process was well worth the extra processing time since it could be run at any time and the results were always consistent. I've never used a system based on triggers that wasn't a royal pain in the ass.

DBAs want you to believe that triggers are the right approach to this type of problem, just like they want you to believe that business logic belongs in the database. Why? Because they want to own it, which gives them worth. Not because it is the right thing to do. A trigger is a "golden hammer" to a DBA because they have no other way to run arbitrary code at set intervals. But not all problems are "nails".
dood mcdoogle
Friday, June 20, 2008
 
 
Thanks for the explanation. I'm developing some triggers to do two things:

1) To enforce a complicated constraint on the data -- the triggers roll back the transaction if they don't like it

2) To implement a historical log -- for example, when records are updated or deleted, a trigger archives the old record
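For the second one the shape is roughly this (Oracle-style PL/SQL just to illustrate; the orders/orders_history tables and columns are made up):

CREATE OR REPLACE TRIGGER trg_orders_history
AFTER UPDATE OR DELETE ON orders
FOR EACH ROW
BEGIN
  -- archive the old version of the row before it changes or disappears
  INSERT INTO orders_history (order_id, old_status, old_amount, changed_at, change_type)
  VALUES (:OLD.order_id, :OLD.status, :OLD.amount, SYSDATE,
          CASE WHEN DELETING THEN 'DELETE' ELSE 'UPDATE' END);
END;
/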

> Data loss is often not acceptable just because one business process (updating a balance) fails.

That makes sense. But that's a business decision (not a technical decision about whether an implementation can use triggers). In that business case, an alternative might be to keep the triggers, but if one of them fails then insert the data into a different (less constrained) table from which it can be re-processed later.
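Sketched out (again Oracle-style syntax, with illustrative names), that alternative might look like:

CREATE OR REPLACE TRIGGER trg_txn_totals
AFTER INSERT ON transactions
FOR EACH ROW
BEGIN
  UPDATE client_totals
     SET total = total + :NEW.amount
   WHERE client_id = :NEW.client_id;
EXCEPTION
  WHEN OTHERS THEN
    -- park the work for later re-processing instead of failing the insert
    INSERT INTO totals_retry_queue (client_id, amount, failed_at)
    VALUES (:NEW.client_id, :NEW.amount, SYSDATE);
END;
/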

> DBAs want you to believe that triggers are the right approach to this type of problem, just like they want you to believe that business logic belongs in the database. Why? Because they want to own it, which gives them worth.

I don't know about to *this* type of problem, but a general reason IMO for triggers in the database is to help enforce/guarantee the relational integrity of data in a given database.
Christopher Wells Send private email
Friday, June 20, 2008
 
 
One other advantage to a batch oriented system is that the entries will be your transaction history.

So you would have an easier time answering questions like "What did you do to my account last Friday?"
xampl
Friday, June 20, 2008
 
 
> One other advantage to a batch oriented system is that the entries will be your transaction history.

Would you please give me a little overview or hyperlink to what a "batch oriented system" and "transaction history" are, and/or how they're implemented? Those phrases are difficult to Google.
Christopher Wells Send private email
Friday, June 20, 2008
 
 
"But that's a business decision (not a technical decision about whether an implementation can use triggers). In that business case, an alternative might be to keep the triggers, but if one them fails then insert the data into a different (less contrained) table from which it can be re-processed later."

Good point. I guess that most of my reasons do boil down to business decisions. There is probably no technical reason why you shouldn't use triggers. After all, batch processes can fail silently too. In my experience, though, we've always ended up needing a batch process to clean up issues, so there was no reason to support two processes that did the same thing. And the business decisions always seemed to dictate that a batch process be in place, so we ultimately abandoned triggers.
dood mcdoogle
Friday, June 20, 2008
 
 
In Oracle I've never seen mysterious trigger failures and related corruption. That seems like the sort of comment too specific to a particular implementation to be useful as general advice, because everyone has had problems with particular features in particular technologies. However, triggers do have fundamental problems aside from whatever specific bad experiences people have had: they make code harder to debug, and they have read consistency problems -- those transcend their expediency.
David Aldridge Send private email
Friday, June 20, 2008
 
 
> Triggers have read consistency problems

You mentioned that in http://discuss.joelonsoftware.com/default.asp?design.4.569365.5#discussTopic569967 ... because triggers are run as part of the transaction, I don't see that triggers have any more or less of a consistency problem than would implementing the trigger functionality as extra, explicit SQL instead of as an implicit trigger.
Christopher Wells Send private email
Saturday, June 21, 2008
 
 
I see: you might want to allow concurrent (unlocked) INSERTs, but then lock (in a separate, subsequent, batch-oriented transaction) when you calculate the totals, because that involves a read-add-write operation.
Christopher Wells Send private email
Saturday, June 21, 2008
 
 
Chris, a "batch oriented system" is one that processes a whole lump at a time. An example you might be familiar with is how banks tend to process deposits and withdrawals at one time during the day. This leads to them saying things like "deposits made after 2pm will be credited to the next day's business." 

As another example, at a previous employer, our workgroup's job was to move data around and keep all the systems synched up: CRM, network and billing. Generally, each process was of the sort "give me all the changes to X that happened in the last 24 hours" and those would be processed in a batch and the changes would be applied to the destination system(s) as a batch. 

Or, as another example, your company mails out invoices on the first of the month. You'd most likely process the previous month's transactions/stuff on the last day of the month - all in one batch.
Peter Send private email
Saturday, June 21, 2008
 
 
What is "transaction history", and why or how would a batch oriented system give you transaction history entries (where a non-batch oriented system wouldn't)?
Christopher Wells Send private email
Saturday, June 21, 2008
 
 
Trigger consistency problem:

Here's an example from Tom Kyte: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:53473305799476

You have to serialize by locking the aggregate record, so that Session B cannot read the value of the aggregate until Session A has read, updated, and committed (see the example chain of events above). Otherwise the update that Session A makes is lost, because Session B read the old value before it could see Session A's update and then overwrote it.

I think that in Oracle you could guard against this by setting the isolation level to serializable, but that would just ensure that the session fails with an error if the situation occurs. http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14251/adfns_sqlproc.htm#sthref274
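In practice the serialization comes from locking the aggregate row before reading it, something like this (sketch; names are made up):

-- Session B blocks here until Session A commits or rolls back
SELECT total
FROM   client_totals
WHERE  client_id = :client_id
FOR UPDATE;

UPDATE client_totals
   SET total = :new_total
 WHERE client_id = :client_id;

COMMIT;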
David Aldridge Send private email
Sunday, June 22, 2008
 
 
If efficiency becomes business critical, use a database technology oriented toward time-series analysis, such as kdb+.

This will allow your query to run in milliseconds.

Monday, June 23, 2008
 
 
I'd say the biggest advantage to a batch oriented system is that it really lines up with how the accountants think about these sorts of things.  You can have all the computer science models in the world, but accountants will think of it as a table of numbers that need to be added.
Lance Hampton Send private email
Monday, June 23, 2008
 
 
You could also compartmentalize the data: for example, have a separate table with a running total for each interval (say a month), and then the batch process only needs to update the months where data changed or where no total exists yet. If the data being entered has fields for date created and date modified, the batch process could use those to determine which intervals need to be recalculated.

Most applications of this type also have a reconcile-type function that forces a recalculation of everything for a given interval. With this scheme, all the reconcile function would do is wipe the summary rows for the specified interval and criteria and invoke the batch process to perform the recalculation.
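The reconcile for one interval then reduces to something like this (sketch; table and column names are illustrative):

-- wipe the summary rows for the interval...
DELETE FROM monthly_totals
 WHERE month_start = :month_start;

-- ...and rebuild them from the underlying transactions
INSERT INTO monthly_totals (client_id, month_start, month_total)
SELECT client_id, :month_start, SUM(amount)
FROM   transactions
WHERE  txn_date >= :month_start
AND    txn_date <  :next_month_start
GROUP BY client_id;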

Having developed a payroll system in the past I will agree with dood that triggers are not the best choice for this sort of thing.
Gerald
Monday, June 23, 2008
 
 
>> batch oriented system give you transaction history entries (where a non-batch oriented system wouldn't)?  <<

A transaction history is the list of changes applied to an entity.

For example: you get a credit card statement each month -- it contains a starting balance, any payments, any new charges, and your new outstanding balance. Each line on the statement is both a change (mutation) to your current balance and a piece of its transaction history.

The change might be a payment, or a charge, or a correction to a previous error.  In a batch system, the transaction *is* the history.

In an online system, you obviously have the same inputs as in a batch system, but if you want to keep a transaction history, the developer has to do extra work to write to a history table as part of their DB ACID transaction.
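That extra work amounts to doing both writes inside one database transaction, something like this (sketch; names are illustrative):

BEGIN;  -- or START TRANSACTION, depending on the DBMS

UPDATE account_balances
   SET balance = balance + :amount
 WHERE account_id = :account_id;

INSERT INTO account_history (account_id, amount, description, entered_at)
VALUES (:account_id, :amount, :description, CURRENT_TIMESTAMP);

COMMIT;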


What people have been saying so far is that you can accomplish the writing to a history table (or updating a current balance) by using your database's trigger functionality.  Others have been saying that their personal experience has been that you cannot trust triggers, because they will occasionally fail silently.

There is no 100% correct answer -- my take is that both application and system software have faults which can result in silent data loss. As part of the system design process, the architect and BA need to decide what an acceptable failure rate is. If it's greater than 0.0000% (and trying to specify zero failures is about as fruitful as specifying 100% uptime), you should design the application and business process to allow a human to make corrections.

"We're sorry for the error, Mr. Xampl.  We'll go ahead and credit your account."
xampl
Monday, June 23, 2008
 
 
All the accounting systems that I have worked on for the past decade have included both a history table and a "your current balance is" table that reflected the total of all the records in the history table.

I have never seen the two disagree, and I would no more expect them to than I would expect to write "FOO" into a CHAR field and get back "BAR".

I have normally seen this accomplished via standalone INSERT/UPDATE statements, not triggers, because for various reasons triggers are often not flexible enough. Of course different systems might have different experiences.
Greg Send private email
Tuesday, June 24, 2008
 
 
In Oracle you could have a materialized view that does either a periodic refresh (say every hour), an on-demand refresh (when you tell it to), or a refresh after each commit. It maintains the aggregate table for you.
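Roughly like this (Oracle syntax; the transactions table and its columns are just illustrative, and fast refresh of a SUM also wants the COUNT columns):

-- change log so the view can be refreshed incrementally
CREATE MATERIALIZED VIEW LOG ON transactions
  WITH SEQUENCE, ROWID (client_id, amount) INCLUDING NEW VALUES;

-- aggregate maintained automatically at each commit
CREATE MATERIALIZED VIEW client_balances
  REFRESH FAST ON COMMIT
AS
SELECT client_id,
       SUM(amount)   AS balance,
       COUNT(amount) AS amount_cnt,
       COUNT(*)      AS row_cnt
FROM   transactions
GROUP BY client_id;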
Jim Send private email
Wednesday, June 25, 2008
 
 
Do you have an open-balance field on each transaction? If you have to calculate the open balance for each transaction, I can see that it would be quite slow. If you keep an open-balance (or amount-paid) field in the transaction table, updated whenever a transaction is paid (partially or fully), then calculating the balance by overdue type should be fairly easy. Each overdue amount would be just a query over a transaction date range.
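With that field in place, the ageing buckets are just sums over date ranges, e.g. (sketch; column names and the 30/60-day cut-offs are only an example):

SELECT SUM(CASE WHEN due_date >= :today - 30 THEN open_balance ELSE 0 END) AS current_due,
       SUM(CASE WHEN due_date <  :today - 30
                 AND due_date >= :today - 60 THEN open_balance ELSE 0 END) AS overdue,
       SUM(CASE WHEN due_date <  :today - 60 THEN open_balance ELSE 0 END) AS really_overdue
FROM   invoices
WHERE  client_id = :client_id
AND    open_balance > 0;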
Jerry Wang Send private email
Wednesday, June 25, 2008
 
 

This topic is archived. No further replies will be accepted.
