A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.
I'm planning on making an auction site that caters to a very specific subgroup of people. I'm fairly confident I can handle most of the technical details, but I want to defer to the experience of this group on one question.
First, the table layout: I'm planning on having several tables to store information about users, items being auctioned, bids, etc. These will all be in 3NF, so data integrity is ensured. Then I'm planning on denormalizing/duplicating several fields into the "auction" table, which has one row per auction. This table will contain every field needed to print a page on that auction (user name, highest bid, item description, etc.) as well as the required foreign keys used to reference specific information, in a sort of "data warehouse" style. This cuts down the number of database queries to 1/page.
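A minimal sketch of that layout, using Python's sqlite3 for illustration (all table and column names here are hypothetical, not from the original post): normalized tables hold the source of truth, while the one-row-per-auction table duplicates the display fields alongside the foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE items (item_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE bids  (bid_id INTEGER PRIMARY KEY, auction_id INTEGER,
                    user_id INTEGER, amount REAL, placed_at TEXT);

-- Denormalized one-row-per-auction table: everything one page needs.
CREATE TABLE auction (
    auction_id   INTEGER PRIMARY KEY,
    user_id      INTEGER REFERENCES users(user_id),
    item_id      INTEGER REFERENCES items(item_id),
    user_name    TEXT,     -- duplicated from users
    description  TEXT,     -- duplicated from items
    highest_bid  REAL      -- duplicated from bids
);
""")

conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.execute("INSERT INTO items VALUES (1, 'antique clock')")
conn.execute("INSERT INTO auction VALUES (1, 1, 1, 'alice', 'antique clock', 42.50)")

# One query now serves the whole auction page.
page_row = conn.execute(
    "SELECT user_name, description, highest_bid FROM auction WHERE auction_id = 1"
).fetchone()
```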
I expect this table to be read often and written semi-often. My initial theory is to add new auctions to a temporary table, then every fifteen minutes (or half hour or whatever) drop the index on the main table, move the new auctions over, recreate the index, and keep going. This way, there's not a noticeable hit every time someone adds a new item.
Am I being paranoid? I'm hoping for 100,000 to 300,000 users (not simultaneous, thank God), so a large number of auctions. Maybe I won't need to drop the index when I move the temporary entries over, just add them every 15 minutes and let the index auto-adjust? Or will this scheme actually be a detriment to my app's response time?
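For what it's worth, the periodic-move idea can be sketched in a few lines (again via sqlite3, with hypothetical names; a production scheduler and a real DBMS would differ): new auctions land in a staging table, and a periodic job drops the index, bulk-moves the rows in one transaction, and rebuilds the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE auction (auction_id INTEGER PRIMARY KEY, title TEXT);
CREATE INDEX idx_auction_title ON auction(title);
CREATE TABLE auction_staging (auction_id INTEGER PRIMARY KEY, title TEXT);
""")

# New auctions go to the staging table as they arrive.
conn.executemany("INSERT INTO auction_staging VALUES (?, ?)",
                 [(1, "clock"), (2, "lamp")])

def flush_staging(conn):
    """Run every N minutes: bulk-move staged rows into the main table."""
    with conn:  # one transaction around the whole move
        conn.execute("DROP INDEX idx_auction_title")
        conn.execute("INSERT INTO auction SELECT * FROM auction_staging")
        conn.execute("DELETE FROM auction_staging")
        conn.execute("CREATE INDEX idx_auction_title ON auction(title)")

flush_staging(conn)
moved = conn.execute("SELECT COUNT(*) FROM auction").fetchone()[0]
staged = conn.execute("SELECT COUNT(*) FROM auction_staging").fetchone()[0]
```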
Tuesday, February 08, 2005
This sounds a bit like premature optimization to me. I'd probably start by just inserting the auctions directly into the main table and if that proves too slow then look at some alternative scheme. Otherwise, prototype it to get a feel for what the insert performance will be like and then make an appropriate design decision at that point.
I also wouldn't try sticking all the information for a single auction in a single table unless it makes sense to do so from a data structure point of view. You can optimize access to auctions by implementing appropriate caching in the application server, so I wouldn't sweat over optimizing the database layer for performance at the expense of maintainability. I expect caching at the app server level would provide significantly more performance than the database optimization you're planning.
First thought: This scheme sounds like "planning for failure" ... an acknowledgment that the technology you are using is not up to the job you are asking of it.
Now having got that out of my system, once you have developed your 3NF structure you might think about exactly what changes are going to occur to what tables, and how often. The denormalization into a single table is not your only option -- you can also consider denormalizing the (for example) 5 tables that are the most static in terms of the web page display, and joining the resultant table to the more dynamic one or two tables (for example the one that stores bid history).
You might even establish a single denormalized table that represents the most static tables, and then have the process that adds new bids to your bid history also directly update this denormalized table. This sort of arrangement is usually frowned on because it serializes a process (in this case adding bids), but in fact here this is _exactly_ what you want -- you have to have a mechanism for avoiding the acceptance of two identical bids at the same time anyway. So your "AddBid" process would go something like ....
i) Lock the individual auction
ii) Check new bid higher than latest
iii) Add bid to history
iv) Update individual auction to new "latest bid price, date/time, person"
v) release lock
Then if you fail to get that auction lock, you wait until it is released, check to see if the newer bid is higher than the latest, take appropriate action.
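The AddBid steps above can be sketched as a single transaction, using the transaction itself as the per-auction lock (sqlite3 for illustration; the schema and names are made up, and a real multi-user DBMS would use row-level locking such as SELECT ... FOR UPDATE):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE auction (auction_id INTEGER PRIMARY KEY,
                      latest_bid REAL, latest_bidder TEXT);
CREATE TABLE bid_history (auction_id INTEGER, bidder TEXT, amount REAL);
INSERT INTO auction VALUES (1, 10.0, 'alice');
""")

def add_bid(conn, auction_id, bidder, amount):
    with conn:  # i) the transaction serves as the lock; v) commit releases it
        latest = conn.execute(
            "SELECT latest_bid FROM auction WHERE auction_id = ?",
            (auction_id,)).fetchone()[0]
        if amount <= latest:            # ii) must beat the current bid
            return False
        conn.execute(                   # iii) append to bid history
            "INSERT INTO bid_history VALUES (?, ?, ?)",
            (auction_id, bidder, amount))
        conn.execute(                   # iv) update the denormalized row
            "UPDATE auction SET latest_bid = ?, latest_bidder = ? "
            "WHERE auction_id = ?", (amount, bidder, auction_id))
        return True

ok = add_bid(conn, 1, "bob", 12.0)          # beats 10.0, accepted
rejected = add_bid(conn, 1, "carol", 11.0)  # below 12.0, rejected
latest_bid = conn.execute(
    "SELECT latest_bid FROM auction WHERE auction_id = 1").fetchone()[0]
```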
Keep it simple for now - just insert the new auctions in the database (with or without staging, depending on your workflow). Profile your application and see where the real bottlenecks are (I would use JMeter for that). Then optimize accordingly. Retest each change to check if your changes give you the expected performance boost or not.
Drop any changes that do not improve performance - they only introduce unnecessary complexity.
Tuesday, February 08, 2005
As if you needed another reply -- you certainly should not denormalize at this time. Remember that you don't need 10 queries to get the information you discussed. If your DBMS is a good one (e.g. not MySQL v.3 or Access) you probably have enough SQL power to gather all that data in a single query. Provided that the tables are indexed this should well scale into millions of auctions.
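To illustrate the single-query point (sqlite3 standing in for the DBMS; this schema is invented for the example): with the normalized tables indexed, one JOIN pulls everything the auction page needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users    (user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE auctions (auction_id INTEGER PRIMARY KEY, seller_id INTEGER);
CREATE TABLE items    (item_id INTEGER PRIMARY KEY, auction_id INTEGER,
                       description TEXT);
CREATE TABLE bids     (auction_id INTEGER, amount REAL);
CREATE INDEX idx_bids_auction ON bids(auction_id);

INSERT INTO users VALUES (1, 'alice');
INSERT INTO auctions VALUES (7, 1);
INSERT INTO items VALUES (1, 7, 'antique clock');
INSERT INTO bids VALUES (7, 10.0), (7, 12.5);
""")

# One query gathers seller, item, and highest bid for the page.
row = conn.execute("""
    SELECT u.user_name, i.description, MAX(b.amount) AS highest_bid
    FROM auctions a
    JOIN users u ON u.user_id = a.seller_id
    JOIN items i ON i.auction_id = a.auction_id
    LEFT JOIN bids b ON b.auction_id = a.auction_id
    WHERE a.auction_id = 7
    GROUP BY u.user_name, i.description
""").fetchone()
```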
If your auctions are anything like Ebay, you're going to have a lot of transactions per user -- they're going to be hammering the auction pages with a bunch of reloads to see if the price has changed either on the item they're selling, or the one they're interested in buying. If you're planning on caching auction information in the db, that's still going to cost a db hit.
I'd be more inclined to do the page caching in the application layer, and just pull some dynamic bits (such as current bid, # of bids, etc.) out of the db.
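A rough sketch of that split, with an in-process TTL cache (everything here is a hypothetical stand-in; the db calls and templating are mocked out): the expensive static portion of the page is cached in the app server, and only the volatile fields hit the database per request.

```python
import time

CACHE_TTL = 60.0  # seconds; tune to taste
_page_cache = {}  # auction_id -> (expires_at, static_html)

def render_static(auction_id):
    # Stand-in for the expensive part: templating the item
    # description, seller info, photos, etc.
    return f"<h1>Auction {auction_id}</h1>"

def fetch_dynamic(auction_id):
    # Stand-in for the cheap per-request db hit for volatile fields.
    return {"current_bid": 12.50, "bid_count": 3}

def auction_page(auction_id):
    now = time.monotonic()
    entry = _page_cache.get(auction_id)
    if entry is None or entry[0] < now:
        # Cache miss or expired: rebuild the static portion.
        entry = (now + CACHE_TTL, render_static(auction_id))
        _page_cache[auction_id] = entry
    dynamic = fetch_dynamic(auction_id)
    return entry[1] + f"<p>Current bid: {dynamic['current_bid']}</p>"

page = auction_page(7)
cached = 7 in _page_cache
```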
Alternatively, look at materialised views if you're using Oracle.
Tuesday, February 08, 2005
This topic is archived. No further replies will be accepted.
Powered by FogBugz