The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

data model for stock prices, part 2

Which is better: one big table for the prices of all securities, or separate tables for each one.

Option #2 strikes me as superior for performance of queries.  There is also an advantage when it comes to managing the database at a lower level of granularity, since you can quickly drop and re-import all the data for a particular stock.

Is there any disadvantage to separate tables?  The only obvious one to me is the proliferation of them, since there are probably thousands of listed securities I want to track.

Thanks.
NPR Send private email
Sunday, June 18, 2006
 
 
Not all databases are good at handling thousands of tables; in MySQL's MyISAM tables, it very much depends on the filesystem below as every table is eventually represented as a file in a flat directory (or, at least, it did in MySQL 3)

#2 won't let you have any "overall" queries, such as: "which stocks moved more then 2%/day in the last week" -- you'd have to have a different query for each stock. Nor will it let you compute an index (e.g. the djia) reasonably.

#1 is more common for these reasons and others.
Ori Berger
Sunday, June 18, 2006
 
 
>Is there any disadvantage to separate tables?

Yes, every time you add a new stock, you have to create a new table. This is going to be a rather unmanaged mess. Further, what happens if over time you add a new field? (Again, this is difficult, as you now have a large number of tables with the same structure – this INSTANTLY hints at a normalized database).

Further, how can you write a query to give information like the number of different stocks traded in a given time period without writing a huge mess of code?

Really, separate tables would be a absolute LAST resort. Further, using different tables for everything means you would effectively throw out the use of sql for most quires on data. You would also effectively throw out any management tools that come with the database manager when you need to make schema changes. (field changes, index changes, performance tuning etc).

Note that NONE of your sql tools can operate globally on a large number of tables.

Routines to export data would difficult, but routines to import data would have to create tables on the fly, and a host of other difficult issues.

You will wind up coding yourself into the ground.
 
Using a separate table for each thing as opposed to adding one simple column that defines the stock is a grand canyon in terms of difficulty and managing of your system.


Albert D. Kallal
Edmonton, Alberta Canada
kallal@msn.com
Albert D. Kallal Send private email
Monday, June 19, 2006
 
 
> this INSTANTLY hints at a normalized database

should read

 this INSTANTLY hints at a UN normalized database
Albert D. Kallal Send private email
Monday, June 19, 2006
 
 
This kind of design keeps coming over and over as a revolutionary idea...
Today, there's really nothing to be invented about database structures. The normal forms and the star schemas tell everything there's to know about.
Sevenoaks Send private email
Monday, June 19, 2006
 
 
Some databases (MySQL 5.1? others, probably) allow you to 'partition' the underlying structure based on the primary key. In a sense, this is the inverse of a materialized view of multiple tables.

In a clustered index with the 'stock' as the leading part of a multipart primary key, you get similar effects for free (each stock will be 'close' to itself on the file system).

Also, rememeber - mulitple tables don't decrease the actual number of bytes on disk. So even though you *think* the indexes will be more efficient, you'll blow your caches when you switch stocks. In effect, you're just making performance worse. The only time this would make sense would be if you could actually assign more physical resources (for instance, if each of these tables was on it's own database node). Totally overkill in your case of course.
mysql hack Send private email
Monday, June 19, 2006
 
 
>> mulitple tables don't decrease the actual number of bytes on disk.

That's a nice way of putting it. I'll have to remember that one next time someone suggests n identical tables because they're scared of an 'enormous' single table.
Larry Lard Send private email
Monday, June 19, 2006
 
 
Sevenoaks,

What about Cubes?
Lenny
Monday, June 19, 2006
 
 
Thanks, all great comments.  I overlooked some of these things since I haven't done a lot of db administration.  The fact that it's possible to partition the table based on primary key is a huge bonus of using a single "prices" relation.  I forgot about the potential maintenance nightmare of applying common changes (alter table, index, etc) to thousands of tables.

In my last job, there was this kind of thing (separate tables for separate data sets) for a different type of data.  It was a less clear-cut case, since each data set shared some common attributes, but also added its own.  It was a bad example from which to learn effective database design.
NPR Send private email
Monday, June 19, 2006
 
 
Definetely one table... if your DBMS and standards supports arrays in the schema, I would also consider an array per month (or maybe per year) to store prices, rather than one record for each day. That's what we do in our app.
NetFreak Send private email
Monday, June 19, 2006
 
 
If your queries against the data mostly require multiple records per stock, rather than multiple records per day for example, then give serious consideration to some form of physical clustering of the values based on stock id. Partitioning is not necessarily the only or the best route (especially you still have to add new partitions per stock). In Oracle I'd be looking at a hash cluster for example.
David Aldridge Send private email
Tuesday, June 20, 2006
 
 
my firm uses one table for stocks_for_one_year. so the tables are stocks_1999, stocks_2000, etc. sometimes this is brokern down into months as well depending on data frequency
Tom Vu
Monday, June 26, 2006
 
 
Also, keep in mind that each trading day produces (at least) five different data items (open, high, low, close, and volume) for each security.
BenjiSmith Send private email
Thursday, June 29, 2006
 
 
As every one else has already said, single table is the way to go.  In your message title you mention "stock prices."  If you have instruments other than stocks, then it is better to have one table for each class of instruments.

If you just need to load up a few (thousand) end of day historical prices from yahoo then you don't need much more infrastructure, if you are doing this for a company which needs to deal with price quotes, then you might be surprised how tough it can get.

Stocks can change their symbols, names of companies can change, there can be mergers or spin-offs.  If your data is not end-of-day, then you have to make sure "stock price" actually means last-sale-price and not qote-data, two different things with different attributes and slightly more complex structure (and a lot more data).  It can be a royal pain :)
falcon Send private email
Tuesday, July 04, 2006
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz