The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Scalable database servers

Hello,

I'm currently in the process of developing an application which, in the future, needs to scale up really well. As in, I will probably need to handle around 4,000 database queries a second within the next 3 years.

Now, I need a database server that scales up well and supports full SQL. I have quite a lot of experience with PostgreSQL, but am in doubt at the moment: I'm not a database expert and probably never will be. I cannot hire a DBA at the moment, and I was always told: never use Oracle unless you know what you're doing.

So basically, I'm now in doubt whether to continue building on my PostgreSQL experience, with the risk that we will need to do a massive database transition in the future (which is really something I would hate to do), or to try to learn a bit of Oracle myself, with the risk of creating a buggy database design (which is really something one should hate me for).

What would you do in this case, and why?

Thanks for your input.

Regards,

Leon Mergen
Monday, March 05, 2007
 
 
4,000 queries a second is normally a trivial amount, so I think you will be fine using PostgreSQL. I'm also assuming that you're using adequate hardware - a 150 MHz Pentium II with 64 MB of RAM will struggle, of course.

The biggest bottleneck is how you've structured your database - if every select statement has to join multiple non-indexed tables, that will slow your system down.
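
A minimal sketch of the idea, assuming a hypothetical orders/customers schema and Python's psycopg2 driver (all the names here are made up):

    # Sketch: index the columns used in joins and filters so the planner
    # doesn't have to sequentially scan the tables. Names are hypothetical.
    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=myuser")
    cur = conn.cursor()

    # A join like the following is slow if orders.customer_id is unindexed:
    #   SELECT c.name, o.total
    #   FROM orders o JOIN customers c ON o.customer_id = c.id;
    cur.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
    conn.commit()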

But given that you're experienced, you know what you're doing, and you've bought good hardware, you should be able to handle hundreds of thousands of queries per second. I would also not stress over it - three years is plenty of time to plan how you'd upgrade the system.
TheDavid
Monday, March 05, 2007
 
 
Yes, most of the major databases should handle that query volume pretty well, as long as the schema is designed well.

I was recently working with a MySQL database that I designed and found that it was only getting about 60 transactions/sec... after a bit of refactoring, I was able to completely eliminate one query and get rid of a join, and it now regularly runs at 3-4k/s with peaks around 7k.
KC
Monday, March 05, 2007
 
 
Having worked on some fairly big websites, I would say that a lot of the time, lazy programming on the web programmer's part contributes to wasteful page rendering and DB accesses. Caching is the solution: once you start hot-spotting the usage and determine what is cacheable, start caching rendered controls as well as raw data recordsets in solutions such as Danga's memcached or a memory file system. ASP.NET 2.0 also supports such caching out of the box; look into it.
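
A minimal cache-aside sketch of that idea in Python, using the python-memcached client and psycopg2 (the key name, TTL and query are made up):

    # Sketch: check memcached first, fall back to the database on a miss.
    # Key name, TTL and query are hypothetical.
    import memcache
    import psycopg2

    mc = memcache.Client(["127.0.0.1:11211"])
    conn = psycopg2.connect("dbname=mydb user=myuser")

    def front_page_posts():
        posts = mc.get("front_page_posts")
        if posts is None:                      # cache miss: hit the database
            cur = conn.cursor()
            cur.execute("SELECT id, title FROM posts ORDER BY id DESC LIMIT 20")
            posts = cur.fetchall()
            cur.close()
            mc.set("front_page_posts", posts, time=60)  # cache for 60 seconds
        return posts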
Li-fan Chen
Monday, March 05, 2007
 
 
The best scalability solution is Microsoft SQL Server - it scales all the way from Windows Mobile to clustered, high availability, high volume transaction processing environments.

You can develop, test and demonstrate free of charge using SQL Server Express. Your clients can license Standard or Enterprise edition depending on their deployment and infrastructure standards.

You get fantastic support with tools like Reporting Services, Analysis Services, and deep integration with the Office platform.

The only shortcoming of SQL Server when compared to Oracle or DB2 is that it only runs on the Windows platform. That has never been a problem for me.
rond
Monday, March 05, 2007
 
 
To be fair, MS SQL Server has only really been vertically scalable - the TPC benchmarks are done on machines with umpteen processors and a TB of RAM or something equally ridiculous. SQL Server has always had major problems scaling horizontally, because active-active clustering was nigh-on impossible to do. SQL Server 2005 makes a better fist of this with Database Mirroring.
.NET Guy
Tuesday, March 06, 2007
 
 
I do Oracle performance tuning, but I am not a religious guy. Scaling databases is more about your database design and how you use the database than about the product itself. Postgres has multi-versioning, right?

If you are building an OLTP system, is your data model rigidly normalized? Contrary to popular belief, this improves performance because it simplifies queries.

You need to test your queries to make sure they are efficient and using the "correct" index in the "correct" way. I am sure Postgres has a way to generate SQL plans.
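
In PostgreSQL that's EXPLAIN (or EXPLAIN ANALYZE, which actually runs the query and times it). A quick sketch, with a made-up query:

    # Sketch: read the plan to verify the right index is being used.
    # Table and column names are hypothetical.
    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=myuser")
    cur = conn.cursor()
    cur.execute("EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42")
    for (line,) in cur.fetchall():
        print(line)   # look for "Index Scan" rather than "Seq Scan"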

Are you using bind variables? If not, your system won't scale no matter what you do.
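
With a Python driver such as psycopg2, that just means passing values as query parameters instead of gluing them into the SQL string (a sketch; names are made up):

    # Sketch: pass values as parameters instead of concatenating them
    # into the SQL string; the driver handles the binding and quoting.
    # Table and column names are hypothetical.
    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=myuser")
    cur = conn.cursor()
    customer_id = 42

    # Bad: new SQL text on every call, re-parsed every time:
    #   cur.execute("SELECT * FROM orders WHERE customer_id = " + str(customer_id))

    # Good: the value is bound as a parameter.
    cur.execute("SELECT * FROM orders WHERE customer_id = %s", (customer_id,))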

Make sure you run the minimum number of queries possible. Do NOT run a query 100 times from a loop. I see that periodically and people are stunned that it affects performance.
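
A sketch of that anti-pattern and the set-based fix (names are made up):

    # Sketch: replace N round trips in a loop with one set-based query.
    # Table and column names are hypothetical.
    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=myuser")
    cur = conn.cursor()
    ids = [1, 2, 3, 4, 5]

    # Bad: one query (and one round trip) per id:
    #   for i in ids:
    #       cur.execute("SELECT title FROM posts WHERE id = %s", (i,))

    # Good: one query for the whole set; psycopg2 adapts the Python
    # list to a PostgreSQL array.
    cur.execute("SELECT id, title FROM posts WHERE id = ANY(%s)", (ids,))
    rows = cur.fetchall()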

Are your data files spread out using SAME (Stripe And Mirror Everything - it's an Oracle Corp concept, but it applies everywhere; Google it)? This just means spreading your data files out to distribute I/O across your hard drives.

3,000-4,000 OLTP queries per second is fairly high. I know VeriSign has a home-grown database that does 100,000/second, and it is 100% in memory.

The key to scaling this is working with someone who knows how to scale databases. The problem is that most people who say they do are frauds.
Contractor
Tuesday, March 06, 2007
 
 
I am not going to be stunned at the silly comments that TheDavid makes. 3-4k queries/second is very high. That is around 300 million a day.

You will need quite a bit of hardware for this.

The person who told you not to use Oracle unless you know what you're doing is correct.
Contractor
Tuesday, March 06, 2007
 
 
3-4k might not be much, or it could be a lot.

TheDavid is right, it is all about the queries. The more specific and focused you get, the further out you scale. If your database cannot join 10 tables without barfing and you have to write a crapload of application code instead (MySQL), your application will not scale and you'll be one of those guys with 2 TB of memcached to make up for it. I'll toss in that it is just as much about the schema too. Lots of people think they are smarter than the database and build lots of premature optimization into their schema. You are not smarter than your database. Do it right and you'll be blessed with data that can be massaged into whatever view you need.

Pick a vendor - PostgreSQL is as good as they get - and use every. single. performance. feature. Every one. As I said, PostgreSQL is perfect for this - it's got all the tools you need to make a great-performing database. The user base around Postgres is as smart as they come; check the mailing lists for help spec'ing hardware and tuning queries.

You will not migrate databases, but if you do, the fact that you tried to make everything "generic SQL" will not reduce the cost as much as you think. You might as well make the most of what you have now, because if/when you move, it will be a pain in the ass no matter what. Why you'd move from PostgreSQL is beyond me, though.
PostgreSQL User
Tuesday, March 06, 2007
 
 
> You are not smarter then your database.

There is a great truth in this.  SQL Server is astonishingly powerful if you understand what it can do and let it.
Robert Rossney
Tuesday, March 06, 2007
 
 
"I am not going to be stunned at the silly comments that TheDavid makes. 3-4k queries/second is very high."

I agree. I've never worked with a high-performance system like that, but it seems awfully high to me too. Assuming you've got a quad-processor machine, that's an average of 1,000 queries/sec/processor, or 1 ms/query. I don't care if you've got a fast processor and disk; 1 ms per query is hard to get on any db and computer I've used. Realistic query execution times seem more like 5 to 50 ms for fairly trivial queries. Caching might improve that, but in any case 4,000 queries per second sounds definitely non-trivial to me.
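
The back-of-envelope version, in case anyone wants to rerun it with their own numbers (the four-processor figure is the assumption above):

    # Back-of-envelope: per-query time budget at a given load.
    # The numbers are the assumptions from the paragraph above.
    queries_per_sec = 4000
    processors = 4
    per_processor = queries_per_sec / float(processors)  # 1000 queries/sec each
    budget_ms = 1000.0 / per_processor                   # 1.0 ms per query
    print(budget_ms)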
Herbert Sitz
Tuesday, March 06, 2007
 
 
Herbert,

One of the things that will make a huge difference is the distribution of the types of queries. Updates are time consuming. Selects can be time consuming. Inserts are quick and easy. Deletes can be. In many of these cases, it's not the processor that is the bottleneck at all; it can often be the I/O of the drives.

For the application I described above, it's about 50/20/20/10 between inserts/deletes/updates/selects.
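
As a rough illustration of why the mix matters, here is the weighted average with invented per-statement costs (every number below is hypothetical):

    # Rough illustration: average cost per statement under a 50/20/20/10
    # insert/delete/update/select mix. The per-statement costs in ms are
    # invented for illustration only.
    mix  = {"insert": 0.50, "delete": 0.20, "update": 0.20, "select": 0.10}
    cost = {"insert": 0.2,  "delete": 1.0,  "update": 2.0,  "select": 1.5}
    avg_ms = sum(mix[op] * cost[op] for op in mix)
    print(avg_ms)  # 0.10 + 0.20 + 0.40 + 0.15 = 0.85 ms per statement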
KC
Tuesday, March 06, 2007
 
 
What tools do people out there use to test the scalability of their database? I'm especially interested in open source products, not something like LoadRunner.

Thanks.
justwondering
Tuesday, March 06, 2007
 
 
Does anyone have experience with a column-oriented database like MonetDB or C-Store?
Gregory
Tuesday, March 06, 2007
 
 
Leon, I'm curious what application you're developing.

Why do you believe you'll need to support 4,000 queries per second within three years?

Sounds like an overly optimistic growth estimate to me.
BenjiSmith
Friday, March 09, 2007
 
 
"The only shortcoming of SQL Server when compared to Oracle or DB2 is that it only runs on the Windows platform"

It has a poor locking model that escalates easily and has excessive overhead. It also has a poor read-consistency implementation.

In Oracle, readers do not block writers and writers do not block readers. If your transactions include writes then Oracle is a superior choice for high concurrency applications.

Also, Oracle is the most heavily instrumented database product out there. If you need to understand exactly where the three seconds of a particular query are being spent, it is trivial to dump wait events (Oracle 10g has over 800 documented wait events) and know exactly what has to be done to reduce the execution time. Think of it as DTrace for the database.
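
For instance, the system-wide wait picture is one query against v$system_event away. A sketch using the cx_Oracle Python driver (the connection details are placeholders):

    # Sketch: top wait events system-wide from Oracle's v$system_event.
    # Connection credentials/DSN are placeholders.
    import cx_Oracle

    conn = cx_Oracle.connect("scott", "tiger", "localhost/orcl")
    cur = conn.cursor()
    cur.execute("""
        SELECT event, total_waits, time_waited
        FROM v$system_event
        ORDER BY time_waited DESC
    """)
    for event, waits, waited in cur.fetchall():
        print(event, waits, waited)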
David Aldridge
Sunday, March 11, 2007
 
 

This topic is archived. No further replies will be accepted.
