The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Binaries in Database

Conventional wisdom says that if you're uploading a file (through a web browser) to the server, you save the filename in the database and the file itself somewhere within the filesystem. A casual search seems to suggest that this is for performance reasons.

I have a scenario where I'm building a web application from the ground up to work - with the bare minimum of changes - on development, testing and production servers. We would also like to be able to give out copies of the application. Towards that end, it assumes the existence of a database and reads its configuration and customization info from that database.

Since each potential server essentially has its own set of data, I'm concerned that having both a database and a "filesystem" to deal with simply introduces more variables and more chances for something to go wrong. If it's just the database, the user only has to define the connection. If it's both, the user also has to configure and troubleshoot the filesystem access.

As a frame of reference, I'm expecting about 500 hits per day, and about 100MB to 250MB worth of binary files. It is unlikely - given this market - that anyone will download the application source code, install it, and need to deal with an order of magnitude more users and space requirements.

What's the general feeling on binaries in or out of the database?
TheDavid
Wednesday, September 13, 2006
 
 
It's much simpler if you can put everything in the database.

It *may* be more flexible (performance, control over where things are stored, etc.) if you put some sort of reference (possibly a file name) to the binary in the database.

If I were you, unless I had a good reason not to, I'd put the binaries in the database to start.  I'd also design the database structure so I could put the binaries elsewhere in future should I so choose.  And I'd make sure the code that stored/retrieved the binaries was all in one place.  That way, I could retrofit support for external binary files with minimal effort.

The more binaries you have, and the larger they are, the more likely you are to eventually want them outside the database.  So if you know you will only ever have a few binaries, this level of flexibility may be unnecessary.
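The "all in one place" advice above can be sketched roughly as follows. This is a minimal illustration, not code from the post: the class names are made up, and SQLite stands in for whatever database the application actually uses. The point is that callers only ever see the interface, so a filesystem-backed implementation could be dropped in later.

```python
import sqlite3


class BlobStore:
    """Minimal storage interface; callers never know which backend is used."""

    def put(self, key, data):
        raise NotImplementedError

    def get(self, key):
        raise NotImplementedError


class DatabaseBlobStore(BlobStore):
    """Database-backed implementation (SQLite here purely for illustration)."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS blobs (key TEXT PRIMARY KEY, data BLOB)")

    def put(self, key, data):
        self.conn.execute(
            "INSERT OR REPLACE INTO blobs (key, data) VALUES (?, ?)", (key, data))

    def get(self, key):
        row = self.conn.execute(
            "SELECT data FROM blobs WHERE key = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]
```

A later `FilesystemBlobStore` with the same `put`/`get` methods could then be swapped in without touching any calling code.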
S. Tanna
Wednesday, September 13, 2006
 
 
"What's the general feeling on binaries in or out of the database?"

Binaries in the database is the way to go.  I also recommend isolating all binaries into their own table; don't mix binary columns into regular tables -- do a join to a table that contains nothing but the binary data and a key.
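One way to read the "own table, joined by key" suggestion is the schema below. It's a hedged sketch (table and column names are invented, SQLite used for illustration): listing queries scan only the narrow metadata table, and the blob table is joined in only when a specific binary is actually needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (
    id INTEGER PRIMARY KEY,
    filename TEXT NOT NULL,
    uploaded_at TEXT NOT NULL
);
-- Nothing but the binary data and its key, per the advice above.
CREATE TABLE document_blobs (
    document_id INTEGER PRIMARY KEY REFERENCES documents(id),
    data BLOB NOT NULL
);
""")
conn.execute("INSERT INTO documents VALUES (1, 'report.pdf', '2006-09-13')")
conn.execute("INSERT INTO document_blobs VALUES (1, ?)", (b"%PDF-1.4",))

# Listing queries never touch the blob table:
rows = conn.execute("SELECT id, filename FROM documents").fetchall()

# The blob is joined in only when one document's data is requested:
blob = conn.execute(
    "SELECT b.data FROM documents d "
    "JOIN document_blobs b ON b.document_id = d.id WHERE d.id = ?",
    (1,)).fetchone()[0]
```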

I have a web application that stores some binaries (images) in the database and some binaries (files) in the file system.  I can tell you from experience that dealing with the files in the file system is a pain in the ass.  You've got more to back up and more to move around when deploying.  There's always a chance that the file system can get out of sync with the database.

For performance reasons, you probably shouldn't read every binary from the database whenever it's needed.  Instead, cache the binary in the file system and fetch it from there if it exists.  This way you have all the advantages of the database solution and the performance benefit of the file system.
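The read-through cache described above might look something like this. It's a sketch under assumptions: the `blobs` table and the cache location are invented for the example, and SQLite stands in for the real database.

```python
import hashlib
import os
import sqlite3
import tempfile

# Assumed cache location; a real deployment would make this configurable.
CACHE_DIR = os.path.join(tempfile.gettempdir(), "blob_cache")


def fetch_blob(conn, key):
    """Read-through cache: serve the binary from a file on disk if a cached
    copy exists, otherwise pull it from the database and cache it."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, hashlib.sha1(key.encode("utf-8")).hexdigest())
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    row = conn.execute("SELECT data FROM blobs WHERE key = ?", (key,)).fetchone()
    if row is None:
        raise KeyError(key)
    with open(path, "wb") as f:
        f.write(row[0])
    return row[0]
```

The database stays the single source of truth; the cache files can be deleted at any time and will simply be refilled on the next request.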
Almost H. Anonymous
Wednesday, September 13, 2006
 
 
When putting binaries in a database, you may want to consider that the database size will grow faster: bigger transaction logs, more space for database backups. I guess you would back everything up anyway, but then you would have many files to back up; with the binaries in the DB, you'll have one big file.
g
Wednesday, September 13, 2006
 
 
With your expected volumes, no problem with binaries in the database.
Mike S
Wednesday, September 13, 2006
 
 
If someone is downloading binaries from your site over the internet I'd really doubt that the database would be the bandwidth bottleneck.
David Aldridge
Wednesday, September 13, 2006
 
 
I think you answered your own post:

"Conventional wisdom says that if you're uploading a file (through a web browser) to the server, you save the filename in the database and the file itself somewhere within the filesystem. A casual search seems to suggest that this is for performance reasons."
D in PHX
Thursday, September 14, 2006
 
 
> I think you answered your own post:


The problem is that conventional wisdom is often wrong, and there is a lot of disagreement on this.

For example, the argument that you don't want binaries in a table because they slow down queries doesn't hold up, because blobs should be lazy loaded rather than fetched with every query.

The idea that under high load the database won't be able to serve images as fast as the filesystem/OS/webserver does seem plausible, though.

It's easy to say, but benchmarking is probably your friend here.

I've read suggestions for load balancing image requests amongst read-only replicas. That way you get the benefits of database storage without giving up efficiency.
son of parnas
Thursday, September 14, 2006
 
 
>> The problem is conventional wisdom is often wrong and there is a lot of disagreement on this.

And you also have to understand the reasons why it is the convention, and whether your own circumstances differ from the conventional. If you have different priorities than the other 66% of people who make up the convention, then their solution is not going to work for you.
David Aldridge
Thursday, September 14, 2006
 
 
> And you also have to understand the reasons why it is the convention,

Originally databases didn't handle BLOBs at all. Then they added support and the support was poor. Is it still so poor? Technology often changes faster than perception.
son of parnas
Thursday, September 14, 2006
 
 
By the same token, my "volumes" are not big enough to warrant benchmarking or premature optimization. At this time, the primary drivers for my decision are ease of coding and ease of installation/configuration.

My original decision was to go ahead and upload the binaries into the database.

But I'd also like to reasonably future-proof the application. It's kind of like when a young "journeyman" programmer suddenly realizes that 3NF is not always the correct solution. :)

But maybe Son of Parnas is right, and on this issue technology advances have, by now, pretty much eliminated conventional wisdom's gotchas?
TheDavid
Thursday, September 14, 2006
 
 
Sorry for the half@ssed post earlier.

In a db, _everything_ is situational. YMMV. In fact, your mileage will _always_ vary.

What's interesting about the question is that it is really a database question wrapped in an architecture question.

I'm a SQL Server guy so keep that in mind if you are using something other than that.

If you end up storing the files in the db, the files are no longer separate, individual files. They are part of the database file itself.

So the growth of the database files will be related to the files being uploaded.

When you go to backup the database, you will be backing up those files as well.

Now, in SQL Server you can actually put a table into a different physical database file on the disk.

So you could put the table with the binaries in a different file and not have it interfere with the normal db backups.
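SQL Server does this with filegroups; an analogous, runnable illustration in SQLite is to ATTACH a second database file and keep the blob table there, so the main data file stays small. This is only an analogy sketch - the file names are invented, and the SQL Server mechanism itself is different.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
main_db = os.path.join(workdir, "app.db")    # normal tables live here
blob_db = os.path.join(workdir, "blobs.db")  # binaries live here, separately

conn = sqlite3.connect(main_db)
# Attach a second physical file and create the blob table inside it.
conn.execute("ATTACH DATABASE ? AS blobstore", (blob_db,))
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, filename TEXT)")
conn.execute(
    "CREATE TABLE blobstore.blobs (document_id INTEGER PRIMARY KEY, data BLOB)")
conn.execute("INSERT INTO documents VALUES (1, 'manual.pdf')")
conn.execute("INSERT INTO blobstore.blobs VALUES (1, ?)", (b"\x00" * 1024,))
conn.commit()
# The binary data now sits in blobs.db, physically separate from app.db,
# so the two files can be backed up (or skipped) independently.
```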
D in PHX
Friday, September 15, 2006
 
 
I like that option.

Presumably if a situation arises in the future where some other administrator needs to optimize the application (as a system), it's probably easier for a DBA to run a few tests and optimize the database or put it on a cluster or whatever, than for the DBA, SysAdmin and Developer to tweak disparate pieces.
TheDavid
Friday, September 15, 2006
 
 
At a previous job, we had 9+ million files stored in the filesystem under a two-tiered folder scheme (customer name, then 100 folders that were hashed into).

This was woefully inadequate: we were running into limitations of the NTFS filesystem, and indexing all those documents (ours were human-readable, not binary) was a big pain. It turned out that MS Indexing Service would reindex the entire directory when a file was touched... we were in constant indexing mode. :-(

Since you don't have the indexing problem that we had, creating a multi-level folder system that gets hashed into might work.  Suggestion: unless you're using a SAN, give it the ability to hash across multiple storage servers.  Eventually you'll run into an I/O bottleneck on one machine (network or disk) and you'll need to scale out, so best to plan for it today.
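A hashed multi-level folder scheme like the one described might be sketched as below. The function name and parameters are invented for illustration; the idea is just that hashing spreads files evenly so no single directory ever accumulates millions of entries.

```python
import hashlib
import os


def shard_path(root, filename, levels=2, width=2):
    """Map a filename into a multi-level directory tree via a hash.

    With levels=2 and width=2, files spread across 256 * 256 directories,
    e.g. root/ab/cd/filename.
    """
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)
```

Scaling out to multiple storage servers, as suggested, could then reuse the same digest: take a few more hex digits (or the digest modulo the server count) to pick the machine before picking the directory.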

I recommend against storing your binaries in the database, *unless* the data can be partitioned per user or per customer database.  If they all go into one humongo database, you'll eventually run into storage space problems for your DB files.
Example
Saturday, September 16, 2006
 
 

This topic is archived. No further replies will be accepted.

 
Powered by FogBugz