The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Images in DB?

Exist any disadvantages of keeping images in db and not in file system?

Wednesday, April 19, 2006
 
 
Performance and database size tend to suffer storing images in the database as, say, a BLOB.

You'll probably find storing the images on a file server and storing the path and file name in the database will keep the database size smaller and wont hinder performance.
Marcus from Melbourne
Thursday, April 20, 2006
 
 
FWIW, Sourcegear Vault obviously keeps the files in the db as blobs, and Joel moved all image and file attachments into the db from the filesystem for the current version of FogBugz.
--Josh Send private email
Thursday, April 20, 2006
 
 
> FWIW, Sourcegear Vault obviously keeps the files
> in the db as blobs, and Joel moved all image and
> file attachments into the db from the filesystem
> for the current version of FogBugz.

That's interesting. Any idea why?

I faced the same decision a while ago and, in fact, chose to keep my files on disk and pointers to them in the database.
MonkeySpank Send private email
Thursday, April 20, 2006
 
 
It also depends on how many images you have

If you had 100,000,000 X 200K images, do you really want a 20 Gigabyte database? Would it be reasonable to assume the database will just handle this volume of images, well?

On the other hand if you had 1,000 X 50K images, then 50 Megabytes is a whole different kettle of fish.
S. Tanna
Thursday, April 20, 2006
 
 
>If you had 100,000,000 X 200K images, do you really want a >20 Gigabyte database? Would it be reasonable to assume the >database will just handle this volume of images, well?

Any decent database, ie SQLServer, Oracle, DB2 etc could handle this pretty easily. 20GB is a pretty small database in the scheme of things.
Craig
Thursday, April 20, 2006
 
 
Sorry I mistyped, I meant 20 Terabytes.

200,000 bytes X 100,000,000 images
= 20,000,000,000,000 bytes
= 20,000,000,000 Kilobytes
= 20,000,000 Megabytes
= 20,000 Gigabytes
= 20 Terabytes
S. Tanna
Thursday, April 20, 2006
 
 
So why did the Fogbugz team decide to change the product to store files in a database instead of on a filesystem?

By his own admission Joel doesn't make product changes on a whim. So I'm guessing there must have been a wound that required healing.
MonkeySpank Send private email
Thursday, April 20, 2006
 
 
Seems to me, keeping them outside the database would introduce a link that could easily be broken.
Harley Pebley Send private email
Thursday, April 20, 2006
 
 
That's right... not only did it mean one less folder you had to worry about on the filesystem, but it made backups much easier for people.  Performance didn't suffer at all.  Any modern database can handle large blobs perfectly well, so storing them on disk just creates another headache when it comes time to move to a different server, restore, or backup your app.
Michael H. Pryor Send private email
Thursday, April 20, 2006
 
 
From what I've read, it's generally not the storage issue you need to be concerned about, it's the retrieval issue.  If you're going to be getting many hits per second trying to pull 10MB files out of a database then the file system route may be the way to go as performance will be much better.

If your traffic won't be that intense, then I believe DB storage is the way to go for the reasons given above.
CK Send private email
Thursday, April 20, 2006
 
 
CK is on target.  While we would all like a "always do this" answer, it is going to depend on your situation.

It is no longer a storage issue, with few exceptions most databases will handle the files fine, it is a transfer issue.  A 5 meg image file is not unheard of and is getting more common everyday.  When people buy a 5.1 mega-pixel camera they set it to the top. 

You can db cache them and that will eliminate some of the select time, but the get it to the desktop time is going to follow the db stream and that can be ugly.  If you keep them on the file system (or if the web on the web server), you have the option to serve them from a completely different device.

Again - YMMV greatly.  While in theory you can do the math, it is usually easier to just do it both ways and try it.
MSHack Send private email
Thursday, April 20, 2006
 
 
I have asked this question too and haven't got a great answer.

One anwer is that when the database has to page rows into select on, the blobs take more memory so you have to page in more to select the rows. This slows down performance.
son of parnas
Thursday, April 20, 2006
 
 
Wouldn't doing a backup of a) a large db full of images be a lot more of a problem than a backup of b) a small database and set of files ?
captain damage
Thursday, April 20, 2006
 
 
==>Seems to me, keeping them outside the database would introduce a link that could easily be broken.

Ayup. I'm not aware of any product in my world that allows for referential integrity between a database table and a directory of files.

I've got one project where this burns me regularly. Wish I'd just stuck 'em in the database and I wouldn't have to deal with it.
Sgt.Sausage
Thursday, April 20, 2006
 
 
one thing to think of would be putting the table(s) with the images in a seperate filegroup, on a seperate spinning disk (drive). That could help abit.
D in PHX Send private email
Thursday, April 20, 2006
 
 
for webapps i tend to store images in the filesystem, then the webserver can do what it is good at (serving files).

in a client/server enviroment, i tend to stick them in the database so that i don't have to have a shared drive/folder to access them from, thats where it starts to get messy, different permissions to see this or that, some people in a different workgroup can't see the share file, etc.
bumperbox
Thursday, April 20, 2006
 
 
I presume one plus for putting files into the DB is that they can then participate in transactions and rollbacks.

Another reason I can think of is where your application is running in a clustered envronment with failover, so you avoid having the files on one single machine (point of failure) and can piggyback on the db replication.
Java GUI Programmer
Thursday, April 20, 2006
 
 
Another point nobody has mentioned

Is if you want to preserve the images, let's say they'rere litigation evidence, or regulatory records for a nuke plant, you'll want to put them on a WORM drive.

I don't know how many, if any, databases, can handle blobs of images in records being on WORM.
S. Tanna
Thursday, April 20, 2006
 
 
"Wouldn't doing a backup of a) a large db full of images be a lot more of a problem than a backup of b) a small database and set of files ?"

No.

Two points of failure rather than one.
Andy.. Send private email
Friday, April 21, 2006
 
 
One other point in favor of filesystem image storage is that the images get cached on the client by url, and so only get served once per client.  I haven't done the DB storage route myself, but I assume it would involve deserializing the blob directly on the server, right?  So you'd have to do that every single time the image was asked for and/or perform caching on the app or db server that could be better offloaded to a dumb image server.  Yes, there are disadvantages too as mentioned above -- depends on your needs of performance vs. maintainability.
Brian Moeskau Send private email
Saturday, April 22, 2006
 
 
Writing files to disk are not transactionally safe unless you implement that functionality.  You cannot archive a live file system unless it support that feature (snapshots, BCVs etc), so may mean having to implement that feature in software.

The database that I use most, PostgreSQL 7.4, does not have a real elegant BLOB interface (IMHO), and usually there is space overhead associated.  While disk space may not be a big deal usually (for small datbases), the extra I/O may drive the need for extra disks.


/Allan
Allan Wind Send private email
Saturday, April 22, 2006
 
 
"If you had 100,000,000 X 200K images, do you really want a 20 Terabyte database? Would it be reasonable to assume the database will just handle this volume of images, well?"

If you have that many images you are likely to be in as much trouble using a standard file system.

"One other point in favour of file system image storage is that the images get cached on the client by url"

Images extracted from a database and served by a script will also be cached on the client if you get the http headers correct.
Err
Tuesday, April 25, 2006
 
 
I'm developing a webapp that I expect to have to scale up to storing millions of images.  Many of these "images" will actually be several variants of the same image at different sizes and cropping. 

I've done the whole "store the filename and put the image data on disk" for about half a million images.  It sucks. This new app will put all of them in the DB.  Here is why:

1) One less point "thing" to worry about.  You can use all the tools avalible in a database including referential integrety and triggers.  Nuke the parent image?  A cascading delete will take care of any children or the system could just raise an error.  Try that with a file based solution.

2) The database route avoids all the worries about unique filename & path generation.  No file system on the planet will hold millions of images in a single directory, so if you went that route you have to come up with a scheme to partition your images into smaller directories.  That sucks.  If you shove it in the database, you dont get this problem anymore because a database is designed to handle millions of rows.

3) "The web browser will not cache images out of the database" is simply not true if you do your homework.  Whatever you are doing to dish an image out to the browser can easily add the right set of headers to instruct a browser to cache the image.

4) "The filesystem has all kinds of caching the database doesnt".  Also not true.  For millions of images getting dished out at random, no cache on the planet will keep all of them in memory.  Even still, there is nothing stopping you from caching the image data on the webservers.

5) "The database will not scale".  I'm not a DBA, and neither are you.  At the point where the database "will not scale" is also the point where you can afford to bring in a consultant who can make it scale.

6) "Backups might suck".  Backing up potentially terabytes of data will suck no matter what route you go.  Sure if you did the filesystem route you can use rsync but there is nothing stopping you from writing a custom backup script that only backs up new / changed image data.

7) "I might want some external script to hit each file and if it was in the database, I couldn't do that".  You are telling me you are going to let some script touch your files and NOT touch the database too?  No matter what you are doing to those images, odds are good you will need to hit the database.  If the script can't handle blobs, then write the blob to the file system run the script on the file, and move it back in.

In summary, most of the objections to putting images in the database are non issues if you really think about it.  My recommendation?  Shove them in the DB.
Cory R. King
Wednesday, April 26, 2006
 
 
I ran into this design decision myself.  I was in charge of writing an application for a logistics company that transported automobiles to dealers after they came off production line.

There was a paper trail with VIN number and some other numbers that went along with each vehical on the tracter-trailer truck.

At the end of the trip, all the cars were unloaded and a data entry person would take the stack of papers (one sheet for each car usually) and scan them using a high speed scanner.

We used some 3rd party software called Canofile.  Canofile would create TIF files with a file name fitting a pattern we specified.

My software would read the csv file coming from Canofile software and copy the TIF images to a NAS device.

The reason we kept the images out of the database was for a number of reasons:

1.  TIF images did not need to be backed up the same way.  Putting TIF images in the database made backing up the database longer!

2.  TIF images were stored on NAS cluster and the cluster had it's own backup that was faster than backing up a database with BLOBs.

3.  Risk of link breakage was mitigated.  A .conf file is read by the software to figure out the network share where the images are kept.  User's don't have direct access to the share where the images are created.  Looking at the images is controlled in the software that stores the images and reads the CSV file.  You can query for a VIN number or tracking number and a tumb view list of all the scanned documents (TIFs) is presented that later can be zoomed, printed, etc... Used LeadTools for the image work -- awesome library!

4.  Copying TIF images to a BLOB field was considerably slower than copying to an SMB share.  You have various database overhead.  We were scanning in 1,000s of images in a single run of our software and uploading this many images to BLOB fields was taking too long when we tried this approach initially.

5.  If an administrator wants to quickly look at the images, s/he can quickly go and view the file share outside the software storing the images.  This was not possible if we stored the images in a database.  Sometimes, an admin will go and grab all scanned images and print them or store them to CD-R for the insurance company.  Stroing images in RDBMS required the admin to go into our app, query for the images, send them to a local share for burning.  It's much easier to just go to the share.

6.  We can find via navigating the file system where an image is located since all image pathing is stored in path that corresponds to the information.  Example, folder names are created with the current date as the folder name.  So if I want all the images scanned in today, I can go to 20060427 and copy that folder up some place.

There were things like this that drove us to keep the images outside the database.
~Eric
Thursday, April 27, 2006
 
 
Any modern RDBMS is much more reliable than the filesystems we have right now.
AqD Send private email
Monday, May 01, 2006
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz