The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Storing Data W/O Using a Relational Database

As a young developer, SQL & Relational Databases has been hammered into me like it's the only way we should store data.  If I want to write a win app, find a way to put a SQL database in it.  If I want to write a web app, of course, use SQL.  If I want to do anything which involves data, use SQL!

What are some other options I can use to store data or is SQL really the only hammer I should consider when storing data?
YoungDev
Wednesday, December 03, 2008
 
 
XML files are one option, I suppose.  But, as an older developer, I prefer relational databases.
old gas bag
Wednesday, December 03, 2008
 
 
Like most things in programming it really depends on what you are doing.

If you have an application where users are inputting, editing, and reporting on data then yes, a relational database is probably your best bet. Of course you have a wide selection of database to choose from things like Access and SQLite for low end embedded database up to Oracle for very high end server databases.

If you application treats "data" in more of a document oriented fashion then you may be able to get away with just storing data in a file in a custom format or (please don't crucify me) XML.
Dan B Send private email
Wednesday, December 03, 2008
 
 
One obvious alternative would be one of the various cloud options.
D. Lambert Send private email
Wednesday, December 03, 2008
 
 
You really need to give people enough information so they can answer your question. For instance, how much data do you need to store? What type of data is it (application configuration options or an online order database)? How will the data be used (lookup table type data or order entry information)?

Think about your question before you write it, and then read what you've written before you post. As yourself if you would be able to answer someone else's question using *only* the information provided in the text and nothing else. Doing so may help you to write better questions, and improve your chances of getting a useful answer.
Ken White Send private email
Wednesday, December 03, 2008
 
 
Need more information.  Will you ever need to handle concurrency issues?  Do you need ACID properties?  What are you trying to accomplish?  On the other hand realize that with a SQL database you do get a lot of things that you don't have to do yourself.  If your expertise is X do you want to dilute your efforts by not using a tool that will do the job. (depends on the job you need done.)
jim Send private email
Wednesday, December 03, 2008
 
 
You can store data in files. Paul Graham has mentioned before that Viaweb (now Yahoo! Stores) originally didn't have a database and just used files: http://www.paulgraham.com/vwfaq.html
John Topley Send private email
Wednesday, December 03, 2008
 
 
CSV files are great.
sloop Send private email
Wednesday, December 03, 2008
 
 
I like image steganography myself.
Sambo
Wednesday, December 03, 2008
 
 
You might find looking at some non-relational database projects:
  Amazon SimpleDB: http://incubator.apache.org/couchdb/
  Apache CouchDB: http://incubator.apache.org/couchdb/

There are other non-relational ways to store data, but these are the examples I could think of straight away.
Gareth Marshall Send private email
Thursday, December 04, 2008
 
 
"SQL & Relational Databases has been hammered into me like it's the only way we should store data."

That is almost as silly a statement as advocating that XML files are the only way to store data.
Arethuza
Thursday, December 04, 2008
 
 
As a young developer, you're telling me no one has educated you, and/or you've failed to educate yourself, about "flatfiles," "sequential file access," "random access files," "serialization," "binary" ... etc. etc. etc.? (not to mention XML, as as noted already?)
Andrew Badera Send private email
Thursday, December 04, 2008
 
 
And really, flat file ought to be broken into stuff like delimited (CSVs are an example) and fixed width ... for starters.
Andrew Badera Send private email
Thursday, December 04, 2008
 
 
In Poker Copilot, I store the data transiently in a collection of interrelated hash maps. Works great for my problem domain. No relational db.

I'm in the planning stages of a new project which will use Amazon's SimpleDB. Sounds great for read-mostly web apps, and my initial experimentations with it have been good.

But for data-heavy apps that need a high level of data integrity and structure, you can't beat the good RDBMS. In my day job as a consultant, I'm currently working with an enormous centralised collection of data a RDBMS. I couldn't imagine doing it any other way.
Steve McLeod Send private email
Thursday, December 04, 2008
 
 
A well designed file store beats a badly designed database, every time.  If you know what you're doing, you'll know how to build both of them right.  If you really know what you're doing, you'll know when to pick each one.

SQL databases are primarily useful when the usage of the data is evolving in unpredictable ways, and when data is to be shared among applications that have little to do with each other. 

There are database systems that are even more adaptable than SQL to changing usage of the data.  However, designing databases for those systems is still largely a matter of trial and error.  SQL and RDB have been around long enough so that correct or at least satisfactory design is well understood for a variety of cases,  and you can learn how it's done from those who have gone before.
Walter Mitty Send private email
Thursday, December 04, 2008
 
 
"Walter Mitty" makes a good point: We know how to build RDMSes and how to work with them.  Be careful when striking out on your own. 

//recently worked on a project where the original developer rolled their own data storage mechanisms.

147K lines of custom code before we moved to an RDBMs, and 4K lines after.
Brian Begy Send private email
Thursday, December 04, 2008
 
 
uh, RDBMs.  Sorry, giant fingers and small keyboard.
Brian Begy Send private email
Thursday, December 04, 2008
 
 
SDS (SQL Data services) is Microsoft offering to store data on the could and access it via SOAP or REST API. You can store data and meta data around that data (entity) in a container which is part of an Authority (think it being a database). SDS is highly scalable, on-demand data storage and query processing utility services. SDS  provides highly available, secure, standards based web services that are easy to program and provision. You can find more information here – http://msdn.microsoft.com/en-us/sqlserver/dataservices/default.aspx
There are SDK and tools for you to play and examples on how you can use and build for it here - http://services.community.microsoft.com/feeds/feed/msdn_ssds_videos
Ashish Jaiman Send private email
Thursday, December 04, 2008
 
 
How about Berkeley DB? A nice simple associative array in a file with huge record sizes and a one-to-many option.

You can put anything you want in the key and anything you want in the record. You could almost build a relational database engine on top of it if you have too much time on your hands.
Rowland
Thursday, December 04, 2008
 
 
I think your smoking something if you think soap or rest as a db protocol is scalable.  They are very heavy protocols.  The wrapper for the payload is huge.
jim Send private email
Thursday, December 04, 2008
 
 
Jim,

I think db access with REST is scalable, in that the time per request can be the same, whether you are making 1 request per day or 1 million requests per day.

Is it performant? That raises doubts.

Time per request will be slow - 200 milliseconds in my simple test with Amazon's SimpleDB. 200 ms per request with 1 request per day or 10000's per day.
Steve McLeod Send private email
Friday, December 05, 2008
 
 
There are also types of data that don't map well to a relational structure - documents being a good example.

Sure you can store a single document as a BLOB in a database but I've never seen a system that represents a document as a relational structure (i.e. tables for Documents, Sections, Paragraphs, Sentences, Words etc.). I think that would be rather silly!
Arethuza
Friday, December 05, 2008
 
 
You can get much higher performance from a non-relational database than a relational database if you know what you are doing. C-Tree from Faircom is a great product. Relational databases are far easier to use, however.

It is a real pain to build C-Tree database tables. It is a real pain to add new fields, and re-size fields. And worse pain simply getting the data out of the database. You have to write loops and add table keys, and loop some more.  A query that you can do in one one line of SQL can take hundreds of lines of code in C.

I don't think being good at non-relational databases will do much for your enjoyability.

If you need the performance, then consider a non-relational database. Otherwise, SQL is hard to beat.
XYZZY
Friday, December 05, 2008
 
 
Steven,
Let me rephrase that , efficiently scalable.  Also 200 milliseconds to get simple results back is poor.  Look at the payload size, the header and format stuff takes up a major portion of the room in the total payload.
jim Send private email
Friday, December 05, 2008
 
 
I'm with Jim on the whole REST DB layer != scalable angle.

For a web application, you've got 100ms to get everything together and out the door.  Even assuming 50ms round trips to an external REST datastore, you'll need to do some serious fancy footwork to get the response put together and sent off in a reasonable timeframe.

Now imagine you're doing something marginally non-trivial, like storing a record then pulling back 3 small datasets (2 of which are dependent on the first one).  That's 30ms of work with SQL server on the box next door, but you're never going to get it under 200ms if you have to roundtrip to Amazon's servers at every step.

So yeah, it scales in the sense that you can do lots of it and not see any additional overhead.  But no, it doesn't scale because you can't use it to do anything but the most trivial operations in real time.

(Language pedants, feel free to substitute the word "scale" above with "perform", "efficiently transfer", or whatever term makes you happy.  The point to take away is that REST DB services don't work for web apps.)
Jason Kester Send private email
Friday, December 05, 2008
 
 
It's funny that everyone seems to be equating "scale" with "speed." Go play with Google App Engine, and tell me whether or not you think it "scales." Then tell me whether or not you think it's "quick." Or how about Microsoft's SSDS/SDS?

Scale != speed

Scaling out != scaling up

Data over HTTP is perfectly scalable, just not super speedy.
Andrew Badera Send private email
Friday, December 05, 2008
 
 
Andrew, For what it's worth, I disclaimed my last post in the hope that you wouldn't need to write your rebuttal.  Since you missed that part, I'll state it again using your own language:

if (!Scales || !Fast || !ScalesUp || !ScalesOut)
{
  CantUseIt();
}

So yes, you can use whatever terminology you like to explain why SimpleDB takes 200ms regardless of how many requests it sees.  The bottom line is that it's not acceptable for real time applications such as web apps.
Jason Kester Send private email
Friday, December 05, 2008
 
 
Jason,

I wasn't addressing you specifically, but your ego would seem to ass_ume otherwise.

YOU can use whatever terminology you would like. Me, I prefer to stick with terms used by industry and professionals to communicate and do business.
Andrew Badera Send private email
Friday, December 05, 2008
 
 
Andrew, sorry if my response sounded like an attack.  I wasn't trying to pick a fight.  Just trying to clarify my point and make sure everybody is on the same page.  Sorry if it came across otherwise.
Jason Kester Send private email
Saturday, December 06, 2008
 
 
All good.

Sure, 200ms for an operation might suck.

200ms for hundreds of thousands of operations, some serial, some stacked in parallel, still sucks for speed, but that's scaling out.

GAE scales out. SSDS/SDS scales out. Horizontal schemas scale out in general.

Scaling up is typically much more expensive than scaling out. Making that operation a 20ms call might be fairly cheap. Making all hundreds of thousands of those calls into 20ms operations will not be.
Andrew Badera Send private email
Monday, December 08, 2008
 
 
What sort of data are you trying to save?
If it breaks into tables nicely, then SQL/CSV work well.
(If I'm exporting data for someone else to consume, I've taken to using CSV these days.)
If it's singleton information (like application configuration), I use XML serialization, which is pretty easy in .net.
Mike Swaim Send private email
Monday, December 08, 2008
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz