
O/R mapping - worth it?

I was talking with a programmer friend about the design I was trying out on the .NET app I'm working on, and he informed me that object-relational mapping is what I was doing.  So I started reading up on it.  Now I'm starting to wonder if what I'm doing is so wise.

Granted, it's nice to be able to use objects and collections instead of fooling around with DataRows, especially with many-to-many tables.

But the ORM tools out there seem overly complicated for stuff that's supposed to make my life easier.  And when I read about all the stuff you have to include if you roll your own, I have serious doubts about whether O/R mapping is really such a panacea; it seems like you end up duplicating everything a database does and then some.

So I would like to solicit some opinions on this.  Have you used O/R mapping in successful projects?  Was it easier than, say, typed DataSets?  Did you use a commercial or open source O/R mapper?  If so, how about the learning curve?  How long did it take you to become productive with it?
Kyralessa
Thursday, January 20, 2005
 
 
I find OR-mapping indispensable in practically any database-centric software, such as web apps. Not only does it speed up development, it also makes the whole design more flexible by providing a useful, logical abstraction layer.

I program mainly in Perl, and Class::DBI is very easy to learn and use. I guess a similar thing in .NET would be more difficult to work with because of the more static nature of the environment. But I think it's still worth using, even if just for the sake of architectural flexibility.

However, with OR mappers, as with all frameworks, it's hard to provide a good set of features without assuming too much about the application they'll be used in. There are many crappy attempts that create more problems than they solve. Choose carefully.
Egor
Thursday, January 20, 2005
 
 
If you want to understand O/R, you've got to check out Hibernate, the absolute state-of-the-art, best of the best, and it's open source. It's Java, but there's a .NET version out (not sure how much of the functionality has been ported). The learning curve's a bit steep, but there are some good third-party round-trip tools that can autogenerate a lot of the config files and classes for you.

But back to your question, is it worth it?  YES!  Once you learn the tools (and the O/R thinking/programming process), you'll wonder how you ever lived without it.  It abstracts away tons of data access plumbing and allows you to prototype things much quicker, which lets you experiment with alternative designs really easily.
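
To give a taste of what goes away, here's a rough sketch of a load-modify-save cycle (Hibernate 2.x-style API from memory; the Customer class, its mapping file, and the DAO are made-up names for illustration):

import net.sf.hibernate.HibernateException;
import net.sf.hibernate.Session;
import net.sf.hibernate.SessionFactory;
import net.sf.hibernate.Transaction;

// Hypothetical DAO; Customer is assumed to be a mapped persistent class.
public class CustomerDao {
    private final SessionFactory sessionFactory;  // built once at startup

    public CustomerDao(SessionFactory sessionFactory) {
        this.sessionFactory = sessionFactory;
    }

    public void updateEmail(Long customerId, String newEmail)
            throws HibernateException {
        Session session = sessionFactory.openSession();
        try {
            Transaction tx = session.beginTransaction();
            // Hibernate issues the SELECT behind the scenes
            Customer c = (Customer) session.load(Customer.class, customerId);
            c.setEmail(newEmail);  // plain property change; no SQL in sight
            tx.commit();           // dirty check generates the UPDATE
        } finally {
            session.close();
        }
    }
}

Not a single hand-written SELECT or UPDATE; the mapping file tells Hibernate how Customer relates to its table.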

Warning, once you use it, strongly-typed datasets will seem clunky and lame.

Also the code is well laid-out and commented, so tweaking it to your needs is very easy.

Can you tell I like O/R (and Hibernate)?
dave
Thursday, January 20, 2005
 
 
Kyralessa,

In most cases you're better off without an ORM tool, the main reason being that you lose some of the best features of the relational world (bulk transactions and random access to individual records).

The object world is in most cases too fine-grained to yield good (if any) results. OOA&D promotes in-memory object graphs where objects can be accessed only through an object path. Random access to an arbitrary object in the graph is not an option, and remains a challenge to be solved.

One example to make my point clearer: think of a directory structure. Directories cannot be accessed randomly, but only via a clearly specified path from the root.

The relational model was designed with the indexed sequential access method in mind: an expensive random access to the entry point, then fast, cheap access to the next records matching the given access criteria. The relational world is ISAM-driven, with cursors as the main access vehicle. There's no in-memory object graph or anything similar (caching, though, is trivial to implement for static sets of records).
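
In JDBC terms the cursor model looks roughly like this (table and column names are invented for illustration): one potentially expensive positioning via the index, then cheap forward iteration over the matching records:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class CursorScan {
    // One indexed seek to the entry point, then cheap sequential access
    // to the next matching records. Table/column names are invented.
    public static void printAccessories(Connection con) throws SQLException {
        PreparedStatement ps = con.prepareStatement(
            "SELECT id, name FROM catalog WHERE type = ? ORDER BY name");
        ps.setString(1, "ACCESSORY");
        ResultSet rs = ps.executeQuery();  // cursor positioned at the entry point
        while (rs.next()) {                // next matching record, cheaply
            System.out.println(rs.getLong("id") + " " + rs.getString("name"));
        }
        rs.close();
        ps.close();
    }
}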

ORM tools try to reconcile the two approaches: navigable in-memory object graphs vs. ISAM cursors. But there are very few problems that actually need this reconciliation. For example, in a global reservation system the daily inventories have to be in memory for extremely fast access (they get hit at a rate of more than 10^3 transactions/sec). For this type of problem I would look at ORM tools with caching capabilities; however, for the one instance where I had this problem I chose prevalent objects as the better tool (google for Prevayler).

In most cases an ORM tool is overkill, and employing one is an expensive mistake. The assumed gains in development productivity are dubious since:
- one has to maintain 3 components: the code, the schema and the mapping; things get so out of hand that we need yet another tool to keep the 3 in sync (some will generate code, some will change your code, some will change the way your code works)
- the code is somewhat twisted around the relational concepts
- the data model is somewhat twisted to accommodate the object world. As a typical example, all tables get an artificial object id as a key, regardless of whether the table represents a weak entity or not; the resulting models are, in most cases, not even in 1NF, with objects of multiple types in the same table and constraints that can be neither declared nor enforced.

You're better off with well-thought-out, well-structured code and a rock-solid BCNF relational data model (from which you can denormalize for performance where needed).
Dino
Friday, January 21, 2005
 
 
O/R mapping sounds good in theory but gets twisted into incorrigible knots in practice.

Think in terms of "Objects Use Data" instead of "Objects Are Data" and there will be less of a need to do O/R mapping.
T. Norman
Saturday, January 22, 2005
 
 
Dino, you make some interesting comments, but I disagree with a few points.

One, there's no rule that says if you use O/R mapping, you have to use it for ALL your data access.  If you need bulk inserts/deletes, etc., there's absolutely no reason why that functionality can't hit the database directly.  Hibernate best practices recommend straight JDBC for these sorts of operations.
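
For example, a bulk purge can go straight through JDBC in one statement (the table and column names here are made up; in Hibernate 2 you can grab the underlying connection with session.connection()):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

public class AuditPurger {
    // Bulk delete in one statement, bypassing the ORM entirely.
    // Table and column names are invented for illustration.
    public static int purgeBefore(Connection con, Timestamp cutoff)
            throws SQLException {
        PreparedStatement ps = con.prepareStatement(
            "DELETE FROM audit_log WHERE logged_at < ?");
        try {
            ps.setTimestamp(1, cutoff);
            return ps.executeUpdate();  // one round trip, no objects materialized
        } finally {
            ps.close();
        }
    }
}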

Two, and maybe this is me not understanding something, but I really can't fathom what you're talking about when you say you lose random access with O/R mapping. This is true in the case of crappy .NET datasets, where you have to call a dumb Fill() method before you can work with the objects. With Hibernate (sorry to harp on this, but it's the only one I know well), any criteria that can be modeled in SQL can be modeled in Java code (with less code and better readability), and the performance is excellent. You never have to pull back unnecessary records if you don't want them.
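
To make that concrete, here's roughly what a filtered, ordered query looks like in Hibernate 2's Criteria API (Product and its properties are invented names, and I'm writing from memory, so treat it as a sketch):

import java.util.List;
import net.sf.hibernate.Criteria;
import net.sf.hibernate.HibernateException;
import net.sf.hibernate.Session;
import net.sf.hibernate.expression.Expression;
import net.sf.hibernate.expression.Order;

public class ProductQueries {
    // Only the matching rows come back; Hibernate builds the WHERE clause.
    // Product and its properties are invented names for illustration.
    public static List findCheapAccessories(Session session, double maxPrice)
            throws HibernateException {
        Criteria crit = session.createCriteria(Product.class);
        crit.add(Expression.eq("type", "ACCESSORY"));
        crit.add(Expression.lt("price", new Double(maxPrice)));
        crit.addOrder(Order.asc("name"));
        return crit.list();
    }
}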

Three, the issue of maintaining multiple layers is really a red herring. The tools do this for you and are easy to use. Code generation is the future in many different aspects of software development; you need to make your peace with this.

Monday, January 24, 2005
 
 
"what you are talking about when you say you lose random access with O/R mapping"?

Objects structure themselves in graphs. In an object graph you cannot access an object randomly, but only through graph navigation. Access to an object in a graph is fundamentally sequential.

In the relational model, which is fundamentally set theory, you select sets which can be iterated as a list. Set selection is fundamentally random.

For example in a PetStore kind of application:
Object world:

import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

List relatedProducts = new LinkedList();
for (Iterator iter = getUserCart(userId).getItems().iterator(); iter.hasNext(); ) {
    CatalogItem ci = (CatalogItem) iter.next();
    if (ci.type() != CatalogItem.ACCESSORY) {
        // walk the graph from each non-accessory item to its accessories
        relatedProducts.addAll(ci.getAccessories(ci.id()));
    }
}


Relational world:

select r.* from catalog p, accessories a, catalog r where p.id in
    (select catalog.id from cart, catalog where
    catalog.type <> 'ACCESSORY' and
    cart.user = '$1' and
    cart.item_id = catalog.id) and
p.id = a.parent_id and r.id = a.related_id

List relatedProducts = db.query(see_above_statement);

IMO, use ORM tools when the relational model is not appropriate (with hierarchical, network, etc. models, you need to go the graph-navigation way); otherwise the relational model is simpler and in most cases faster. If db portability, clustering, etc. are an issue, again look at ORM tools, or at some tool which addresses your particular issue.

I tried ORM tools before (SECANT, ObjectBridge; I looked at Hibernate in its very early stages; I wrote my own specialized ORMs) and I found that in most cases my code works better and faster if I stop beating around the ORM bush and get the data I need the fastest, simplest way.
Dino
Monday, January 24, 2005
 
 
Valid points, although current ORM tools don't require the in-memory filtering your snippet shows (Hibernate in particular can fetch related objects several levels deep in a single trip to the database).
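
For instance, your loop-plus-lookups collapses into a single HQL query, something like this (I'm borrowing the entity and property names from your hypothetical PetStore model, so treat them as illustrative):

import java.util.List;
import net.sf.hibernate.Hibernate;
import net.sf.hibernate.HibernateException;
import net.sf.hibernate.Session;

public class RelatedProducts {
    // One round trip: the joins and filtering happen in the database,
    // and mapped objects come back. Names mirror the PetStore sketch above.
    public static List find(Session session, String userId)
            throws HibernateException {
        return session.find(
            "select acc from Cart cart join cart.items item"
            + " join item.accessories acc"
            + " where cart.user = ? and item.type <> 'ACCESSORY'",
            userId, Hibernate.STRING);
    }
}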

My point is that very, very few projects have performance requirements high enough so that it is cost-effective to spend programmer-hours on hand-tuned SQL.  Disk is cheap, RAM is cheap, CPU is cheap, programmer-hours are not cheap.  For the vast majority of projects, ORMs will provide acceptable performance and save huge amounts of development time.  Plus, most application programmers are not particularly good at SQL programming, so in the real world ORM-generated queries are usually equal or superior to hand-coded ones.
dave
Monday, January 24, 2005
 
 
"My point is that very, very few projects have performance requirements high enough so that it is cost-effective to spend programmer-hours on hand-tuned SQL."

Not true at all.  When dealing with large tables (millions of rows), or multi-table joins, a properly tuned query can make the difference between running in 1 second and running in 20 minutes.  It's not that the performance requirements are high per se, it's that the data set is huge.

However, that doesn't mean every SQL statement must be pre-emptively tuned to death.  It is usually more effective to single out queries on heavily used tables for tuning, and to do performance testing to identify which queries are actually slow before bothering to tune them.
T. Norman
Tuesday, January 25, 2005
 
 

This topic is archived. No further replies will be accepted.
