The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Objects versus their IDs


I'm finding that in a project I'm working on right now I use an object (e.g., a customer) and its ID (i.e., the customer ID) fairly interchangeably. There's no real rhyme or reason to when I use which one, other than whichever was more convenient for the use case I was working on. I feel like this is a hack and is letting the data layer (where the IDs are generated) leak into my object model.

So, my question to the group is: when do you expose and use an object's ID in your domain/object model, and when do you stick with the object itself? I realize the answer might be very situation-specific, but are there any common trends? Are there any techniques you use to smooth over the disconnect between an object and its ID?

Eric Marthinsen
Tuesday, April 29, 2008
Generally I prefer to use the ID, because passing objects around can get very hairy very quickly. Using IDs only seems ugly in very simple use cases. There's no reason not to consider the unique ID part of the domain model rather than just part of the persistence model.

First, how expensive is the customer object, meaning how big is it? Does it hold their entire profile, preferences, credit cards, etc? If you need to look up the related information not directly held by the customer object, such as audit history, how would you do this if a unique ID wasn't available? An external ID will need to be exposed to customers (whether a different ID, email address, etc) - why is this more valid as part of the domain model versus a numeric ID?

What about when you rewrite the customer? Do you now need to update every code flow which passed the object, or just where the customer object was used by looking it up via a factory?

Generally I construct interfaces to accept and return the ID, and assume reasonable response times via a cache. Internal to a module, the customer object may be used directly in short-running processes. When the customer module is rewritten (and it will be, many times!), I haven't leaked the implementation details. Modules can be migrated gradually without spawning an unmanageable web, versus a system having three or more past generations of customer objects actively in use.

So the question to you is, why would using the object form be better than using its ID?
Benjamin Manes
Tuesday, April 29, 2008
"So the question to you is, why would using the object form be better than using its ID?"

Because objects have intelligent fields, methods and (in one word) semantics? Because they have state, invariant properties, ...?
Daniel_DL
Tuesday, April 29, 2008
Using the ID lets you have many light-weight 'references' to a class Instance, without having to copy the entire Instance around.

Some 'class models' in languages blur the distinction between an object Instance and a reference to it -- I think in Java and in C# every class Instance is really handled through a 'Reference' (primitives and C# structs aside).  While in C++ there's a clear distinction between a class Instance and a Pointer To a class Instance.

Oh, and "light-weight" above means a long or a long-long (4 bytes or 8 bytes) ID giving you access to an Instance of a class, through using an "Index" of some sort to 'look-up' the Instance.

Having said that, most OO languages use pointers or references for the same purpose, without the 'added layer' of a 'long' or 'long-long' ID and ID lookup and ID management.  For me, having an explicit ID makes debugging and design simpler and more straight-forward.  It's so much easier making sure my record is holding a 'link' of some kind to Customer 100 if I can SEE the "100" ID, instead of having some kind of "address" pointer I have to de-reference to make SURE it's Customer 100.

But that attribute of OO languages means you'll probably get some argument from OO purists that using an ID is completely unnecessary and only a newbie would want to do it.  But as I say, it DOES have a few (a very few) advantages.
Tuesday, April 29, 2008
> It's so much easier making sure my record is
> holding a 'link' of some kind to Customer 100
> if I can SEE the "100" ID, instead of having
> some kind of "address" pointer I have to de-
> reference to make SURE it's Customer 100.

That's what ToString() (in .NET) and __repr__ (in Python) are for.
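
As a rough illustration of that point, here is a minimal Python sketch. The Customer class and its fields are hypothetical, made up for this example only.

```python
class Customer:
    """Hypothetical domain object; the fields are assumptions for illustration."""

    def __init__(self, customer_id, name):
        self.customer_id = customer_id
        self.name = name

    def __repr__(self):
        # Put the ID in the debug representation so it shows up in
        # debuggers, log lines, and printed collections.
        return f"Customer(id={self.customer_id}, name={self.name!r})"


customer = Customer(100, "Acme Ltd")
print(repr(customer))
```

With something like this in place, a record holding a reference to the object shows the "100" whenever it is inspected, without a separate ID field on the record.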

Generally speaking, any time you create and use an ID for an object, you're creating overhead:  the overhead of ensuring their uniqueness, the overhead of maintaining referential integrity, the overhead of lookups. 

There should be a good reason for doing this.  Often there are many.  But you should know what they are.
Robert Rossney
Tuesday, April 29, 2008
Thanks for everyone's input. As a bit of clarification, I'm not advocating getting rid of IDs. They are essential. Regarding when to use an ID versus the entire object, I'm starting to lean towards using the object within the confines of an aggregate and using IDs for any operations that span an aggregate. For instance, a method within an Order object would accept an Item object as a parameter. Conversely, a method that retrieved a shipment record would use the order's ID. There are some exceptions, notably that any sort of repository would take and return full objects. I'm not sold on this scheme, but it seems to make some sense. Would the group agree?
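
One way the scheme described above might look, as a sketch only. All class names, fields, and the in-memory repository are assumptions invented for illustration; a real repository would talk to a database.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    sku: str
    price_cents: int


@dataclass
class Order:
    order_id: int
    items: list = field(default_factory=list)

    def add_item(self, item: Item) -> None:
        # Inside the aggregate: work with the Item object itself.
        self.items.append(item)

    def total_cents(self) -> int:
        return sum(item.price_cents for item in self.items)


@dataclass(frozen=True)
class Shipment:
    shipment_id: int
    order_id: int  # the cross-aggregate link is stored as an ID, not an Order


class ShipmentRepository:
    """Hypothetical in-memory stand-in for a real repository."""

    def __init__(self):
        self._by_order_id = {}

    def save(self, shipment: Shipment) -> None:
        # Repositories take and return full objects.
        self._by_order_id[shipment.order_id] = shipment

    def find_by_order_id(self, order_id: int):
        # Crossing from Order to Shipment: the caller passes only the ID.
        return self._by_order_id.get(order_id)


order = Order(order_id=2949)
order.add_item(Item(sku="WIDGET", price_cents=1250))

shipments = ShipmentRepository()
shipments.save(Shipment(shipment_id=1, order_id=order.order_id))
found = shipments.find_by_order_id(order.order_id)
```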

Eric Marthinsen
Tuesday, April 29, 2008
Maybe you need a CustomerIdentity class (ID, Name) that can work as a proxy for the full Customer object.
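
A sketch of what such a class could look like in Python. Everything beyond the (ID, Name) pair, including the extra Customer fields and the audit_line helper, is a made-up assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerIdentity:
    """Lightweight stand-in: just enough to refer to and display a customer."""
    customer_id: int
    name: str


class Customer:
    """Full object; the extra fields are hypothetical."""

    def __init__(self, customer_id, name, credit_cards, preferences):
        self.customer_id = customer_id
        self.name = name
        self.credit_cards = credit_cards
        self.preferences = preferences

    def identity(self) -> CustomerIdentity:
        return CustomerIdentity(self.customer_id, self.name)


def audit_line(who: CustomerIdentity, action: str) -> str:
    # Needs only the identity, so callers never have to load the full Customer.
    return f"customer {who.customer_id} ({who.name}): {action}"


customer = Customer(100, "Acme Ltd", credit_cards=[], preferences={})
line = audit_line(customer.identity(), "login")
```

Because the identity is a small immutable value, it can be copied, compared, and used as a dictionary key without touching the data layer.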
Wednesday, April 30, 2008
I've got the same problem, but I'm hesitant to use objects when the method doesn't actually require an instance of the object.  Why have the extra overhead (which could involve heading to the database to instantiate the object) when it isn't needed?

It does lead to an inconsistent API, which, like you, I don't necessarily like.  If I were using a more powerful language I'd use overloaded methods to provide ID and object parameter methods for everything and have one call the other depending on which makes more sense.
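
For what it's worth, the overloading idea can be approximated in Python with functools.singledispatch. This is only a sketch: the Customer class, the order_count function, and the ORDERS_BY_CUSTOMER_ID table are invented for the example.

```python
from dataclasses import dataclass
from functools import singledispatch

# Hypothetical stand-in for a table of orders keyed by customer ID.
ORDERS_BY_CUSTOMER_ID = {100: ["order-1", "order-2"]}


@dataclass
class Customer:
    customer_id: int
    name: str


@singledispatch
def order_count(customer) -> int:
    # Fallback for argument types that have no registered variant.
    raise TypeError(f"unsupported argument: {customer!r}")


@order_count.register
def _(customer_id: int) -> int:
    # ID variant: does the real work without instantiating a Customer.
    return len(ORDERS_BY_CUSTOMER_ID.get(customer_id, []))


@order_count.register
def _(customer: Customer) -> int:
    # Object variant: delegates to the ID variant.
    return order_count(customer.customer_id)
```

Callers can then pass whichever they already have, order_count(100) or order_count(some_customer), and no lookup is forced on a caller who only holds the ID.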
Almost H. Anonymous
Wednesday, April 30, 2008
A decent domain model will ensure you only have a single instance of a particular record, and that a simple lookup on primary key is cached, so it's never expensive to get the object from the ID.

So all of my stuff tends to work with the real objects. I use d&c diamond binding for my data layer - and that takes care of it all for me. But I can expect stuff like this:

Order foo = Order.Find(2949);
Order bar = foo.Items[0].Order;

foo will be the same as bar. Also later if I do a .Find(2949) then it will return pretty much instantly.
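
The behaviour described here is often called an identity map. Below is a hand-rolled Python sketch of the idea; it is not how the data layer mentioned above is implemented, and the db_hits counter merely stands in for real queries.

```python
class Order:
    _identity_map = {}   # order_id -> the one live Order instance
    db_hits = 0          # counts simulated database reads

    def __init__(self, order_id):
        self.order_id = order_id
        self.items = []

    @classmethod
    def find(cls, order_id):
        cached = cls._identity_map.get(order_id)
        if cached is not None:
            return cached            # no database access on a repeat lookup
        cls.db_hits += 1             # stand-in for a real query
        order = cls(order_id)
        order.items.append(Item(order))
        cls._identity_map[order_id] = order
        return order


class Item:
    def __init__(self, order):
        self.order = order           # back-reference to the same Order instance


foo = Order.find(2949)
bar = foo.items[0].order
again = Order.find(2949)
```

In this sketch foo, bar, and again are all the same instance, and only the first find pays for the simulated query.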

So there's no real reason to make your API crap and pass around IDs all the time.
Mike
Thursday, May 01, 2008
"a simple lookup on primary key is cached, so it's never expensive to get the object from the ID."

Sure, lookups on primary key are cached, but the initial access has a cost.  If I weren't so concerned about performance (where every database query counts) I probably wouldn't bother passing around IDs.
Almost H. Anonymous
Friday, May 02, 2008
I've faced this sort of tradeoff before.

Word of warning: dealing exclusively with IDs can lead to tricky bugs. For example, if two objects in your domain model both have integer IDs, and you have a service that takes in both IDs and does something with them, then you are totally dependent on the order in which you pass the IDs.

You face a form of weak typing if you deal only with IDs. On the other hand, if you instantiate the objects and pass them around, the compiler will bark at you if you screw up the ordering.
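
A middle ground is to keep passing IDs but give each kind of ID its own small type. The sketch below uses Python, where there is no compiler to bark, so it assumes a static checker such as mypy for the early warning and adds a runtime guard as well. CustomerId, OrderId, and assign_order are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerId:
    value: int


@dataclass(frozen=True)
class OrderId:
    value: int


def assign_order(customer_id: CustomerId, order_id: OrderId) -> str:
    # Runtime guard; a static checker such as mypy would report a
    # swapped call from the annotations alone.
    if not isinstance(customer_id, CustomerId) or not isinstance(order_id, OrderId):
        raise TypeError("expected (CustomerId, OrderId)")
    return f"order {order_id.value} -> customer {customer_id.value}"


result = assign_order(CustomerId(100), OrderId(2949))
```

With bare integers, assign_order(2949, 100) would be accepted silently; with the wrappers, swapping the arguments raises a TypeError instead.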

That brings me to the next point: if the ID of the object is part of the model, then I include it in the object itself. Customer ID is an attribute in my model. I don't view it as a leak between layers.

As far as performance goes, most of the time you can get away with the Lazy Initialization pattern if your Customer object is involved in lots of associations. For example, if you lazy-load a customer, you load only attributes that are primitive types (strings, integers, doubles, etc.). If, sometime later, someone accesses the Orders collection of that customer, the Customer object checks if it is null. If it is, it invokes the data access layer somehow (direct call, raise an event, throw an exception, whatever) and the Orders collection is then loaded from the DB and brought into memory only when it is needed.
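
The lazy-loading idea can be sketched roughly as follows. The load_orders callable stands in for the data access layer and is an assumption, as are the class and field names.

```python
class Customer:
    def __init__(self, customer_id, name, load_orders):
        # Primitive attributes are loaded eagerly.
        self.customer_id = customer_id
        self.name = name
        self._load_orders = load_orders  # callable into the data layer (assumed)
        self._orders = None              # None means "not loaded yet"

    @property
    def orders(self):
        # Load the association on first access only.
        if self._orders is None:
            self._orders = self._load_orders(self.customer_id)
        return self._orders


calls = []  # records each simulated trip to the database


def fake_load_orders(customer_id):
    calls.append(customer_id)
    return [f"order-for-{customer_id}"]


customer = Customer(100, "Acme Ltd", fake_load_orders)
```

Creating the customer makes no call to the loader; the first read of customer.orders makes exactly one, and later reads reuse the loaded list.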

I tend to provide both lazy loading and non-lazy or "strict" loading when I'm in a situation where I know I'll use the associations of an object. Once my final exams are over, I'll write a blog post about this in more detail. Anyway hope this answers the question.
MoffDub
Saturday, May 03, 2008
I prefer objects. Both are just references, so there is little difference in overhead between the two. And with modern IDEs, inspecting the object is easy.

So why would you want to work with an ID rather than an object?
Wednesday, May 07, 2008
This has already been explained.

Not all OO languages treat objects as "references".
Thursday, May 08, 2008
When you write to someone and want to give them your telephone number, would you parcel up your telephone (the object) and send it along with the letter, or would you simply write the number (the ID reference) in the letter?
Tuesday, May 20, 2008

This topic is archived. No further replies will be accepted.
