The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Bidirectional many-to-many self-join

I have a table which contains objects:

Table Objects
ID Otherstuff
1  foo
2  bar
3  etc

I want to create (store) arbitrary links between these objects, so:

Table Links
ID1  ID2
1    2    //defines a link between the 'foo' object and the 'bar' object

If all links are bidirectional, should I store each link as one record or as two records in the Links table?

For example ...

Table Links
ID1  ID2
1    2    //a bidirectional link between 'foo' and 'bar' object

... Or ...

Table Links
ID1  ID2
1    2    //a link from 'foo' to 'bar'
2    1    //and a link from 'bar' to 'foo'

The problem I see with using one record is that it's harder to find all links to the 'foo' object: because you need to select from both columns in the links table, like ...

select id2 as link from links where id1 = 1
union
select id1 as link from links where id2 = 1

Conversely, the problem I see with using two records is that I don't see how to define a conatraint which will enforce that for each record in the Links table there's a second corresponding mirror-image record.
Christopher Wells Send private email
Friday, July 07, 2006
 
 
Why enforce a universal bi-directional relationship?  Will there ever be any situations where FOO relates to BAR but BAR doesn't relate back to FOO?

Are there ever any circumstances where:

FOO relates to BAR
BAR relates to YADA
YADA relates to FOO
FOO relates to YADA

but not necessarily fully symetrical?
Karl Perry Send private email
Friday, July 07, 2006
 
 
What is it? What real information are you storing? Relations don't exist in a theoretical vacuum. Unless you tell us what the objects are and how they relate to each other nobody can help you.
Database purist
Saturday, July 08, 2006
 
 
Your single-record link doesn't look so tricky to deal with.
And "union all" should be slighly faster assuming there are no self-referencing links.
Tim Williams Send private email
Saturday, July 08, 2006
 
 
For the 2 record case you could use a:

check constraint
where not exists (select a.ID1, a.ID2 from sameTable a
where not exists (select b.ID1, b.ID2 from sameTable b
where a.ID1 = b.ID2 and a.ID2 = b.ID1)
)
You'll have to look up the exact syntax and test it.
Saturday, July 08, 2006
 
 
Isn't that just checking for duplicates?

He wants to insert. But as the two reocrds are 'mirror images' and one can't be inserted without the other then you need some sort of batch insert from a trigger and a deferrable constraint or something.

Anyway, it's all nonsense because he's using nonsense data as an example.

Seems to me that most relationships between real things aren't totally 'bidirectional' in the sense that they are equal in each direction. Neighbours, siblings maybe. But then why does it need to be defined twice? If the relationship really is completely 'bidirectional' then once is enough.

These sort of idiotic questions happen a lot here. People post questions like 'I have 2 tables with such and such values...' then give meaningless values. As though values in data tables are like an example of looping through an array of integers, and it doesn't matter what the integers are.

The thing you're storing in those tables matters. So unless you're prepared to reveal what top secret information it is you ain't going to get anywhere. Of course then you might have to think a little bit like a database designer, god forbid!
Database fascist.
Saturday, July 08, 2006
 
 
The reason you get alot of questions like this alot (and nothing wrong with them) is because the majority of folks on these forums are FE developers and not database guys.

On the other hand, don't generalize the stuff you're storing.
D in PHX Send private email
Saturday, July 08, 2006
 
 
No, database fascist, it is not checking for duplicates.

It is making sure that no "single" records exist.

For each target record, it verifies that it is a "single"
record by making sure there are no "mirror" records.

It verifies a "mirror" record by equating the *opposite*
column, which is why it compares a.ID1 against b.ID2, not
b.ID1 (and vice versa).
You'll have to etc etc
Saturday, July 08, 2006
 
 
"Seems to me that most relationships between real things aren't totally 'bidirectional' in the sense that they are equal in each direction. Neighbours, siblings maybe."

I have this exact sort of problem in an e-commerce application.  You can select related product.  The products are related in fully a bi-directional way.  I choose to make the linking explicit (two records) but I had other reasons for doing that.  I'm normally not a fan of having redundant data in the database and having two records is redudant.
Almost H. Anonymous Send private email
Saturday, July 08, 2006
 
 
"The products are related in fully a bi-directional way."

I'm not disagreeing, but give me some examples, and why they are related in a fully bi-directional way.
Database fascist.
Saturday, July 08, 2006
 
 
I don't know, maybe because *that's the way the customer wanted it*.  I mean really, is it so hard to believe that *something* might be fully bi-directional?
Almost H. Anonymous Send private email
Saturday, July 08, 2006
 
 
Thank you all for your replies.


@Karl:
>  Will there ever be any situations where FOO relates to BAR but BAR doesn't relate back to FOO?

Yes, that's a good question. Yes, some (not all) relations might be directional/asymmetric. But even this could be modeled either way:

* If a bidirectional link were modeled as 2 records, then a directional link can obviously be modeled by only storing one record.

* Conversely if a bidirectional link were modeled using only 1 record, then a directional link could be modeled by adding an extra attribute column to the links table (to describe the directionality of each link).


@Tim:
> single-record link doesn't look so tricky to deal with.

Yes. Thanks


@You'll have to look up:
> For the 2 record case you could use a check constraint ...

Thanks for the example.


@Database fascist:
> one can't be inserted without the other

Yes, I think I might have to insert each of the 2 records one by one into a temporary table that doesn't have the constraint, and then insert them both simultaneously into the constrained table using 'insert from'.

> If the relationship really is completely 'bidirectional' then once is enough.

Yes. And the Links table has half as many records as in the other case, but takes twice as many selects to return the identical results. Maybe there's nothing more to it than that. I was wondering whether for any reason one of the two solutions was more common/idiomatic.

> ... Of course then you might have to think a little bit like a database designer

I see what you mean; thanks for the insight!


@D in PHX:
> On the other hand, don't generalize the stuff you're storing.

If I were modeling storing and had 10 different types of link (e.g. "uses", "tests", "implements", "inherits", etc.), would it be better to store this as 10 different tables and not as a single "Links" table?


@Almost H. Anonymous:
> I choose to make the linking explicit (two records)

Do you do something to ensure that the records are always paired?

> I'm normally not a fan of having redundant data in the database and having two records is redudant.

That was how I felt about it.

I suppose another way to model a bidirectional relationship would be ...

Table Links
LinkId  ObjectId
1    1    //link#1 links the 'foo' object
1    2    //link#1 links the 'bar' object

... i.e. link #1 is a link between 'foo' and 'bar'.
Christopher Wells Send private email
Saturday, July 08, 2006
 
 
"Do you do something to ensure that the records are always paired?"

No, but I was also dealing with a fairly limited platform as well.  The only way to modify the records was through an API and it kept things straight.
Almost H. Anonymous Send private email
Saturday, July 08, 2006
 
 
> "If I were modeling storing and had 10 different types of link"

Sorry, that should read: "If I were modeling and storing UML, for example, and had 10 different types of link ..."
Christopher Wells Send private email
Saturday, July 08, 2006
 
 
Follow the money and you find your answer.

Hack:

Store A->B, have that imply B->A in code/constraints


Proper:

Store A->B, B->A


I would probably do the hack without even thinking about how the embedded cleverness might confuse people I work with, or my future self.

I'm sure there are plenty of methodology fascists who would say implement it the proper way and they wouldn't have the least trouble sleeping over the wasted storage.

What's more valuable to your project?  Clever or proper?
Michael B
Saturday, July 08, 2006
 
 
"I don't know, maybe because *that's the way the customer wanted it*.  I mean really, is it so hard to believe that *something* might be fully bi-directional? "

Not atall. What did you have in mind.

An amazing discussion this. You're talking about storing data in a database. Not one of you has yet said what information it is. What things are being talked about.

Pretty typical for programmers I suppose.

If you can't be arsed to think about the data and what it means, then just use a text file.
Database fascist.
Sunday, July 09, 2006
 
 
"I'm sure there are plenty of methodology fascists who would say implement it the proper way and they wouldn't have the least trouble sleeping over the wasted storage."

And what the hell is that meant to mean? Does anybody here actually have a clue about databases? The whole point about all the tedious ER stuff, normalisation etc is that it eliminates wasted storage, doh!

This 'links' table is probably wasted space. Of course we won't ever know because the OP won't tell us what is being stored. It must be a Defence Department project.
Database fascist.
Sunday, July 09, 2006
 
 
Database Fascist,

He is asking about a technique, not for domain
knowledge. So we are trying to answer that.

By the way, the point of ER and Normal Forms
is not to eliminate wasted storage, it is to
eliminate the probability of error which
redundancy brings, and to make update queries
faster.

I see no redundancy in the case where you can
have single directional links as well.

In some databases, such as datawarehouses used
for OLAP, redundancy is deliberately introduced
(as standard and correct technique) as a trade
off for increased read query performance.

This looks like an analogous case case in that
inserting the data twice makes it possible to
use considerably more straightforward to queries
(which may also make the query optimizer's job
easier).
You'll have to etc etc
Sunday, July 09, 2006
 
 
The structure of a (relational) database is decided by the meaning of the data.

If you don't understand that you have no business whatsever getting involved in database design.

Still, I daresay you can all get round your crappy database designs with lots of lovely application programming.
Database fascist.
Sunday, July 09, 2006
 
 
> This 'links' table is probably wasted space. Of course we won't ever know because the OP won't tell us what is being stored.

I'm storing documents (or, rather, I will be: this is a hobby project, that I haven't begun to code yet).

I've researched "storing hierarchical data in databases", to understand how to model the documents' section/sub-section relationships (that's not the point of this question).

This question is about to how store other links between the sections of different documents, for example to let a user say "if you're interested in this section of this document, then 'see also' (you may also be interested in) that other section (and vice versa)."

Of the 3 options:

1) ...

  Table Links
  ID1  ID2
  1    2    //defines a link between the 'foo' object and the 'bar' object

2) ...

  Table Links
  ID1  ID2
  1    2    //a link from 'foo' to 'bar'
  2    1    //and a link from 'bar' to 'foo'

3) ...

  Table Links
  LinkId  ObjectId
  1    1    //link#1 links the 'foo' object
  1    2    //link#1 links the 'bar' object

each of these has advantages and disadvantages.

2) Is fast at selecting all birectional links to a sepecific object, and lets you decorate each leg (e.g. to say "50% of people who like A also like B, and 30% of people who like B also like A"), *but* there's no convenient way to decorate the link itself with extra attributes (but to resolve this, adding a 3rd column named "LinkID" would allow a separate "LinkAttributes" table).

3) Lets the link be decorated with extra attributes, is fast at selecting all birectional links to a sepecific object, generalizes to the case where more than two objects are being linked, *but* there's no convenient way to specify a unidirectional link.

1) Lets the link be decorated with extra attributes, including (if necessary) an attribute which specifies unidirectionality, but isn't so efficient (it uses a more complicated select involving union all) to retrieve all birectional links to a sepecific object.

> It must be a Defence Department project.

No. So, has *this* told you enough about the data??

> Still, I daresay you can all get round your crappy database designs with lots of lovely application programming.

I really can see that the database design greatly impacts the application (not to mention the data's reusabiity in other applications): that is why I'm trying to get the database design right!

Option 3) is somewhat attractive because I'm already using it for modeling another type of relationship ("all these document sections have some common attribute"); however, I do believe that I probably need either 1 or 2 as well, because 3 seems so incapable of defining any directionality (e.g. "people who like A also like B, but not necessarily vice versa").

In conclusion I think that 1) ...

  Table Links
  ID1  ID2
  1    2    //defines a link between the 'foo' object and the 'bar' object

... and modified 2) ...

  Table Links
  ID1  ID2  LinkId
  1    2    1    //a link from 'foo' to 'bar'
  2    1    1    //and a link from 'bar' to 'foo'

... are pretty similar: they're equally able to model the data; and choosing one or another is an implementation detail (that affects the performance of selects and inserts, and the implementation of the constraint).
Christopher Wells Send private email
Sunday, July 09, 2006
 
 
Database fascist,

It is ludicrous to suggest that for the example given, it
makes any difference at all whether the links represent
electron flows in a circuit or train timetables.

It was a simple graph problem, and the solution was the
best way to represent a graph relationally.

Besides that, someone with your obviously displayed
ignorance of databases has no business telling us whether
or not we should be involved in data design.

Nor may someone who cannot understand a simple nested
query (as you have demonstrated) comment on the quality
of other peoples' databases sight unseen.

Not only do you not know what you are talking about, you
have a go at other people who do. Have you no shame?
You'll have to etc etc
Sunday, July 09, 2006
 
 
Depending on the application (which is why posting actual problem domain requirements can be so useful), when I see bi-directional links, I tend to start thinking whether equivalence classes are a relevant design pattern here (they might be anonymous, or one item might be the primary, and all the others might be equivalent).
Cade Roux Send private email
Sunday, July 09, 2006
 
 
Good, some information about what actual data this database will contain.

Don't know what the structure of your documents is, but let's say the lowest level is an 'article' and this is where the actual text is.

Let's say article A is 'linked' to article B. Why is it linked? Because you say so? Presumably you have a reason for saying so. Reasons I can think of are:

The articles are by the same author(s).
They concern the same subject(s)
The are drawn from the same publication (e.g. newspaper)
The authors are from the same organisation (university, newspaper, business)
They are in the same language.

These are all things which can be stored as data in tables, related appropriately, and queried.

Let's say your database is a success, and grows. You enter a new article. Are you going to be able to remember the contents of another 100,  1000, 10,000, million articles, so that you can create the records in you 'links' table? By this stage you might have assistants. Will they be able to remember?

If you store the right information about each article as it's entered then your so called 'links' are implicit in the structure of the database.

What helpful information will you give to the user?

'Here are another 1000 articles that you might be interested in'. That doesn't seem that useful to me.

How much better to have:

'Other articles by the same author'...
'Other articles about xyz'...
'Other articles about xyz published in Time'...

So maybe a couple of tables as examples:

Articles:
ArticlesID PK
ArticleText
MainSubjectID FK to Subjects table
OriginalPublicationID FK to Publications table

MinorArticleSubject:
ArticleID (Composite PK) FK to Articles table
MinorSubjectID (Composite PK) FK to Subjects table
Database fascist.
Monday, July 10, 2006
 
 
Oh yes, sorry, missed your '50% of people who like...' section.

Maybe a Likes table:
ArticleID
UserID

Looking at your response, you're going to end up doing it my way anyway. Because even at the planning stage you're talking about 'decorating' the links table. Whatever 'decorate' means. But it's probably going to mean adding columns. One by one, over time, as yet another reason for 2 articles to be 'linked' comes to mind. Which will be bad, in the long run. Better to add a new table.
Database fascist.
Monday, July 10, 2006
 
 
"It is ludicrous to suggest that for the example given, it
makes any difference at all whether the links represent
electron flows in a circuit or train timetables."

You're wrong. The informations matters. It's that simple.
Database fascist
Monday, July 10, 2006
 
 
'You'll have to look up the exact syntax and test it' wrote...

"Nor may someone who cannot understand a simple nested
query (as you have demonstrated) comment on the quality
of other peoples' databases sight unseen."

OK, you do what your signature says. Find out the correct syntax for your check constraint and implement it. Then tell me which of the major RDBMS products it works in.

Just to remind you, it was this:

check constraint
where not exists (select a.ID1, a.ID2 from sameTable a
where not exists (select b.ID1, b.ID2 from sameTable b
where a.ID1 = b.ID2 and a.ID2 = b.ID1)
)

I look forward to your reply.
Database fascist
Monday, July 10, 2006
 
 
Ah Fascist, you get the information you want and then immediately ignore the subtlety of it.

"Let's say article A is 'linked' to article B."

His example was a subsection of article A with another subsection of artible A.

"Are you going to be able to remember the contents of another 100,  1000, 10,000, million articles"

Clearly, he only has to deal with the contents of one article -- at least based on his example.  I assume he would like between other articles as well.  Even a sucess probably doesn't mean 1,000 articles -- that's a lot of articles.
Almost H. Anonymous Send private email
Monday, July 10, 2006
 
 
> Are you going to be able to remember the contents of another 100,  1000, 10,000, million articles, so that you can create the records in you 'links' table?

As far as I know, that *is* the way a Wiki works: I create an article on a new topic, and then I link to this new article from one or more other existing articles that I think may be related somehow.

> If you store the right information about each article as it's entered then your so called 'links' are implicit in the structure of the database.

The two traditional methods are "Table of Contents" at the front of a book and "Index" at the back. Neither is especially good at telling you that the "Cars" topic is somehow related to "Roads".

> MainSubjectID FK to Subjects table

So you like what I have called option #3: if I want to link my new "Cars" article to an existing "Roads" aticle then I should identify a common subject or set of sujects which relates them (e.g. "Driving", "Transportation", "Urban living", "Travel", etc).

It doesn't seem to me that Wikipeia is organized that way (not that it's stored in a dataase): look at http://en.wikipedia.org/wiki/Road and http://en.wikipedia.org/w/index.php?title=Special:Whatlinkshere/Road&limit=500 ... each link is just a link, not decorated by a "Subject".

Wikepedia's links are directional, though: "mentions/is mentioned in".

> 'Here are another 1000 articles that you might be interested in'. That doesn't seem that useful to me.

That's what's good and bad about defining a "Subject" to justify and to contain each link. It's good because roads and cars are related only via high-level abstract subjects that include 100s of other topics including airplanes, engines, wheels, ships, stop signs, etc... either that, or I define a new subject "Cars and Roads" that's customized for this link. Your requiring explicit classification forces the author of the link to understand that and to build a really comprehensive general-purpose index. OTOH it makes it difficult for the author to simply decide that he simply wants to draw the attention of his readers to some other aticles which *he* thinks are related, without needing to explain or justify his reasoning for his linking, and without his trying to relate his article to all other articles in the universe.
Christopher Wells Send private email
Monday, July 10, 2006
 
 
After all the artcles recently about how T-SQL is the devils work and DBA's should just shut up and look after the database, this entire thread has really got me wound up. The entire thread could've been:

OP: I have a table which contains objects:

Table Objects
ID Otherstuff
1  foo
2  bar
3  etc

I want to create...

Helpful poster: Are all of your links guaranteed to be bi-directional, i.e. does A links to B infer that B links to A?

OP: No, there might be some cases where A links to B, but B doesn't link to A

HP: Then you'll have to treat each link as directional and seperate, so that you have:

source  target
A        B
B        A

and create them seperately as required. Referential integrety will only apply to one side of the link, so you can remove the A to B link and leave the B to A in place without breaking the logic of your model.

Hope that helps.

OP: Cheers, I'll give that a try and let you know how I get on.


What the data is, who is using it, when they were born, who they work for and what they had for breakfast is utterly and completely bloody irrelevant. This is a question of data structure, pure and simple, and the OP was politely and honestly asking for help. Every DBA who reads this board could've shown that they were better people than all the "DB's are just for data and my app makes all the decisions" idiots, but no. Some fool had to show that he could be just as moronic as them and do the same in reverse.

Database fascist - stop it. You're making SQL Jesus cry.
Paul Brown Send private email
Monday, July 10, 2006
 
 
Sorry to post in response to myself, but I've just re-read my post and realised that, in my immense ranting anger, I've accidentally been quite insulting to Karl Perry, Tim Williams, AHA, the fella with the really long "You'll have to..." name and others who have done exactly what I said and tried to be helpful and informative and professional.

My apologies to you all, my anger was not in any way directed at you and any ofense caused was entirely unintentional.
Paul Brown Send private email
Monday, July 10, 2006
 
 
Looks like you've got your answer. Just use the software that Wikipedia uses, it seems to be all open source.

Monday, July 10, 2006
 
 
Thank you.
Christopher Wells Send private email
Wednesday, July 12, 2006
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz