The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

implementing all this talk of tags...

Between the recent posts on this forum about GMail labels, and the tag system in Flickr and (see links below), this system of tags/labels/social classification is receiving some new attention.

It's great to see so much discussion on the topic.  But as a developer, I'm curious about the implementation details.  This has the possibility of becoming a new programming paradigm; instead of structuring complicated SQL queries, you could issue search queries against a group of terms.  Search and collaborative filtering seem to take center stage.

So how are systems of the future going to handle this?  Large text fields and search?  How is that going to scale?  What about tag normalization issues?

Some links:
monsur
Tuesday, September 07, 2004
Side-note: I wouldn't characterize tags as a new paradigm. I mean, you can think of them as predicates, which fall within first-order logic and have been handled by Prolog for many years (decades?) now.
Rafael de F. Ferreira
Wednesday, September 08, 2004
What goes around comes around, no?  Flickr applies tags to photos, GMail to email, and to links; but in theory they can be applied to all data.  In fact, that was/is one of the major points of WinFS' directory structure.

But look at 99% of popular applications out there and they are standard queries-against-the-database type apps.  To me, when data is retrieved via tags, this standard method of database access falls short.  So what are our options?  If Prolog handles this, how does it do it, and how does it scale?

Companies like Microsoft and Google have money to throw at this, but what about the independent developer?  I know there are open source apps like Lucene for search, but that's a search index that's separated from the data.  Maybe what we need is a database whose main method of data retrieval is text search.

C'mon people, dream with me here!
monsur
Thursday, September 09, 2004
I suspect implementing tags is not a big challenge. A usable design may be hard, but technically and performance-wise there are no major challenges.

Consider this simplified DB schema:

table: Mail
(MailID, Subject, From, To, Message)

table: Tag
(TagID, Name) // use TagID as PK, so spelling fixes can be made to name

table: MailTag
(MailID, TagID)

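Now to find all mail with tag "xxx", we first get the ID for tag "xxx" (let's say 7), then join Mail against MailTag and filter on that ID:

SELECT Mail.* FROM Mail
JOIN MailTag ON Mail.MailID=MailTag.MailID WHERE MailTag.TagID=7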

A little bit of time making sure your indices are in order... and we're done.
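
The schema and lookup above can be sketched end to end in Python with the standard sqlite3 module. This is only an illustration under assumptions, not the poster's code: the From and To columns are renamed Sender and Recipient because FROM and TO are SQL keywords, and the helper names (build_demo_db, add_mail, rename_tag, subjects_with_tag) and the MailTag_by_tag index are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with the three tables from the post."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Mail (
            MailID    INTEGER PRIMARY KEY,
            Subject   TEXT,
            Sender    TEXT,   -- "From" in the post; renamed (assumption)
            Recipient TEXT,   -- "To" in the post; renamed (assumption)
            Message   TEXT
        );
        CREATE TABLE Tag (
            TagID INTEGER PRIMARY KEY,
            Name  TEXT NOT NULL UNIQUE
        );
        CREATE TABLE MailTag (
            MailID INTEGER NOT NULL REFERENCES Mail(MailID),
            TagID  INTEGER NOT NULL REFERENCES Tag(TagID),
            PRIMARY KEY (MailID, TagID)
        );
        -- Hypothetical index so "all mail for a tag" does not scan MailTag.
        CREATE INDEX MailTag_by_tag ON MailTag (TagID, MailID);
        """
    )
    return conn


def add_mail(conn, subject, sender, recipient, message, tags):
    """Insert one mail row and link it to each tag, creating tags as needed."""
    cur = conn.execute(
        "INSERT INTO Mail (Subject, Sender, Recipient, Message) VALUES (?, ?, ?, ?)",
        (subject, sender, recipient, message),
    )
    mail_id = cur.lastrowid
    for name in tags:
        conn.execute("INSERT OR IGNORE INTO Tag (Name) VALUES (?)", (name,))
        tag_id = conn.execute(
            "SELECT TagID FROM Tag WHERE Name = ?", (name,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR IGNORE INTO MailTag (MailID, TagID) VALUES (?, ?)",
            (mail_id, tag_id),
        )
    return mail_id


def rename_tag(conn, old_name, new_name):
    """Fix a tag's spelling; MailTag rows are untouched because they hold TagID."""
    conn.execute("UPDATE Tag SET Name = ? WHERE Name = ?", (new_name, old_name))


def subjects_with_tag(conn, tag_name):
    """Return subjects of all mail carrying the tag, in insertion order."""
    rows = conn.execute(
        """
        SELECT Mail.Subject
        FROM Mail
        JOIN MailTag ON Mail.MailID = MailTag.MailID
        JOIN Tag     ON Tag.TagID   = MailTag.TagID
        WHERE Tag.Name = ?
        ORDER BY Mail.MailID
        """,
        (tag_name,),
    )
    return [row[0] for row in rows]
```

Joining through Tag folds the "look up the ID first" step into a single query. Because MailTag stores only IDs, renaming a tag is a one-row update, which is the point of using TagID as the primary key.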
Herr Herr
Thursday, September 09, 2004
> If Prolog handles this, how does it do it, and how does it scale?

Oftentimes Prolog does not scale.  My programming languages professor once boasted in class that he could write an entire compiler in a page or two of Prolog.  I asked the compiler prof, jokingly, why the compiler group didn't just use Prolog for their compilers.  "Because we want them to run in better than exponential time," he replied.
don't fear the leaper
Friday, September 10, 2004

This topic is archived. No further replies will be accepted.
