A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.
Between the recent posts on this forum about GMail labels, and the tag system in Flickr and Del.icio.us (see links below), this system of tags/labels/social classification is receiving some new attention.
It's great to see so much discussion on the topic. But as a developer, I'm curious about the implementation details. This has the possibility of becoming a new programming paradigm; instead of structuring complicated SQL queries, you could issue search queries against a group of terms. Search and collaborative filtering seem to take center stage.
So how are systems of the future going to handle this? Large text fields and search? How is that going to scale? What about tag normalization issues?
Side-note: I wouldn't characterize tags as a new paradigm. I mean, you can think of them as predicates, which fall within first-order logic and have been handled by Prolog for many years (decades?) now.
Rafael de F. Ferreira
Wednesday, September 08, 2004
What goes around comes around, no? Flickr applies tags to photos, GMail to email, and Del.icio.us to links; but in theory they can be applied to all data. In fact, that was/is one of the major points of WinFS' directory structure.
But look at 99% of popular applications out there and they are standard queries-against-the-database type apps. To me, when data is retrieved via tags, this standard method of database access falls short. So what are our options? If Prolog handles this, how does it do it, and how does it scale?
Companies like Microsoft and Google have money to throw at this, but what about the independent developer? I know there are open source apps like Lucene for search, but that's a search index that's separate from the data. Maybe what we need is a database whose main method of data retrieval is text search.
C'mon people, dream with me here!
I suspect implementing tags is not a big challenge. A usable design may be hard, but technically and performance-wise there are no major challenges.
Consider this simplified DB schema:
Mail(MailID, Subject, From, To, Message)
Tag(TagID, Name) // use TagID as PK, so spelling fixes can be made to Name
MailTag(MailID, TagID)
Now to find all mail with tag "xxx", we first get the id for tag "xxx": let's say, 7
SELECT Mail.* FROM Mail
JOIN MailTag ON Mail.MailID=MailTag.MailID
WHERE MailTag.TagID=7
A little bit of time making sure your indices are in order... and we're done.
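For anyone who wants to poke at this, here is a rough sketch of the same idea using Python's built-in sqlite3 module. Treat it as one possible reading of the schema, not a finished design: the Tag table name, the index name MailTagByTag, the sample rows, and the helper names build_demo_db, mail_with_tag and rename_tag are all made up for illustration. From and To are renamed to Sender and Recipient because they are reserved words in SQL, and the tag-ID lookup is folded into the query as a second join instead of being done as a separate step.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a three-table tag schema (assumed layout)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Mail (
            MailID    INTEGER PRIMARY KEY,
            Subject   TEXT,
            Sender    TEXT,   -- called From in the post; renamed, reserved word
            Recipient TEXT,   -- called To in the post; renamed, reserved word
            Message   TEXT
        );
        CREATE TABLE Tag (
            TagID INTEGER PRIMARY KEY,
            Name  TEXT NOT NULL UNIQUE
        );
        CREATE TABLE MailTag (
            MailID INTEGER NOT NULL REFERENCES Mail(MailID),
            TagID  INTEGER NOT NULL REFERENCES Tag(TagID),
            PRIMARY KEY (MailID, TagID)
        );
        -- Lookups go from tag to mail, so index the join table by TagID first.
        CREATE INDEX MailTagByTag ON MailTag (TagID, MailID);
    """)
    # Hypothetical sample data; tag 8 is deliberately misspelled.
    conn.executemany("INSERT INTO Mail VALUES (?, ?, ?, ?, ?)", [
        (1, "Lunch?", "ann@example.com", "bob@example.com", "Noon?"),
        (2, "Q3 report", "cho@example.com", "bob@example.com", "Attached."),
        (3, "Offsite", "ann@example.com", "bob@example.com", "Next week."),
    ])
    conn.executemany("INSERT INTO Tag VALUES (?, ?)", [
        (7, "work"),
        (8, "freinds"),
    ])
    conn.executemany("INSERT INTO MailTag VALUES (?, ?)", [
        (1, 8), (2, 7), (3, 7), (3, 8),
    ])
    return conn


def mail_with_tag(conn, tag_name):
    """Return (MailID, Subject) for every mail carrying tag_name."""
    rows = conn.execute("""
        SELECT Mail.MailID, Mail.Subject
        FROM Mail
        JOIN MailTag ON Mail.MailID = MailTag.MailID
        JOIN Tag     ON Tag.TagID   = MailTag.TagID
        WHERE Tag.Name = ?
        ORDER BY Mail.MailID
    """, (tag_name,))
    return rows.fetchall()


def rename_tag(conn, old_name, new_name):
    """Fix a tag's spelling; MailTag rows keep pointing at the same TagID."""
    conn.execute("UPDATE Tag SET Name = ? WHERE Name = ?", (new_name, old_name))


if __name__ == "__main__":
    db = build_demo_db()
    print(mail_with_tag(db, "work"))
    rename_tag(db, "freinds", "friends")
    print(mail_with_tag(db, "friends"))
```

Because MailTag stores only IDs, the spelling fix is a single-row update on Tag, and nothing that references the tag has to change.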
> If Prolog handles this, how does it do it, and how does it scale?
Oftentimes Prolog does not scale. My programming languages professor once boasted in class that he could write an entire compiler in a page or two of Prolog. I asked the compiler prof, jokingly, why the compiler group didn't just use Prolog for their compilers. "Because we want them to run in better than exponential time," he replied.
don't fear the leaper
Friday, September 10, 2004
This topic is archived. No further replies will be accepted.