The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Database schema for availability queries

Ok, here's the quandary.  We have a database with 500k property listings for short-term rentals.  Users of our service can browse through those properties and filter them based on various criteria.  The issue at hand is that people only want to see properties that are available for their date range (i.e. Aug 10-16).

I have all the availability data for the properties, so the question is how to store it in the DB so that these queries perform well on a set of 500k-1M properties.  All other filters can currently be done without joins.  Obviously this will be the exception.  But it is a primary use case so I'd like to come up with a solution that will scale.
Mike Giles Send private email
Thursday, June 12, 2008
That's a slightly interesting problem. Maybe when they say "show me anything available Aug 10-16" it would be reasonable to require use of another filter, such as "in Los Angeles", that would cut the number of records you need to filter through down to a reasonable size?
Greg Send private email
Thursday, June 12, 2008
Yes, it is safe to assume that there won't be a lot of availability queries on the entire data set.  Not sure if I'd make it explicitly illegal (people like to browse around the various geographies), but it shouldn't be too common.
Mike Giles Send private email
Thursday, June 12, 2008
- this doesn't need a join - just add two fields to your property table [AvailableStart] and [AvailableEnd]
- [AvailableEnd] can be NULL if no end date specified

query for all properties available within a date range

WHERE [AvailableStart] <= [RequestedStart]
AND ([AvailableEnd] >= [RequestedEnd] OR [AvailableEnd] IS NULL)

This should be fine if you index the date fields
DJ Send private email
Thursday, June 12, 2008
You might want to look here - an entire book about temporal data management in sql:
Chris Tavares Send private email
Thursday, June 12, 2008
How are you storing the "occupied" data?  Something like:

table property_availability {
  property_id int PK,
  date date PK,
  available boolean

Synodontis Send private email
Thursday, June 12, 2008
"so the question is how to store it in the DB"

Just some quick idea's;

You could add a table containing only the availability dates and a reference to the property. Sorting and filtering will be relatively quick once you add an index.

Avoid joins if you can; you can lookup the details by using the reference, once you got the selection. You might also want to limit the results to, say a 1000. I doubt that you'll need more records in a single 'pull'; you might be able to retrieve the next 1000 whilst the user browses the first 1000.

You could also precalculate availability; Assuming that a week would be a often-sought timeperiod - create a table called "availabilityForWeek48", and fill it with reference-id's from the property-table that are available in week 48.

Another option would be to use a dedicated piece of hardware for your instance of SQL Server.

Last suggestion: take the time to build some prototypes and measure the performance.

Happy Programming,
Eddy Vluggen Send private email
Thursday, June 12, 2008
DJ - If only it were that simple.  Each property can have many different unavailable periods (e.g. first two weeks of June, all of July, third week of August, first weekend in October, etc).

Chris - Thanks for the link.  I'll check it out and see if it pertains to this space (there are a lot of different temporal data challenges).

Synodontis - Right now I'm storing it in a format that makes it relatively compact and easy to render into a calendar.  But that's purely a stop-gap measure until the querying capability gets added.  So I'm trying to determine the appropriate schema for that task.  I was thinking there must be a better way than having one row per property per day.  But maybe not.

Eddy - I kind of like the precalculate concept.  Not sure if it would work in the wild.  And yes, I'm at the stage of testing out different approaches.  Just curious if there were any best practices that I didn't know about.
Mike Giles Send private email
Thursday, June 12, 2008
Do any DBs support bitwise operations in where clauses?  It's not something I've ever tried, but it could be another option.

table property {
 property_id int PK,
 available2008 bit(356),
 available2009 bit(356)
Synodontis Send private email
Thursday, June 12, 2008
For a only a few million items you should be able to get good performance on just about any, non-boneheaded (i.e. no bubble sorts) algorithms. All the data will fit in memory on any machine made in the past decade, and you should be able to run through the entire data set in no more than a few seconds (at most).

Before you start worrying about performance, get a working system in place, then you can see how it performs and where the bottlenecks are. Premature optimization is the root of all evil.
Jeffrey Dutky Send private email
Thursday, June 12, 2008
I think you will have more queries than updates...
One Idea is to store the availability this way:

Whenever the Occupation of a unit is changed, another "Availability" table is updated (Materialized Views in Oracle would be great for this)...

How many days does a query usually need, if it's a few days <10, I would store each available date alone and query on a day-by-day basis with the and operator....

The bitwise and idea is also nice, never did this in a db, you need a single bitwise and to return nonzero for every row that matches.

I don't want to think about map-reduce on tens of thousands of server but can't get it out of my head.
Totally Agreeing
Friday, June 13, 2008
What's wrong with joins?
Walter Mitty Send private email
Friday, June 13, 2008
I didn't say that joins are wrong :)

Joining two tables together takes time. Why would you want to use two tables if one is sufficient?
Eddy Vluggen Send private email
Friday, June 13, 2008
Because a schema which requires joins results from normalising the database: which saves space, promotes correctness in the data, and makes the data cheaper to insert or update.
Christopher Wells Send private email
Friday, June 13, 2008
One doesn't blindly follow a method or best-practice. Sometimes you'll want a non-normalized version that is optimized for performance.
Eddy Vluggen Send private email
Friday, June 13, 2008
Store the dates as Julien Dates (intergers or longs) and index them.
Hail Caesar
Friday, June 13, 2008
+1 to Chris Tavares for an excellent book recommendation.

Everyone doing database design should read this book, imho. Even if you don't employ the techniques in the book, it is useful to know that they exist.
Odysseus Send private email
Friday, June 13, 2008
How much time does it take to join two tables together?
Walter Mitty Send private email
Saturday, June 14, 2008
That doesn't only depend on the size of the tables, but also on the time of the day. If you wake the records too early, they'll be cranky.
Eddy Vluggen Send private email
Saturday, June 14, 2008
(add other important/often used fileds here)

15,1,"20080101", "20081231"

SQL query:
SELECT FROM avail WHERE AvaliFromDate <= "20080801" AND AvailToDate <= "20080815"

If the booking is confirmed, split the avail record:
15,1,"20080101", "20080731"
15,1,"20080816", "20081231"
Francesco Send private email
Thursday, July 03, 2008

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz