A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.
Ok, here's the quandary. We have a database with 500k property listings for short-term rentals. Users of our service can browse through those properties and filter them based on various criteria. The issue at hand is that people only want to see properties that are available for their date range (i.e. Aug 10-16).
I have all the availability data for the properties, so the question is how to store it in the DB so that these queries perform well on a set of 500k-1M properties. All other filters can currently be done without joins. Obviously this will be the exception. But it is a primary use case so I'd like to come up with a solution that will scale.
That's a slightly interesting problem. Maybe when they say "show me anything available Aug 10-16" it would be reasonable to require use of another filter, such as "in Los Angeles", that would cut the number of records you need to filter through down to a reasonable size?
- this doesn't need a join - just add two fields to your property table [AvailableStart] and [AvailableEnd]
- [AvailableEnd] can be NULL if no end date specified
query for all properties available within a date range
SELECT FROM Property
WHERE [AvailableStart] <= [RequestedStart]
AND ([AvailableEnd] >= [RequestedEnd] OR [AvailableEnd] IS NULL)
This should be fine if you index the date fields
Thursday, June 12, 2008
"so the question is how to store it in the DB"
Just some quick idea's;
You could add a table containing only the availability dates and a reference to the property. Sorting and filtering will be relatively quick once you add an index.
Avoid joins if you can; you can lookup the details by using the reference, once you got the selection. You might also want to limit the results to, say a 1000. I doubt that you'll need more records in a single 'pull'; you might be able to retrieve the next 1000 whilst the user browses the first 1000.
You could also precalculate availability; Assuming that a week would be a often-sought timeperiod - create a table called "availabilityForWeek48", and fill it with reference-id's from the property-table that are available in week 48.
Another option would be to use a dedicated piece of hardware for your instance of SQL Server.
Last suggestion: take the time to build some prototypes and measure the performance.
DJ - If only it were that simple. Each property can have many different unavailable periods (e.g. first two weeks of June, all of July, third week of August, first weekend in October, etc).
Chris - Thanks for the link. I'll check it out and see if it pertains to this space (there are a lot of different temporal data challenges).
Synodontis - Right now I'm storing it in a format that makes it relatively compact and easy to render into a calendar. But that's purely a stop-gap measure until the querying capability gets added. So I'm trying to determine the appropriate schema for that task. I was thinking there must be a better way than having one row per property per day. But maybe not.
Eddy - I kind of like the precalculate concept. Not sure if it would work in the wild. And yes, I'm at the stage of testing out different approaches. Just curious if there were any best practices that I didn't know about.
For a only a few million items you should be able to get good performance on just about any, non-boneheaded (i.e. no bubble sorts) algorithms. All the data will fit in memory on any machine made in the past decade, and you should be able to run through the entire data set in no more than a few seconds (at most).
Before you start worrying about performance, get a working system in place, then you can see how it performs and where the bottlenecks are. Premature optimization is the root of all evil.
Thursday, June 12, 2008
I think you will have more queries than updates...
One Idea is to store the availability this way:
Whenever the Occupation of a unit is changed, another "Availability" table is updated (Materialized Views in Oracle would be great for this)...
How many days does a query usually need, if it's a few days <10, I would store each available date alone and query on a day-by-day basis with the and operator....
The bitwise and idea is also nice, never did this in a db, you need a single bitwise and to return nonzero for every row that matches.
I don't want to think about map-reduce on tens of thousands of server but can't get it out of my head.
Friday, June 13, 2008
Store the dates as Julien Dates (intergers or longs) and index them.
Friday, June 13, 2008
(add other important/often used fileds here)
SELECT FROM avail WHERE AvaliFromDate <= "20080801" AND AvailToDate <= "20080815"
If the booking is confirmed, split the avail record:
This topic is archived. No further replies will be accepted.Other recent topics
Powered by FogBugz