The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Cross DB Vendor queries

A client is asking us to create a product that, among other things, will perform queries that join tables from both SQL Server and Oracle databases. Anybody have any insight, besides "why would you do that?"

Thanks!
Steve Hirsch Send private email
Wednesday, January 31, 2007
 
 
That sucks! :-)

I think it would be crucial to have an in-memory database that would join the data for you. The tables could be temporary for each operation. The best optimization would be to bring only the minimum/necessary data from each database into the in-memory one, and then make the last join from there.

The in-memory database should better support SQL as well.

Java has some in-memory database options with Derby, HSQL, and others.

For .NET, I don't know what would be the most convenient way to do this.
Lostacular
Wednesday, January 31, 2007
 
 
sp_addlinkedserver and sp_addlinkedsrvlogin are the sql server stored procs that will get you in that direction. This way, your joins can be cross servers not just cross db.

You may have some joins with things like:  server1.northwind.dbo.users

When doing cross server joins, be ready for lots of doubleplusunfun times. Joins in one direction may take a few minutes, joins in the other direction may take hours.

Select blah from sales left join server1.northwind.dbo.user

versus

select blah from user left join server2.southtwind.dbo.sales

one may take minutes. One may take hours. Or they might be similar.

I had to do reporting type queries like this that went on for 2 pages. The english wordy type description was "tell me all the doctors who did NOT send a prescription to pharmacy_chain_X in the past month." In this case, some of the doctors had their electronic prescription program paid for by that pharmacy chain, so the chain wanted to know who they were paying for that wasn't using the service. We had several million rows in the prescription table (about 1M prescriptions added per month) on one server and tens of thousands of doctors in the user table on a different server. Joins one way took 15 min. Joins the other way took more than 4 hours. Lots and lots of COALESCE statements.
Peter
Wednesday, January 31, 2007
 
 
ColdFusion does this already - just use that =).

Eric
Eric Knipp Send private email
Wednesday, January 31, 2007
 
 
+1 for Loastacular's suggestion to copy the needed data to do the joins. 

Our product does something similar (all in SQL Server, but using multiple db instances on physically separate boxes).  For us, it's faster to copy the needed data, sometimes millions of rows, to one db & do the join/report from there, then flush the copied data.

Don't know what the current implications of doing that between Oracle & SQL Server are, though.  Last time I had to do that was 10 or so years ago and at the time some of the datatype conversions were tough.
a former big-fiver Send private email
Wednesday, January 31, 2007
 
 
Linked servers works and is usually fine for small amounts of data on the "foreign" side.  This is usually how we determine which side is going to be the "local" side in the first place.

For large amounts of data on both sides, the only solution which really performs is to copy the data, which is easy enough with DTS or BCP.  Techniques used to reduce that copy time are to vertically or horizontally slim the data on the way so you are only copying rows (and columns) which you need to check in the first place.  If those criteria are in the source query, SQL server will push them to the foreign server to reduce the data pulled across.  If you are comparing to summarized data, pull the already denormalized data from the "foreign" server and only store this on the "local" server.

This technique can also be used to optimize the linked server query, but there's a limit to what the "local" server can know about some of the "foreign" data without actually pulling over to see it to compare to the "local" data.
Cade Roux Send private email
Wednesday, January 31, 2007
 
 
"tell me all the doctors who did NOT send a prescription to pharmacy_chain_X in the past month."

Peter, that is a great concrete example, thanks! You have prescriptions and doctors in separate database servers. The only efficient way would be make a temporary copy of a key aspect of one dataset onto the other server. I would guess the lowest impact would be a copy of the set of Doctor IDs to the prescription side for the join. Is that transfer what takes the majority of the 15 minutes? But then the report probably wants additional doctor details for the final output... Whether the SQL can be designed to do it the most efficient way is a good question.
Ben Bryant
Wednesday, January 31, 2007
 
 
Thanks for all the great answers...I guess I should have said up front that I've got a lot of experience with Oracle, and basically squat with SQL. Therefore, my bias is to look at Oracle first, but these comments made me realize that I need to look at where the bulk of the data is going to live first...

Anyways, I did find Oracle's Gateway thingies. Anybody here have any experience with it? (:=)
Steve Hirsch Send private email
Wednesday, January 31, 2007
 
 
"Anybody have any insight, besides 'why would you do that?'"

You'll want to create a middlware server that's on same LAN as db's.  I haven't done any cross database work, but I do know my n-tier framework (kbmMW) supports "two phase commit" transactions across heterogenous databases.  It does things in memory tables in the middleware server, as several others have suggested.  Memory table joining can be done manually, or there's a separate commercial Delphi tool (TxQuery) that lets you do a SQL queries that join memory tables.

kbmMW is native Delphi and .NET, although there is support for interaction with Java (sorry, no experience with that, either).  I do know the lead developer, Kim Madsen, does do quite a bit of work with Java.  Might be worth checking with him if you have any interest: http://www.components4developers.com

RemObjects/DataAbstract is another Delphi/.NET solution similar to kbmMW.  Not sure if it supports two phase commits (I don't think it does) but depending on what you're doing could be worth checking on.
http://www.remobjects.com

These are just a couple ideas.  If you're need completely Java-based solution it might be worth checking around for projects analogous to kbmMW or RO/DA.
Herbert Sitz Send private email
Friday, February 02, 2007
 
 
Linked Servers to add remote database to SQL SERVER. eg. SQL SERVER to ORACLE.
Paradigm Developer
Thursday, February 08, 2007
 
 
I haven't been able to get the Oracle Gateways to work. They seem so difficult that I want to forget about it, since it is for a product and I don't want to require that kind of knowledge.

That said, I'm a SQL newbie. How does one refer to a table on a linked server? Haven't been able to figure it out from the documentation.

Thanks.
Steve Hirsch Send private email
Friday, February 09, 2007
 
 
This is like creating your own database engine. I don't think it is a good idea. You should ask if you need a mechanism where information from one database can be linked to another database. <pitch> my product InfoCaptor, allows you to create information windows from different database sources and link them visually in the same page (you may call it dashboard/portal page etc). you can drill from information from SQL server to information in Oracle database and so on..you can find more information here http://www.infocaptor.com/demo/infocaptor_introduction.htm

let me know if you have questions

</pitch>
BI Dashboards Send private email
Friday, February 16, 2007
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz