The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Strategies for copying relational data.

I am adding a “paste” function in which a user selects something (in this case a tour) to copy.

The user will then select “paste” to copy that data to a particular date on a outlook style calendar.

Does anyone have any suggestions on how they attack the problem of copying relational data structures?

Obviously, that ONE tour is a fairly complex mix of several tables that represent that one tour.

When I just have to copy one “level” of related data, then I use some code to make this easy:

I use:
Call RelationDataCopy(strTableName, strFKFieldName, lngFKfromValue, lngFKToValue)
 
So, the above routine will do a data copy for “one” table of data. So, I have code like:

' copy all stops (tour routes) for this tour
Call RelationDataCopy("tblTourRoute",
                              "tblBusTour_id",
                              clsTour.BusTourID,
                              clsbookinfo.m_lngBusTourID)

The problem is that for each tblTourRoute in the above, I also have records below that I need to copy. (and believe it or not…we are already inside of a loop copying records from a table above ).

Looking at this problem, it could be solved via a recursive routine if I supply the data structures, say like:

table Name 1….N
FK Field    1….N

So, I would create a “list” of all child tables, and FK fields, and then call the routine with this “list” of tables + FK, and away I go. Either I do that, or write some more code loops to copy this data.

I am holding off on this routine since I don’t think it will perform too well….

What do most of you folks use when you have to copy relational data, say, 2, or perhaps 3 levels deep. (so, one main record…many child records, and for each child record..also some many records).

I mean, 1 main record, and some child records is easy. However, when you go more then 1 level deep, it starts to get messy code wise.

Do you folks just plow through this…and write more code???

I thinking the idea of making a “general” routine makes sense…just looking for some possible ideas before I go and write this recursive routine….

Albert D. Kallal
Edmonton, Alberta Canada
kallal@msn.com
Albert D. Kallal Send private email
Saturday, February 17, 2007
 
 
Sorry, but I don't think that making a "general" data access routine makes a lot of sense for this case.

This would be simplified if you just added one more level of abstraction.  Have a Tour object with a DeepCopy method.  The DeepCopy method simply copies its own data and then calls DeppCopy on all of its child objects.  If there are more levels, the child level objects would take care of copying them.

I think a lot of people get the data access layer correct, but then treat that layer as if it were the business layer and call it from code as such.  A real business layer would make this problem trivial.

You also seem hesitant to write the code that needs to be written for "performance" reasons.  If 1000 rows need to be copied, then there is no way you are going to write any code that inserts less than 1000 rows and still works.  If you do run into performance problems, there are a lot of SQL shortcuts you can take with INSERT...SELECT, but I wouldn't worry about that until after the problems arise.  There will be a pretty big maintenance cost for collapsing the DeepCopy into one method call.  You don't want to write that check until you know you need to do so.
JSmith Send private email
Saturday, February 17, 2007
 
 
Are you certain that copying is the correct behavior?  From your description of the problem, it sounds like you are creating an association between a Calendar element and a Tour.
Ben Send private email
Saturday, February 17, 2007
 
 
Ben:

Actually, the user is making a copy of a tour.  There is a considerable number of things to copy

tour stops,
pickup, drop-off times,
Tour Guides,
contact people at EACH stop,
 “many” Meal and drink specials for each stop,
etc. etc etc.

So, I want the user to be able to select this existing tour (perhaps one they run on Saturday for example), and then they bring up the calendar, and are able to paste this tour over and over to whatever other dates they want this SIMULAR tour to occur. So, there is several levels of related tables data that needs to be copied.

To MOVE that huge complex tour with the zillions of things actually is just date change, and due to normalized data…everything else moves. So, moving, yes, you are right, it is just actually a date field change (or a to un-book, and re-book…it is a attach that is the result of a good relational design). However, this is an actual copy of a set of tables (related data) to a new tour.

JSmith:
> The DeepCopy method simply copies its own data and then calls DeppCopy on all of its child objects.

The problem is two fold however. First, I don’t have serialized objects.  2nd, this is a vb class object – no inheritance either. However, a deepcopy method of the object would still actually require me to write sql statements for each related table. It is not a copy of in-memory data in an object, but a set of tables + related data.

Further, not all of the related data is to be copied. So, I have to specify which tables will be copied (hence, my “list” idea). For example, booked people and booked groups are not to be copied, but only the data that actually defines the tour.

> You also seem hesitant to write the code that needs to be written for "performance" reasons.  If 1000 rows need to be copied,

Actually, it not near that many rows at all. The problem is not the number of rows, but I want to minimize the number of sql statements executed. (one sql statement can move a HUGE amount of data very quickly). However, if you start executing lots of sql statements to move tiny bits of data, then you can you hurt performance.

If I write the code recursive, then I will actually wind up executing a sql statement for each record that needs child records copied. (an object does not solve this problem – since it would force me to load all that child data anyway…I don’t need to pull that data into a object..but rather have sql copy it for me).

If I dump the recursion, and use some joins, I can often eliminate the number of sql statements executed by a large number. (but, that solution would not be recursive, and require more messy code -- that where my hesitation is coming from). 

Since this operation is going to occur during a quick “paste” operation, then I don’t want any type of noticeable delay. The user is going to point and click paste this tour over and over quite quickly while seeing the tour appear in the calendar. So, speed of copying many records is not at issue, but executing too many separate sql statements without a noticeable delay is a big concern.

Perhaps I am being too concerned. But, I want to keep delays to an absolute min during this operation since it occurs interactively. It will be almost like checking boxes on a screen, or pasting text into a document…it must occur fast.

I will likely go ahead and write my recursive routine. (it is not much code at all). I was just wondering as to what people use to copy several levels deep of sql table data. (with, or without an object).

I do note well that if this was a in-memory data object, then copying would be easy with a object.

However, we really talking about several layers of data in tables to be copied.  I am going to just make it work, and see how fast this runs. I fix any perforamcne issues later if they come up.

Albert D. Kallal
Edmonton, Alberta Canada
kallal@msn.com
Albert D. Kallal Send private email
Saturday, February 17, 2007
 
 
To copy/insert the top-level record, I think you want something like ..

INSERT into TableA
{ all TableA fields except recordId }
SELECT
{ all TableA fields except recordId }
FROM TableA
WHERE (TableA.recordId = %recordId%)

To copy/insert child-level records ...

INSERT into TableB
{ all TableB fields except recordId }
SELECT
{ all TableB fields except recordId }
FROM TableB
WHERE (TableB.parentId = %recordId%)

To copy/insert grandchild-level records ...

INSERT into TableC
{ all TableC fields except recordId }
SELECT
{ all TableC fields except recordId }
FROM TableC JOIN TableB
ON (TableC.parentId = TableB.recordId)
WHERE (TableB.parentId = %recordId%)

This would be one SQL statement per table, and no loops; I don't see what the problem is?
Christopher Wells Send private email
Sunday, February 18, 2007
 
 
I've used a recursive routine to do something similar.  I haven't even thought about the speed, highest number of recursions is less than 10 or 20.  In my case it would be a practical impossibility to write static code because the recursion is done with user-defined query (e.g. subset of fields) for "master" and "children" are also user-defined queries, and sometimes not all child entities are requested.  But even with tables and always returning all child data benefits of recursive strategy may well outweigh hard-coding everything.  I don't see speed being an issue for relatively low number of recursions you seem to be talking about.
Herbert Sitz Send private email
Sunday, February 18, 2007
 
 
"Further, not all of the related data is to be copied. So, I have to specify which tables will be copied (hence, my “list” idea)."

Albert, yep, that's a little similar to what I do.  And I create a list to do it also.  If I recall from the original list (of master's children) I did a recursion to build up a list of all sql statements I needed executed. No queries run up through this point.  Then I just sent that second list to server, run a for/next loop on it that executes sql in each item, and send all the datasets back as single package.
Herbert Sitz Send private email
Sunday, February 18, 2007
 
 
Hey, if you want to put performance concerns above maintenance concerns before you even have a performance problem, be my guest.  Have fun.  Maybe you should write the while thing in assembly.
JSmith Send private email
Sunday, February 18, 2007
 
 
JSmith:

Hey, take a pill here. I just asking for some ideas. Just because I kindly noted that your idea is un-workable, and generally a poor approach to this problem is no need for you to get all twisted up. I am just looking for some ideas here.

I am not looking to write this in assembler (in fact it been a number of years since I written any assembler if you are wondering).

Your idea of adding copy method to the object is just not workable in a production environment. This is especially so since objects don’t normally map their data to sql structures very well anyway.  And, further, OFTEN that data will not be in a object.

I simply have some relational data to copy. The assumption that this relational data will even be in a object is not a reasonable assumption in a production environment.

Virtually every developer has had to copy relatonal data at some point in time. I am hard pressed to think of any application I written where I NOT had to copy relational data.

Christopher:
>You don’t need loops

I agree. That is exactly my quote here:

>>If I dump the recursion, and use some joins, I can often eliminate the number of sql statements executed by a large number

I was referring to executing sql statements in a loop (since that is how updates are done in sql). So, yes, you can trade looping for some joins, and then the number of sql statements executed goes down.  (this exactly what I was saying). At least you understand the problem!!!

However, I looking to not have to write the sql at all!!!

Herbert:
I've used a recursive routine to do something similar.  I haven't even thought about the speed,

Yea, I don’t think perforamne will be too much of a problem.

However, what I am trying to eliminate the need to write the sql each time in the first place!!!

My recursive routine allows me to just specify the tables, FK, and top most record key. The routine it self actually does all of the work, and I don’t have to write ONE line of sql. You can see if we go a few levels deep, then the sql starts to get a bit messy.

So, every time in code I hit the need to copy a relational data structure, I have to start writing sql to do this, and I don’t want to!!
 
However, I certainly might re-write my recursive routine to actually create the sql as Christopher has shown. From a good performance point of view, that sql is the road I would take. It does eliminate my loops and is generally how relational data is copied.

I should note at this point that my recursive solution uses reocrdsets, because I am TOO lazy to write the actual sql. My reccrdsets know to skip the first field in the data copy because that is ALWAYS the PK value in my designs.

The whole recursive routine that does the copy is only about 20 lines of code. And, most important is that the field list is eliminated in the code (or sql).

As a developer, each time in code I come across a need to copy *relational* data, I now have a solution. Just pass the table name, FK name..and bam….I am done. Further, with my approach, if table structures change, the code will continue to work.

In actually, I am just being too lazy to write the sql. However, it seems that developers who are strategically lazy tend to look for solutions to these types of problems!!!

I was curious as to what, or how developers approach this problem, and it seems most don’t have a general solution for this common problem, but simply “plow” through this and write the several sql statements each time.


Albert D. Kallal
Edmonton, Alberta Canada
kallal@msn.com
Albert D. Kallal Send private email
Monday, February 19, 2007
 
 
Albert -- Maybe my explanation was a little confusing.  Though my recursive function outputs a list with sql for each table for which data will be fetched, it doesn't actually construct that sql.  That sql was actually pre-generated by user-defined query specs and persisted in a db "meta-query" table  (The user defines these queries in entirely separate part of application and for other uses.) 

Basically all the recursive routine does is assemble the list of records in the meta-query table for which recordsets should be generated.  In any case, the fact that it returns list of SQL statements and not just a list of tablenames is irrelevant to the structure of the recursive function.

If you were copying all fields from the desired tables then all you'd need to return from recursive function would be list of tablenames.  That list could then be used to drive a (non-recursive) function that simply loops through list and does necessary processing on each of the tables in it. 

Sounds like what you're planning is somewhat similar to what I did.  I can confirm that the recursive method is fairly simple and does provide extra flexibility and simpler code maintenance over having a more straightforward hardcoded non-recursive solution.
Herbert Sitz Send private email
Monday, February 19, 2007
 
 
> At least you understand the problem!!! However, I looking to not have to write the sql at all!!! ... I was curious as to what, or how developers approach this problem, and it seems most don’t have a general solution for this common problem, but simply “plow” through this and write the several sql statements each time.

Albert, I've started a framework to write my SQL. I define ...

* My types (=> tables)
* Fields/properties of each type (=> fields in each table)
* Relationships between types (childOf, treeOf, manyToMany, etc)

... and the framework generates CRUD methods with the corresponding SQL. The framework isn't finished yet though (its development suspended while I develop a different part of the project): so, of no immediate use to you.
Christopher Wells Send private email
Monday, February 19, 2007
 
 
Albert,

The problems with deep cloning (which is what you're trying to do) for persistent data is that:
- the primary keys (some may have to be generated).
- the correct foreign keys
- correct insert order
- deal with related exceptions.

You need a fair bit of meta data to describe the what's what and then quite a bit of code to use that data and solve the above problems correctly. To do it all auto-magically is, IMO, not wroth the coding and configuration effort.

Instead, I would just nail down an utility API to speed up the coding of the above. Just a hunch, but a recursive approach may actually work quite nicely.

At a very high level:
- starting from the root object (your tour in this case), traverse a tree of objects (has to be a tree for obvious reasons).
- Upon traversal of each node, create a command which contains insert/update instructions. Add this command to the head of a command list .
- After traversing the tree, run the commands in direct order (head->tail); that should be the correct insert order, since the tail refers to the root object.
Dino Send private email
Wednesday, February 21, 2007
 
 
> traverse a tree of objects (has to be a tree for obvious reasons)

[The tree of types or tables.]
Christopher Wells Send private email
Wednesday, February 21, 2007
 
 
Hi Albert,

One thing that might help your pursuit is if you could alter the table structure somewhat to make it more regular (and therefore subject to automatic processing). 

For example, if you moved all of the foreign keys into link tables (i.e. pretend every relationship is a many to many), then you have two kinds of tables: objects and links.  This means that to make a copy you first process all of the objects (in any order - a considerable simplification), and then you process the link tables, also in any order.  During the copy operations it is very easy to do the required bookkeeping in order to map all the keys and everything if you take this sort of approach as well.

Might be like trying to swat a fly with a bazooka to take an approach like this, but maybe not.
Franklin Send private email
Thursday, February 22, 2007
 
 
Actually, the code is written and deployed in the field already. It only took me about 10 minutes max to write the code. The main recursive loop is here:


  strFK = tableList(2, intLevel)
  strSql = "select * from " & tableList(1, intLevel) & _
            " where " & strFK & " = " & lngFromFK
 
  Set rstFrom = CurrentDb.OpenRecordset(strSql)
  Set rstTo = CurrentDb.OpenRecordset(tableList(1, intLevel))
 
  Do While rstFrom.EOF = False
      rstTo.AddNew
      For i = 1 To rstFrom.Fields.Count - 1
        If rstFrom.Fields(i).Name = strFK Then
            rstTo(strFK).Value = lngToFK
        Else
            rstTo(i).Value = rstFrom(i).Value
        End If
      Next i
      rstTo.Update
      If intLevel < m_Levels Then
        rstTo.Bookmark = rstTo.LastModified
        Call DataCopy(intLevel + 1, rstFrom(0), rstTo(0))
      End If

      rstFrom.MoveNext
  Loop

The code to call the above DataCopy sub looks like:

 clsRelCopy.AddTable "tblTourRoute", "tblBusTour_id"
 clsRelCopy.AddTable "tblTourRouteSpecials", "tblTourRoute_ID"
 clsRelCopy.DataCopy 1, clsTour.m_rstBusTour!ID, clsbookinfo.m_lngBusTourID

The paramters are table name, and FK field. So, the above adds the top tble “tblTourroute”, and the child table of records is tbltourRouteSpecials”. I could easily add a few more tables.  Note how I don’t have to build any sql or anything. I don’t define the sql, or nor even the PK values. This is really simple code.

This code is for a JET based mdb file, but the above code will also work if I I was using sql server. The only provision missing for sql server would be to skip a timestamp field.

Albert D. Kallal
Edmonton, Alberta Canada
kallal@msn.com
Albert D. Kallal Send private email
Tuesday, February 27, 2007
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz