The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Online/Offline Database Synchronization

I'm looking for articles etc. on designing and implementing database synchronization. A database is kept on a local PC and maybe synced with a central master database or possibly other instances of the database on other PC's. A database can potentially be updated by multiple users who may be connected together or not. If they aren't connected the databases need to be synced the next time they are connected.
Neville Franks Send private email
Friday, March 14, 2008
 
 
I can tell you from experience that this is quite a challenge. Microsoft is actually developing a framework for this call Microsoft Sync Framework (creative title huh?). You might start there to get a general feel for what they think the major challenges are. I'm not advocating using it. Just advocating reading up on some of their background documentation.

http://msdn2.microsoft.com/en-us/sync/default.aspx

Good luck.
dood mcdoogle
Friday, March 14, 2008
 
 
I have left on a previous discussions the problem
with doing synchronization, and on another one a link
to the book on the topic by Prof Ken Birman (Cornell)
- I think I left a link to a pdf, but I am too lazy
to look for these now.

You may find that how well you can get synchronization
to work is dependent on the semantics of the data
- distributed version control, with multiple branches
at different nodes, can be made to work, while
general purpose distributed transactions (ACID) can't.

This is because you can't in general be assured that
the network between relevant nodes has broadcast
semantics. And packets get dropped. So one node sends
a packet, but doesn't know if the other node has
received it. If the second node sends an "ACKnowledged"
message, it won't know if the first node has received
the ACK. From here on, the problem is recursive (or
should that be inductive?).

So, it is actually impossible to perfectly sync two
nodes, one will always be behind. People assume that
this isn't the case because distributed transaction
systems normally run without problems, but in the
face of errors such as network errors, they may go
funny by going inconsistent with each other or by
grinding to a halt.

The trick is probably to arrange your sync protocol
so that node that falls behind is only ever-so-slightly
behind, which as I mentioned may depend on the
semantics of your application (The other way is to
build a system where all the nodes *can* simultaneously
grind to a halt but with only a tiiiiiiiny probability).
My idea is usually to keep things simple and centralize
everything on a server without multiple copies of the
data to sync.

Failing that, your best bet is to build the other
nodes to basically operate autonomously, and use a
central server as a coordinator, but one which
coordinates in a very hands-off style (so it doesn't
matter if it goes down for a long time).

(eg each node has clearance to service X requests
before it has to successfully inform the server of
the results of all these requests, whereupon the
server grants it more budget. X varies depending on
what the server feels like but is supposed to be big
eg 1 billion.)

If you want to have multiple candidate coordinators
you have to come up with a protocol for failover, and
how to deal with when the failover mechanism fails,
and network partitions, and other nasty stuff, and
I think you're a masochist.

Google for "log shipping" and "group membership
protocol" to get you started.
Object Hater
Friday, March 14, 2008
 
 
@Object hater:

I don't understand the infinite-ack-problem; Assume the following protocol:

1. A sends "command" to B
2. B executes command
3. A sends "query for status" to B
4. B sends "status" to A

If the transmission is lost in 1, then the status (the payload of 4) will contain an error.
If the transmission is lost in 3, then A will never recieve the status. A can wait for time T, after which it will repeat from 3. B must be able to handle multiple queries (step 3-4).

Once the entire chain is completed, A will know that the command has been executed properly. B, on the other hand, will know it at step 2.

Of course, for the sake of efficiency, the first step 3 could be implicit. Only if something goes wrong, would A have to repeat it.

Am I missing something obvious here?
Troels Knak-Nielsen Send private email
Saturday, March 15, 2008
 
 
"Am I missing something obvious here? "

I think what you are missing is the case where BOTH sides need to know that the other side got the request/response. In your example, if step #3 never happens, side B doesn't know if side A actually got the response and acted on it. If side B has further processing that is dependent upon knowing that side A received the initial response then side B is now stuck.

This is the basis for the infinite ACK problem.
dood mcdoogle
Saturday, March 15, 2008
 
 
I still don't get it. If B, for some reason, needs to know if A got the response, then _it_ can send a query and wait for a reply (and in case no reply arrives within a certain timeframe, send a new query).
Troels Knak-Nielsen Send private email
Saturday, March 15, 2008
 
 
@Troels, the OP did not specify what the application
was. We don't know how the behaviour of one client can
affect the behaviour of another. If they can be arranged
not to, you don't have the problem. That's why I said
it depends on the application semantics.

Example. A CRM system could host a sales catalog with
snapshots on the salesmen's laptops A & B. Let's say if
A makes a sale and puts the product out-of-stock, then
B ought not be able to make a sale. So we have
Step 1: A, B and server start in sync.
Step 2: A sells the item, sync's with server.
Step 3: Server crashes. A dies forever
 (salesman makes quota, goes on holiday).
Step 4: B wants to sell the item.

Step 4 is where the problem comes in. If B is built to
proceed with the sale, we have inconsistency. If B wants
to sync first, and keeps re-trying as you suggest, or
otherwise gives up, then the whole system has been
blocked until the server wakes up again. And remember,
we could have a large number of clients like B.

Data replication appears to be an easy problem, but in
fact becomes "rocket science" if you have a consistency
requirement as well as an unreliable network. You can
google for "Fisher Lynch Patterson" for the general case
proof.

When synchronization is asked about on these boards
I advise against even trying because I fear the poster
may attempt to build a consistent-data based protocol,
which might work on his/her test network, which may be
something very reliable like a local hub. Then after
all that work, the program gets deployed on the internet
and then might go horribly wrong.
Object Hater
Sunday, March 16, 2008
 
 
Incidentally, Google Gears has this exact problem and they decided to leave it to the developer for the time being.
outback
Sunday, March 16, 2008
 
 
Thanks all for the replies so far. The "Master-Master Replication" article along with related articles at http://msdn2.microsoft.com/en-us/library/ms978735.aspx describe the issue I am interested in quite well.

This is not a business app and conflict resolution can most likely be reasonably simple, For example the item with the newest timestamp wins. It would appear that if I were to use SQLServer or MySQL then this might all be relatively straightforward, but I'm not. Lots more reading and thing to do.
Neville Franks Send private email
Sunday, March 16, 2008
 
 
The only thing I can say is use GUIDs for your database keys.  Once you go GUID, you'll never go back to integer id's. :)
Michael Johnson Send private email
Tuesday, March 18, 2008
 
 
The problem is when one document can be edited by many users simultaneously. Then, you have to do some kind of merging during synchronization . If you can avoid this case, the rest shouldn't be difficult.
outback
Tuesday, March 18, 2008
 
 
Adobe Air has the same issue that it will be up to the developer to implement.  Object Hater is on the right track. As he mentioned just google "shipping/transaction logs" and "group membership protocol."

This is going to be a pretty common thing with the new online/offline RIA platforms such as Google Gears, Adobe Air(local SQLite db library) and JavaFX(local JavaDB(Apache Derby)).
Ballmer's Chairkeeper
Tuesday, March 18, 2008
 
 
>As he mentioned just google "shipping/transaction logs" and "group membership protocol."

Thanks, but I couldn't find anything of interest here.

>This is going to be a pretty common thing with the new online/offline RIA platforms such as Google Gears, Adobe Air..

Agreed, but they offer very little, other than some basic guidelines.

The Dojo Offline Toolkit extends Google Gears but it is also a fairly simplistic implementation. See: http://dojotoolkit.org/offline
Neville Franks Send private email
Wednesday, March 19, 2008
 
 
> Agreed, but they offer very little, other than some
> basic guidelines.

Because it is a tough problem.  :)

How do wikis handle this? Even though there is a single copy of the documents and they reside on a central server, I suppose it is still possible that more than one person at a time could potentially try to edit the same document simultaneously.

Perhaps you just have to provide a check in/out and merging mechanism for this kind of scenario no matter what.
coder
Thursday, March 20, 2008
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz