The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Architecture question

Shamefully, I'm no architect.  For most of my career I've built plain vanilla web apps.  Right now I have a key role in a larger enterprise-level application.  I have a problem that I've never tackled in any fashion before.  I wanted to bounce my ideas off some JoS brains to reassure myself that I'm not going to be kicking myself 3 months down the road.

The current problem I'm dealing with is that of data synchronization.

I essentially have a number-crunching app that runs on multiple (10+) servers.  Each instance of the app (FORTRAN wrapped in C#, btw) is crunching numbers for a different problem-set.  At the end of a run, it slurps up the new input and starts crunching with the new input.

We need to be able to manage each of these instances from a web interface.  At the most basic level, we need to be able to display a list of all the instances along with statistics from the last batch run.  To this end, I have a monitoring service that sits on each server.  To put it in DNS terms, the service is "authoritative" for the instances which physically reside on that server.

Each of the services need to keep their data in sync.  So the results from server #1 need to get propigated to servers 2 to 10.  Each batch run takes 2 to 3 minutes.

One thought is to have a central hub for distributing results.  The other is to have each of the monitoring services communicate directly with eachother.

Keep in mind, that this system needs to be fault tolerant to the degree that I can pull the plug on any one server, and the numbers keep crunching without too much of a burb.  This is another challenge, but it's probably relevant to this data propigation.

Q: What else should I be thinking about?  What might work well in this situation?

P.S. It's times like this that I wish I had hardcore programmers across the hall from me.  Instead I work at home--flying solo with no uber-coders to watch my back.
David Larsen Send private email
Wednesday, June 07, 2006
You might want to look into some of the design considerations and discussion behind the Seti@Home project.  Their new client (BOINC) does pretty much this for 100k's of clients.  I personally have it running on a number of machines.
KC Send private email
Wednesday, June 07, 2006
Sounds like you need some kind of peer-to-peer infrastructure.
Wednesday, June 07, 2006
Peer to peer sounds like more trouble than it's worth.

If I were you, I'd have a single master server in control of multiple slaves. When each slave finishes its tasks, it notifies the master and requests a new task. Also, the master should be able to periodically ping the slaves to make sure they haven't died.

I'd also have a backup master (using replication on its database). In case the primary master dies, all of the slaves failover to the backup master. (Depending on your resources, you might want to make it possible for the backup master to function as a slave until the primary master actually fails.)

If you implement a peer-to-peer scheme, then every node will have to function as master/slave with n-way replication. Methinks it would be ugly.
BenjiSmith Send private email
Wednesday, June 07, 2006
David, your architecture sounds like a good one.  You may want to make your 'central hub' redundant, if that crashes the whole thing would get out of sync. 

You could have either have a hot-standby machine that does nothing but sit there ready for a failover, or you could have both machines activly working together to farm out the different 'inputs' to your nodes.
Vince Send private email
Wednesday, June 07, 2006
Thanks for the feedback.

Now this master node can either push out updates to all the nodes as they come in, or each slave node can periodically ping the master for updates.

Off the top of my head, I think the latter option would be preferrable.
David Larsen Send private email
Wednesday, June 07, 2006
"Now this master node can either push out updates to all the nodes as they come in, or each slave node can periodically ping the master for updates."

The problem with the latter is if you have multiple master servers, the nodes will have to be smart enough to handle failover if suddenly the master server stopped responding to its requests for data. 

You could have the nodes broadcast an "i'm done" message to everyone (or the master serves), and then have the servers push the data to the clients. 

This is probably overkill if you don't need that level of uptime or the consequences of a few nodes working on redundant data isn't catastrophic.  Either way, looks like your on the right track.
Vince Send private email
Wednesday, June 07, 2006
One of the problems that the SETI people ran into is that the original client would repeatedly ask for data whenever it needed data.  The problem was that if the service was down for a short period, potentially thousands of clients would all try to connect at the same time afterwards.

They solved much of it with a random backoff algorithm coupled with getting multiple work units at a time... so the responses were more staggered and less prone to service-killing spikes.
KC Send private email
Wednesday, June 07, 2006
KC, where can I read about the SETI project considerations?
David Larsen Send private email
Wednesday, June 07, 2006
What happens if a slave goes offline?

Since these batch jobs take three minutes each, I think it's feasible for the master to push the data out to a slave, and if it doesn't get a response within five minutes, assume the slave died, and send that same job to a different slave.

Since David wants to be able to monitor the slaves, I think it's worth doing some back of the envelope calculations for David's particular circumstance and see if it's reasonable to have separate connections to each slave, and to keep those connections always alive.
Wednesday, June 07, 2006
I have always found tuplespaces to be an elegant way to coordinate and synchronize access to data. Check out Gelernter's old papers. Ken Arnold and the Jini folks also have JavaSpaces which take the tuplespace model and add Jini leases, a decent security model, and the posibility of mobile code. Even if java isn't an option, the ideas are good.

Does this make me an astronaut?
Wednesday, June 07, 2006
I built a somewhat similar system a long time ago and found that using queues (ala MSMQ or MQSeries) was great (especially MSMQ).

There were three queues: Ready to go, in progress, and failed.  Each queue item had the data to run (or a pointer to the data), an owner ID, the current time, and a failure count.  When a worker machine grabbed something from the "ready to go queue", it actually just added it's ownerID to the item and moved it to the "in progress" queue (could have been done within a distributed transaction to make it failsafe).  When it finished, it removed it from the "in progress" queue.

I had a service that fed requests into the "ready to go" queue.  I had a service that monitored all queues--if the "ready to go queue" got too large, we were falling behind and it notified us.  If an item was in the 'in progress' queue for too long, we assumed a worker had died.  It could notify you, but we just increased the 'failed count' and moved the item back to the 'ready to go' queue.  If the failed count was too high, we assumed the data was crashing the worker and moved it to the "failed" queue.

Finally, we actually had multiple workers per server, so we had a worker process service that watched the workers and launched more of them as needed (if we started falling behind, if one of the workers crashed, etc).

It was the sweetest system I ever worked on, was bullet proof, and was NEVER the bottle neck in our system (in fact, it could speed up to the point that we started overwhelming our big iron Sybase database)
PA Send private email
Wednesday, June 07, 2006
If you feel like diving into a book, you may enjoy "Distributed Operating System and Algorithms". It contains a lot of interesting algorithms for handling the types of scenarios you'll face. For example, if you decide to go with a hub solution, then you'll will probably want to use the "bully algorithm" to elect a new leader if the main server goes down. As a more general read you might find "Enterprise Integration Patterns" to be helpful after you have solved the major issues.

I'd lean against peer-to-peer and instead go for a master-slave approach based on elections. Simply having a fall-back server can save the day, but it doesn't provide distributed fault tolerance and maintains a central point of failure. Your requirements will dictate which approach makes more sense, based on time/resources. It largely a matter of how scalable this will need to become.

I also agree with Vince about updates - follow his advice. Otherwise I think you're on the right track and just need to dig into what's out there (both in theory and implementation). While you're reading up, get an architecture down on paper that you currently like. Then get together with a bunch of colleages and rip it apart. Rebuild it from all that you have learned.
Thursday, June 08, 2006
+1 for a queue-based (such as IBM's MQSeries) loosely coupled, asynchronous design. Each server reads a message off a queue and puts a reply with results back when its done. Using a cluster queue, you can distribute the load in a round-robin or other algorithm. Another major advantage is that you would'nt have to worry about writing reliable low-level networking code.
Thursday, June 08, 2006

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz