The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Learning parallel programming

Any tips on resources for someone that wants to learn parallel programming? seems like it might be a decent place to start. I suspect that a 1995 text is still mostly valid, but something more modern might be more practical.

I know that some tasks are inherently serial and others are trivially parallel, but I don't know how to spot those that fall in between. Actually, now that I think about it, I'm not sure I could reliably spot the trivial cases, let alone provide proof.

Note that I haven't had any trouble finding tools (Erlang, Retlang, and even plain old threading support). What I've had less luck with is how to decide what can be split up, the various ways in which the work and data can be distributed, and how to estimate the relative performance of serial versus parallel and different workload distribution schemes.
Ron Porter Send private email
Monday, November 19, 2007
This _is_ a difficult topic.

There are two broad classes of parallelism: data-parallel
and control-parallel.  (there are other ways to approach the
problem, but I like this one).

In data-parallel, you do essentially the same operation on a set of data, and the calculation of each datum is essentially independent.  This is the realm of google map-reduce (although map-reduce isn't the right solution for all data-parallel problems).

In control-parallel, you have a bunch of operations to perform (possibly on the same data) that are essentially independent.

A separate consideration is granularity: are the parallel data/operations small or large?  This influences your latency tolerance.  Small data/ops require low-latency communications.  Large latency is the realm of the Beowulf cluster (a running joke on slashdot).  Most web-apps are large latency.  Low-latency data-parallel is hard: this is the realm of scientific supercomputing.  They know a lot about this at Argonne/MCS so you started looking in a good place.

Why don't you describe your problem in a little detail?
dbs Send private email
Monday, November 19, 2007
This is a fairly complex topic. Quite a few factors are involved.
Quote "Parallel computing exists in several different forms: bit-level parallelism, instruction level parallelism, data parallelism, and task parallelism."

Some interesting reads (and of course a lot more) see

or just search for "parallel programming" at and you will see lots and lots of article written.
JasonL Send private email
Monday, November 19, 2007
Thanks for the responses so far.

I've seen the wikipedia stuff and some (not much, so far) of the msdn stuff.

So, you want a better description of my problem... Well, I'm mostly looking for the same kind of cheap thrills that  I got way back when programming was just a way to kill time on the week-ends.

I do, however have two work-related projects that might afford the opportunity to do some playing at work :)

The first is a photo resizing application. We take up to 200 photos a day for archival and service purposes. The vast majority don't need to be at full size, so we have a little service running that watches for new photos in a specific location and automatically resizes them and adjusts the jpeg quality settings. It also builds an html thumbnails page. Performance expectations are being easily met using plain vanilla single-threaded code so there is no particular need to distribute the work across multiple processors/workstations. But it's clearly parallelizable (is that a word?) and it's there.

The second is a scheduling application. Performance is considered pretty bad, although I've managed to get a full recalculation down to 10 minutes (from 4 hours), mostly by replacing all the stupid stuff I was doing with some less stupid stuff (hurray for profilers!). I have a strong suspicion that the problem is serial by nature, but there are some obvious opportunities for parallel processing.

The basics of scheduling: We have a number employees, each assigned to one or more workcentres. We know their days and hours of work and we know how many man-hours are required at each workcentre. We have anywhere between 120 and 250 units on the schedule at any given time. Within one week, when will each unit be complete? No worker can be in two workcentres at the same time (although they can be assigned to more than one workcentre). No workcentre can contain more than one unit at any given time. Each unit takes a pre-defined path through the workcentres, but not all units use all workcentres and each unit spends a different amount of time at each workcentre.

So far, the only way I've seen to handle this is to step through each unit's workcentres one by one, in order. Check who is assigned and their availability for the date. Subtract the 'booked' hours from both the employee and the workcentre's required man-hours. Repeat for the next date until the workcentre has no hours left, then move to the next workcentre.

There are some workcentres that can work in parallel, and that would take it from 90 steps down to 70 steps per unit. That might take it down to 8 minutes. In the section where I have to check for available time, it's easy to see more opportunities when getting employee availability, but I haven't actually quantified them yet. All in all, the stuff I see and can already figure out how to put on other processors might cut the run time in half assuming I don't screw it up too badly.

The easy thing to do is just get Eric Sink's multicore map ( or Retlang ( That might even be the right thing to do. But I'd still like to actually learn more.
Ron Porter Send private email
Monday, November 19, 2007
Your scheduling problem sounds like it could be solved using the Specifications pattern (see "Domain Driven Design").  You're already fairly close, but that might help you clean-up the code.

I would focus first on task parallelism, as its the easiest to get right.  Its fairly easy to write a master/slave implementation, and then quickly demonstrate a speedup by throughing an obvious problem at it. Of course, the biggest problem is that not all tasks fit easily in thos model. (You can also extend this to write a map-reduce implementation, which is really just a two-step master-slave design).

You may also find fork-join easy and fun to write. A good implementation can be hard in non-functional languages (e.g. Lea's Java-7 framework). There are few problems that fit into this model, but its one of the very first taught.

You could use the first method on your photo example and, perhaps, find a way to use the second for your scheduling example.

You might also like Michael's blog (posts here, at times) and which has a list of books he's enjoyed. (
Benjamin Manes Send private email
Tuesday, November 20, 2007
The image problem can be approached using a master/slave structure, where you have a single point of entry for incoming tasks (images) and a bunch of back-end servers that execute tasks given by the master.  The slave program can be persistent, and wait around for the master to give it work, or it can be started on-demand by the master. Or a combination of both -- a pool of slaves that expands as necessary.  Depends on how
much work there is to do, how many back-end slave processors you have available, and whether a slave uses all of the cpu or is I/O bound.
(This assumes that the rate of incoming tasks is low enough that a single master/manager can handle it.)
The master and slaves require different programs.  Connecting the programs is relatively simple because there isn't a lot of traffic: master sends a new task, slaves sends an answer.  You could do it in perl with a pipe running SSH and select().  You could be a little fancier and do it with sockets.  Or with http requests. or some kind of messaging system.

The scheduling problem sounds like a classic Operations Research problem.  The optimization web at Argonne is a
good resource ( and
may be particularly appropriate if I understand your problem correctly (I'm not an OR guy, so I could be wrong)
dbs Send private email
Tuesday, November 20, 2007
I would caution on trying to use parallel programing on you photo task. Reasons:
- 200 photos a day will not give you any significant load so you can actually exercice most tricky parts of code.
- Such applications are usually I/O bound (resizing of a picture without large set of filters will not load CPU enough). Trying to read files in parallel likely will kill perfromance faster then you will be able to gain by running compression in parallel.

It would be fun project to do, but don't expect it to give you needed painful learning expirience due to extremly low load in real environment.
WildTiger Send private email
Wednesday, November 21, 2007
Here's the book we used in a graduate course I took in parallel computing:

David E. Culler, Jaswinder Pal Singh, with Anoop Gupta, Parallel Computer Architecture: A Hardware/Software Approach, © 1999 Morgan-Kauffman, ISBN 1-55860-343-3.

Here's the course website:

The course notes are available for download. This course was a distance education course and it looks like the lecture videos are available to view without logging in. This may not last past the end of the current semester though.


Wednesday, November 21, 2007
Wild Tiger: I wouldn't dream of doing our photo stuff in parallel except as a toy project to test a few concepts. There are no performance issues at present. I simply offered it as one of the things I know has some tasks that can, in principle, be performed in parallel. Too much I/O for my understanding of the advantages of parallel processing.

Thanks for the other responses, especially the RIOT link provided by dbs. Where were you when I was first researching scheduling issues? Our production scheduler is almost in shock :) It has little or nothing to do with parallel programming per se, but it will prove invaluable as we move forward. It's funny the things we learn from what we learn.

I've got enough now to keep me busy for years. I'll check back in occasionally to describe my progress.
Ron Porter Send private email
Thursday, November 22, 2007
For the resizing images thing take a look at this article on how a thumbnail tool was sped up for a multi core / CPU system:

If you're doing a lot of I/O multiple threads can actually speed up things on a single core system by letting you do stuff while waiting for the slow I/O.
Thursday, November 22, 2007
To bring some books in - and parallel programming justifies a book - "Programming with POSIX Threads" ( ) is worth the money in my opinion.
Micha Send private email
Friday, November 23, 2007
"... as toy project..." is exactly the problem I tried to point to: it is easy (from very easy to somewhat easy) to write code to run some tasks in parallel, but hard to predict and find cases when something goes wrong. And  nothing usually goes wrong in toy projects due to small load. Disk IO - if you know how to do it right - then parallel execution is fine, otherwise it will not give you any gains or slow you down (a lot of small random reads/writes will do wonders to IO troughput).

To throw some links too: WikiPedia is traditional source for starting links for any problem. I.e.
WildTiger Send private email
Monday, November 26, 2007
I would stay away from web resources as learning from someone who knows a little bit from alot is dangerous.
If you do, make sure they know what they're talking about, is very good.

There is plenty of academic books about parallel computing, its an old old topic whose principles have been in place since the seventies...

Skim university webpages for resources too ;)
Tony Nguyen
Monday, November 26, 2007
Point taken about the toy project. If memory serves, I've been bitten a few times like that :)

As it happens, most of the resources I'm accessing are academic in nature. I'm glad you reminded me to focus my attention it that area. As with toy projects, I've been bitten more than once by skipping the fundamentals.
Ron Porter Send private email
Tuesday, November 27, 2007

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz