A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.
I'm in the process of refactoring what is basically a messaging system. I receive about 100 different formats of medical messages (reports, transcriptions, lab results, etc.) and I need to process them.
After receiving the messages, I immediately store them to a database to await processing. (I realize there are wonderful technologies out there for queueing messages, like JMS and Spread, but I have my reasons, which follow this parenthetical note.) First, we want to keep all messages around for q/a, auditing, redundancy and reprocessing. The first three of these don't pose much of a problem (other than forcing us to store LOTS of data), but the fourth is giving me a headache. The problem is that since we'll need to reprocess old messages, we need to version the message processors.
This suggests adding a version number to each message as it arrives. Easy. My problem is that the obvious solutions for the processors are:
a) Create one class for each version of each format. I don't really care how many classes I've got lying around, but I do mind that most classes for the same format are going to be very similar, and I hate code duplication.
b) Create one class for each format. This means having lots of case/switch/if logic, which also seems icky.
Maybe major version -> new class, minor version -> if statement?
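For what it's worth, here is a rough Java sketch of that "major version -> new class, minor version -> if statement" split. Everything in it is made up for illustration: the `LabResultProcessor` interface, the class names, the delimiters and the field order are assumptions, not anything from the real message formats.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface: one implementation per MAJOR version of a format.
interface LabResultProcessor {
    String process(String rawMessage, int minorVersion);
}

// Major version 1. Minor revisions are small enough to handle with a branch.
class LabResultProcessorV1 implements LabResultProcessor {
    @Override
    public String process(String rawMessage, int minorVersion) {
        // Assumption for illustration: minor version 1 changed the field
        // separator from '|' to ';'.
        String separatorRegex = (minorVersion >= 1) ? ";" : "\\|";
        String[] fields = rawMessage.split(separatorRegex);
        return "v1." + minorVersion + " patient=" + fields[0] + " result=" + fields[1];
    }
}

// Major version 2: the field order changed (assumed), so it gets its own class.
class LabResultProcessorV2 implements LabResultProcessor {
    @Override
    public String process(String rawMessage, int minorVersion) {
        String[] fields = rawMessage.split(";");
        return "v2." + minorVersion + " patient=" + fields[1] + " result=" + fields[0];
    }
}

class MajorMinorDispatch {
    private static final Map<Integer, LabResultProcessor> BY_MAJOR = new HashMap<>();
    static {
        BY_MAJOR.put(1, new LabResultProcessorV1());
        BY_MAJOR.put(2, new LabResultProcessorV2());
    }

    // The major version picks the class; the minor version is passed through.
    static String process(int major, int minor, String rawMessage) {
        LabResultProcessor processor = BY_MAJOR.get(major);
        if (processor == null) {
            throw new IllegalArgumentException("No processor for major version " + major);
        }
        return processor.process(rawMessage, minor);
    }

    public static void main(String[] args) {
        System.out.println(process(1, 0, "Smith|7.2"));
        System.out.println(process(1, 1, "Smith;7.2"));
        System.out.println(process(2, 0, "7.2;Smith"));
    }
}
```

The branches inside a class stay manageable only as long as minor changes really are minor; once a class collects more than a handful of them, that is probably the signal to cut a new major version.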
Of course I forgot to mention that the formats change over time. Without that statement, the problem doesn't even arise.
There are small changes that happen weekly or so (not weekly to each format, but, let's say, a few times per year for each format). There are also four times a year when changes of any scope can occur.
I'm not sure that the schedule of version changes affects the architectural decisions, but I thought I'd provide what I can.
Have you thought about abstracting the message formats away from the code? Create (just an example) an XML format that describes each message. These descriptions are loaded in by the application and used to process each type of incoming message.
You could store all these format descriptions in the database and keep a revision history of them. Each incoming message is then tagged with the exact format description that it was originally processed with. When the message needs to be reprocessed, it is reprocessed with that original format description.
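A minimal Java sketch of the idea, under some loud assumptions: the "description" here is just a delimiter plus a list of field names (standing in for the XML), and `DescriptionHistory` is an in-memory stand-in for the database table. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one revision of a format description.
final class FormatDescription {
    final String delimiterRegex;
    final List<String> fieldNames;

    FormatDescription(String delimiterRegex, List<String> fieldNames) {
        this.delimiterRegex = delimiterRegex;
        this.fieldNames = fieldNames;
    }
}

// In-memory stand-in for the table that keeps every revision of every description.
class DescriptionHistory {
    private final Map<String, List<FormatDescription>> revisions = new HashMap<>();

    // Appends a new revision and returns its 1-based revision number.
    int addRevision(String format, FormatDescription description) {
        List<FormatDescription> history = revisions.computeIfAbsent(format, k -> new ArrayList<>());
        history.add(description);
        return history.size();
    }

    // Returns exactly the revision a message was tagged with; old revisions are never overwritten.
    FormatDescription get(String format, int revision) {
        List<FormatDescription> history = revisions.get(format);
        if (history == null || revision < 1 || revision > history.size()) {
            throw new IllegalArgumentException("Unknown revision " + revision + " of " + format);
        }
        return history.get(revision - 1);
    }
}

class DescriptionDrivenParser {
    // One generic parser; all format-specific knowledge lives in the description.
    static Map<String, String> parse(String rawMessage, FormatDescription description) {
        String[] values = rawMessage.split(description.delimiterRegex, -1);
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < description.fieldNames.size() && i < values.length; i++) {
            fields.put(description.fieldNames.get(i), values[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        DescriptionHistory history = new DescriptionHistory();
        int rev1 = history.addRevision("lab-result",
                new FormatDescription("\\|", Arrays.asList("patient", "result")));
        int rev2 = history.addRevision("lab-result",
                new FormatDescription(";", Arrays.asList("result", "patient", "units")));

        // A message stored while revision 1 was current is reparsed with revision 1.
        System.out.println(parse("Smith|7.2", history.get("lab-result", rev1)));
        System.out.println(parse("7.2;Smith;mmol/L", history.get("lab-result", rev2)));
    }
}
```

The revision number returned by `addRevision` is what would be stored alongside each incoming message.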
Without knowing more about your particular situation it's hard to say whether something like this would be appropriate. Clearly what you need is to store, with each message, the exact version of whatever processor was used to originally process the message. You could also just version each processor class, store the processor class name in the database, and use inheritance to facilitate code reuse.
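To make the last suggestion concrete, here is a small Java sketch, assuming each processor has a no-argument constructor and that the stored value is the class name as reported by `Class.getName()`. The `TranscriptionProcessorV1`/`V2` classes and their methods are invented for the example.

```java
import java.util.Locale;

// Version 1 of a hypothetical transcription processor.
class TranscriptionProcessorV1 {
    String process(String rawMessage) {
        return header() + ":" + body(rawMessage);
    }

    protected String header() {
        return "TRANSCRIPT";
    }

    protected String body(String rawMessage) {
        return rawMessage.trim();
    }
}

// Version 2 inherits everything and overrides only the step that changed (assumed change).
class TranscriptionProcessorV2 extends TranscriptionProcessorV1 {
    @Override
    protected String body(String rawMessage) {
        return rawMessage.trim().toUpperCase(Locale.ROOT);
    }
}

class ProcessorByName {
    // Instantiates the processor whose class name was stored with the message.
    static TranscriptionProcessorV1 load(String storedClassName) {
        try {
            Class<?> type = Class.forName(storedClassName);
            return (TranscriptionProcessorV1) type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load processor " + storedClassName, e);
        }
    }

    public static void main(String[] args) {
        // What would be written to the database when each message was first processed.
        String storedForOldMessage = TranscriptionProcessorV1.class.getName();
        String storedForNewMessage = TranscriptionProcessorV2.class.getName();

        System.out.println(load(storedForOldMessage).process("  patient stable "));
        System.out.println(load(storedForNewMessage).process("  patient stable "));
    }
}
```

One caveat with the inheritance route: any later edit to V1 silently changes V2 as well, so V1 has to be treated as frozen once V2 depends on it.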
>> I don't really care how many classes I've got lying around, but I do mind that most classes for the same format are going to be very similar, and I hate code duplication.
> ... use inheritance to facilitate code reuse.
You don't need to maintain the old version of the message processor class: you only need to archive it. Therefore why not simply archive it, and to make a new version just do a copy-and-paste-and-then-edit of the processor?
I got the impression from this:
"Of course I forgot to mention that the formats change over time. Without that statement, the problem doesn't even arise."
That reposting an old message would require the software to reuse an old processor. If the messages are stored after being processed, then it's not an issue. But then the OP really doesn't have as much of a problem.
> That reposting an old message would require the software to reuse an old processor.
Agreed: that's why he needs to archive the old processor: in case he needs to reprocess a message with an old format.
But he only needs to *archive* the old processor: he doesn't need to *maintain* it. For example:
Version A - first version
Version B - second version
I'm suggesting that he archive version A (i.e. retain for future use, in a frozen never-to-be-edited-again state, indefinitely); and that version B be copied-and-pasted-and-edited from version A.
If in the future he needed to change the functionality of version B *and* version A, then yes perhaps he could or should use inheritance (or similar mechanism) to encode "A and B" as "A and the differences between A and B" ... but because A is frozen in time, I don't see the benefit of encoding differences to avoid copy-and-paste.
I think that it's unlikely that a single approach would work here. I obviously don't know what format your messages come in, but it's likely that there are varying degrees of similarity between the formats. What I've done in similar situations is create a simple abstract base class with things like Name, ProcessMessage(string) etc. Attach a name to each message and then have a factory class that has instances of all of the message formats registered by name/version. Then you can use whatever classes make sense - split functionality off into a new class or a subclass whenever the format diverges enough to be worth it.
I would also look at creating at least one generic class that can process lots of messages. In the past I've used an XML specification that included regular expression formats. But the nice thing about the factory approach is that if the generic class won't fit then you can just use a specific class.
This approach would mean that somewhere in your code you would have a block of code like this.
//Formats.AddFormat(string Name, int Version, IMessageProcessor Processor);
Formats.AddFormat("Blogs & Co Report",1,new GenericProcessor("BlogsV1.xml"));
Formats.AddFormat("Blogs & Co Report",2,new BlogsProcessor());
Formats.AddFormat("Blogs & Co Report",3,new BlogsProcessor2());
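For anyone wanting to see the other side of those calls, a bare-bones registry could look something like this in Java. This is only a sketch under assumptions: `Formats` is an instance here rather than a static class, the method names are lowerCamelCase renderings of the ones above, `lookup` is a name made up for the example, and the lambdas stand in for `GenericProcessor` and `BlogsProcessor`, whose internals aren't shown in the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Interface name taken from the signature above; its single method is an assumption.
interface IMessageProcessor {
    String processMessage(String message);
}

// Registry of processors keyed by format name and version.
class Formats {
    private final Map<String, IMessageProcessor> processors = new HashMap<>();

    private static String key(String name, int version) {
        return name + "#" + version;
    }

    // Registers a processor; re-registering a name/version pair is treated as a bug.
    void addFormat(String name, int version, IMessageProcessor processor) {
        if (processors.putIfAbsent(key(name, version), processor) != null) {
            throw new IllegalStateException("Already registered: " + name + " v" + version);
        }
    }

    // Finds the processor for the name/version a stored message was tagged with.
    IMessageProcessor lookup(String name, int version) {
        IMessageProcessor processor = processors.get(key(name, version));
        if (processor == null) {
            throw new IllegalArgumentException("No processor for " + name + " v" + version);
        }
        return processor;
    }
}

class FormatsDemo {
    public static void main(String[] args) {
        Formats formats = new Formats();
        // Placeholder processors for illustration only.
        formats.addFormat("Blogs & Co Report", 1, message -> "generic:" + message);
        formats.addFormat("Blogs & Co Report", 2, message -> "blogs-v2:" + message);

        // Reprocessing an old message uses the version stored with it.
        System.out.println(formats.lookup("Blogs & Co Report", 1).processMessage("old report"));
        System.out.println(formats.lookup("Blogs & Co Report", 2).processMessage("new report"));
    }
}
```

Because the registry only cares about the interface, a generic description-driven processor and a hand-written class can sit side by side under the same format name.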
I hope this makes sense - I'm just off to get a couple of hours sleep after coding through the night.
Saturday, April 23, 2005
This topic is archived. No further replies will be accepted.