The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Structuring Sequential Code

Hello-

I have a method that does validations on a piece of XML received via HTTP POST. A few of the operations this method does is make sure data was received, makes sure the data is valid XML, makes sure it conforms to an XML schema, checks the account Id to make sure it is active, checks to see if the account Id is over its limit, and a few other things. The last step is it throws the XML packet into a queue. The difficulty is that the method responsible for coordinating this validation is very long, despite the fact that the specific checks are done by subroutines.

How would you structure a method like this for maximum maintainability?

Thanks in advance.

-Eric
Eric Marthinsen Send private email
Thursday, January 17, 2008
 
 
Long functions that have very simple flow logic, e.g. sequential are OK, assuming each section is properly labeled. Do consider breaking it up into higher level functions, e.g. ValidateXML(); ValidateSchema(); ... not for the purpose of reuse but for the purpose of readability. An interesting question then is how to pass the state between these functions. It could be done as a state object passed as parameter - which is not so OO - or make them private methods on an object - which smacks a bit of global data. Well, you can't win them all :-)
Dan Shappir Send private email
Thursday, January 17, 2008
 
 
Is it possible to make the interface asynchronous, instead of sequential?
Troels Knak-Nielsen Send private email
Thursday, January 17, 2008
 
 
Option 1: split it into several cosmetic subroutines with telltale names. Variable usage will suggest the logical points to put the boundaries. Works best for processes that are done in several loosely coupled steps.

Option 2: just intersperse it with comments clearly marking what each part does and where each section begins. Works best for monolithic lengthy code that is too involved to split easily.
ping?
Thursday, January 17, 2008
 
 
As others have said, some kinds of long functions/subroutines are simply unavoidable: my favorite example are routines that deal with formatted I/O. If you don't do LOTS of formatted I/O (say that you read a big blob of formatted data at the start of the program and write a big blob at the end, but that's it) then you tend to wind up with a couple of gigantic routines whose only reason for being is to embody the formatting of the input and output files.

The only way it makes sense to spend much effort on factoring the I/O routines is if you do lots of formatted I/O with different formats. In this circumstance it may be worthwhile to turn the giant block of code into a data-driven engine and encode the formatting (or, in your case, the different validation operations) in a table of some kind. I wouldn't put any effort into something like that, however, until I had to write at least two or three giant sequential functions, since the data driven engine is going to require non-trivial design and testing effort (but it will, if done well, make your code MUCH easier to read and maintain).
Jeffrey Dutky Send private email
Thursday, January 17, 2008
 
 
I agree with Troels that it might be nice, if possible, to structure it so that the various routines can be parallelizable.  Putting each one in its own function is obviously the start.

Calling those functions sequentially is trivial, obviously. But if you're looking for more powerful/flexible/generic to lay over that call sequence, I had a couple ideas.

You could leverage the JUnit framework, since what you're doing are, in essence, tests. For some ideas on how to make the tests ordered or partially ordered, see e.g. "JUnit Recipes" by J.B.Rainsberger. Of course, you might not want to deploy the JUnit framework...

You could integrate the Ant framework and make each of the validation routines a target; then your sequence is a simple ant script. One nice thing about this is that stopping on the first failure is implicit.

Another idea: If you don't like the idea of the top-level routine controlling the sequence, you could use something like code threading (http://en.wikipedia.org/wiki/Threaded_code) or a state machine.  Of course, any of these would really beg for OO, which maybe you're not using yet...
John Douglas Porter Send private email
Thursday, January 17, 2008
 
 
I find that sometimes, what readability you gain by dividing up long functions is more than cancelled out by the confusion of passing state and flipping back and forth between functions to figure out what's going on.

I am not saying that gigantic functions are OK, but I do think that splitting things up into a million little functions has its own cost, especially when the nature of the function is that it is very unlikely to ever get reused.
Greg Send private email
Thursday, January 17, 2008
 
 
You can refactor a lengthy "check" routine into a standalone class. Then break the long routine into mulitple methods. Move all the data that is only used by the routine to this new class. If necessary, make the new class a friend of the original one so that it can retrieve the data it wants.
Glitch
Thursday, January 17, 2008
 
 
Passing state?

Stick the data into an object, and give it a bunch of validation methods, none of which should be modifying the data, or particularly dependant on any of the other validation steps. Then everything that handles this object can call the appropriate methods, rather than hoping that every entry-point into the system has a copy of the validation code. Whether validate() is a big long method or a short one that calls 50 other validateXXX() methods is up to you, but modern IDEs make it easy to deal with lots of little functions, so it seems easy enough to separate out each step into its own method.

If nothing else, keeping your state-changing code separated from your validation code was a good idea when C was new, and its still a good idea today.

And the agile people should demand that, too - so that the XML validation routines can be tested independantly of the database validation routines.

You can even do neat things with type systems (static or dynamic) so that passing an unvalidated object to a method that expected a validated object will cause an error. (Compile time or run-time errors are possible.) Unit tests are cool and all, but whether its at compile time or run time, it can be nice to have your language double-check your work for you.


Lots of small validation subroutines - It's functional, its agile, and it's a trivial amount of extra typing for all those function names.

And let's be sane for a second - you may not know what you'll have for breakfast tomorrow, but you do know that sooner or later someone will change that XML in some minor way that needs a change to one of your validation steps.

Thursday, January 17, 2008
 
 
Sometimes when you write workflow / IO code, there's no way to escape long methods.

The thing about OO programming is that it can be elegant, and you can delegate down and delegate down but at the end of the day, something has to do the work, and by the sounds of it, you've reached that point in your code.

Just make the method very long, and comment it heavily.  I would agree with what Greg said

"[...]what readability you gain by dividing up long functions is more than cancelled out by the confusion of passing state and flipping back and forth between functions to figure out what's going on."

If you have to change the workflow of the code (which is more than likely) moving one or two lines up your 1000 line method is going to be easier than tracing through your cosmetic functions and state objects to find out what it is you have to change. 


-jklp
jklp
Thursday, January 17, 2008
 
 
It is possible to break a long routine like that into small classes with much improved readability. I have done that just recently.

For example, assuming that a program reads a huge file from HTTP, validates it, reformats it, compresses it then writes it over HTTP. To allow best performance, the code has to process the data in "buffer, process and transmit" manner. After reading 1K from Http, the program passes the buffer to the validator. After a portion of file is successfully validated, the program passes the result to the reformatter... and so on until it completes the last step. In this way, if there is a problem with input, it  can be discovered at the early stage.

Using traditional method you will end up a giant while -loop with dozens of state variables.

Our implement is like this: for each step, we implement a "action" class. This class has a small buffer that stores the data just needed. For example, a ZIP compressor class has 64 bytes in buffer because a ZIP runs on a 64-byte block. The classes can be linked together following a predefined protocol.

The final code looks like this:

HttpReader reader(...);
Validator validator(...);
Reformatter reformatter(...);
Compressor compressor(...);
HttpWriter writer(...);

... contains the initializer variables that initializes objects.

After every  class is instantiated, hook them into a chain:
reader.link(
  validator.link(
    reformatter.link(
      compressor.link(writer))));

Then start to pump the data:
reader.Pump();

Each class reads the data from the upstream, processes it and pushes down to the downstream. If a class requires further data from upstream, it stores unprocessed bytes in its buffer, but pushes the result down. If there is an error processing the data, the error is passed all the way back to the top class and the Pump() exits with an error code.

Because each class is small with fewer state variables, testing becomes much easier. Each class can be tested individually with mocked upstream/downstream objects. Not mention that the readability and maintainability are greatly improved.
Glitch
Thursday, January 17, 2008
 
 
Thanks for everyone's replys. I've been putting some research time into this as well and have a plan that I think would be workable (it's similar to what Glitch said above). I'm thinking about using the chain of command pattern to link a line of validators together. My base validator could look something link this:

public abstract class ValidatorBase
{
    private ValidatorBase _next;

    public ValidatorBase Append(ValidatorBase next)
    {
        _next = next;
        return this; //allows chaining
    }

    public abstract void Validate(ValidationContext context);
}

with a concrete implementation that looks like this:

public class AccountIdValidator : ValidatorBase
{
    public override void Validate(ValidationContext context)
    {
        int accountId = RetreiveAccountId(context.XmlRequest);

        if(AccountIsValid(accountId))
        {
            _next.Validate(context);
        }
        else
        {
            context.ErrorMessage = "Invalid account ID";
        }
    }
}

Then, I can set up my validator chain by saying:

ValidatorBase validationChain = (new XmlScheamValidator()).Append(new AccountIdValidator()).Append(new AccountLimitValidator());

I think this will make the code readable and, since each validator is its own class, it will be very testable.

I'm not crazy about the ValidationContext class. I might be able to pass "this" into a validator so the calling class can validate itself. The other thing that I'm not comfortable with is that some of the validators will change data. For instance, the one that checks to see if the string is valid XML will go ahead and convert the string to XML and store it in the context. Perhaps renaming ValidatorBase to PipelineComponentBase would be better.

Any thoughts?

Regardss-
    Eric
Eric Marthinsen Send private email
Thursday, January 17, 2008
 
 
Whoops, that constructor should have been:

public abstract class ValidatorBase
{
    private ValidatorBase _next;

    public ValidatorBase Append(ValidatorBase next)
    {
        _next = next;
        return next; //allows chaining
    }

    public abstract void Validate(ValidationContext context);
}

The "return this" was replaced by "return next"
Eric Marthinsen Send private email
Friday, January 18, 2008
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz