The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Estimating the need for documentation from LOC?

Does anyone know about a rule of thumb for calculating how much documentation there should ideally be per Line Of Code?

So assuming no documentation exists and we have a code base of, say, 1 million LOC: is there a way of "guessing" how many lines of documentation ought to be written for this code base to cover it to a reasonable extent?

The situation is of course theoretical, because even though I now have to estimate this for our code base, there is not much chance it will ever have to be carried out. Much too expensive for the customer.

Still, any ideas anyone?
Stephen Muires
Wednesday, August 15, 2007
 
 
I'd say it depends both on the complexity of the code and the sanity of the architecture.
More complex code (i.e. algorithms and such) -> more inline documentation
Better architecture -> less complex code units -> less inline documentation, more outside architectural documentation.

To me, having to comment "in" code is often a sign of a poor architecture; there are exceptions of course.
Don't fix what ain't broke.
Wednesday, August 15, 2007
 
 
Sure, it depends on the code and architecture and a number of other factors. But I need to abstract away from that. It's a mix anyway: some code is new and well designed, other code is old and murky.

What I want is ballpark guesses, like: it takes 1 line of documentation for 30 lines of code. Is this near or far off the mark? (I know the question is unrealistic and therefore meaningless, but this is a real life situation and it is exactly that: unrealistic and meaningless.)
Stephen Muires
Wednesday, August 15, 2007
 
 
"I know the question is unrealistic and therefore meaningless, but this is a real life situation and it is exactly that: unrealistic and meaningless."

The answer will be the same: unrealistic and meaningless.

Sincerely,

Gene Wirchenko
Gene Wirchenko
Wednesday, August 15, 2007
 
 
What kind of documentation?

There are comments in the code of course, and then there's external documentation like requirements, specifications, data dictionaries, design documents, flow charts, schedules, budgets, user interface mockups...

As an extremely rough and crude ballpark, there should be two or three lines of comments for each significant function (or method, really) that describe the purpose of the function, any assumptions, expected results, etc. Assuming each function averages about 10 lines of code, I'd expect to see about 200,000-300,000 lines of comments in this hypothetical million line code base.

However, I should also point out that these comments, as I described them, can just as likely be in the requirements or design documents. The format of those is such that you may only need a "comment" for every object or module. Assuming an object averages 10 functions, you may only need 20,000 to 30,000 lines to describe a million line code base.
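The arithmetic behind those two figures can be sketched as follows. This is only a restatement of the assumptions stated above (10 lines per function, 10 functions per object, two or three lines of comments per unit); the function name `comment_lines` is made up for illustration.

```python
def comment_lines(total_loc, loc_per_unit, comment_lines_per_unit):
    """Estimate comment lines when each unit (function or object) gets one block."""
    units = total_loc // loc_per_unit
    return units * comment_lines_per_unit

TOTAL_LOC = 1_000_000

# Per-function comments: 10 LOC per function, 2 to 3 comment lines each.
per_function = [comment_lines(TOTAL_LOC, 10, n) for n in (2, 3)]

# Per-object comments: 10 functions of 10 LOC each, so 100 LOC per object.
per_object = [comment_lines(TOTAL_LOC, 100, n) for n in (2, 3)]

print(per_function)  # [200000, 300000]
print(per_object)    # [20000, 30000]
```

As a share of the code base, that is 20% to 30% for per-function comments and 2% to 3% for per-object descriptions. Anything lower than that would presumably have to come from replacing text with diagrams.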

If you believe that a picture is worth a thousand words...

So my answer would be anywhere from 0.2% to 30% depending on the nature of the documentation itself. And this is a key point...

Some people get more value out of comments in code. Some people get more value out of higher level "big picture" documentation. Some of this will arguably be useless - for example, comments in code cannot tell your quality assurance guy how to compile, install and test your program. If there's absolutely no documentation anywhere, I'd recommend you start first with...

1) A "scope of work" that briefly describes what the program should do and what it doesn't do.

2) Compiling, installation and "smoke test" instructions.

3) High level, abstract requirements.

4) More detailed specifications, including data validation rules.

5) A flowchart or process diagram.

6) A design document, including a data diagram.

7) Coding standards, recommended techniques.

8) Unit tests, if applicable.

And finally 9) retroactive comments in code.

I picked this order on purpose - a smart developer can take #4 (for example) and figure out his way down the list, but going up the list requires involvement from the stakeholders and management.

And incidentally, the "scope of work" for a million line code base can be 20 lines or less.
TheDavid
Wednesday, August 15, 2007
 
 
A more useful question might be "what documentation can and should we produce within the available budget[s]?"

E.g. "given $X we could document aspect A and B of the software, and given $Y we could document aspect C as well".
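One way to picture that budget-driven framing is a short sketch like the one below. The aspect names, their costs, and the rule of working through them in strict priority order are all hypothetical, chosen only to illustrate the idea.

```python
def affordable(aspects, budget):
    """Return the aspects, taken in priority order, that fit within the budget."""
    chosen, spent = [], 0
    for name, cost in aspects:
        if spent + cost > budget:
            break  # stop at the first aspect that no longer fits
        chosen.append(name)
        spent += cost
    return chosen

# Hypothetical aspects and documentation costs, highest priority first.
aspects = [("A", 10_000), ("B", 15_000), ("C", 20_000)]

print(affordable(aspects, 25_000))  # ['A', 'B']
print(affordable(aspects, 45_000))  # ['A', 'B', 'C']
```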
Christopher Wells
Wednesday, August 15, 2007
 
 
Sure, you can make an estimate. As others have said, without a lot more information (other than just LOC) the estimate will be pretty meaningless. However, given just the LOC measurement I would use the following method to develop an estimate:

For documentation in the code itself (comments), assuming that the current code is completely uncommented, I would say that you will need about as many lines of comments as you have lines of code (most of that in block comments per module and subroutine).

For out-of-band documentation -- user manual, programmer maintenance manual, etc. -- I'd make a quick guess at the number of subroutines in the code base and multiply by a number of lines (or paragraphs/pages/whatever) on the theory that the number of subroutines is proportional to the number of function points or features that need to be documented: e.g. for the 1 MLOC example, assuming that subroutines are, on average, 100 LOC, you'd have something like 10,000 subroutines. I'd guess that you'd need at least 1/4 page per function point/feature, so you're looking at about 2,500 pages of documentation, maybe more.
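Spelled out as a quick calculation, under the same assumptions (100 LOC per subroutine, one feature per subroutine, a quarter page per feature), the estimate looks like this. The function name `manual_pages` is made up for illustration.

```python
def manual_pages(total_loc, loc_per_subroutine, pages_per_feature):
    """Estimate manual pages, treating each subroutine as one feature."""
    subroutines = total_loc / loc_per_subroutine
    return subroutines * pages_per_feature

pages = manual_pages(1_000_000, 100, 0.25)
print(pages)  # 2500.0
```

Halving the assumed subroutine size to 50 LOC doubles the page count, which shows how sensitive the figure is to a guess made sight unseen.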

Again, this is all delphic estimation (pulling numbers out of an educated ass), and both numbers are likely to be a bit high, but if a client came to me with such a piss-poor requirement ("how much will it take to document 1 MLOC, sight unseen?") it would be irresponsible for me to give him anything other than a gross overestimate.
Jeffrey Dutky
Wednesday, August 15, 2007
 
 
And, of course, the next question is the intended function of the code and documentation.  In my industry (aviation), the level of documentation required depends on the safety-criticality of the intended function of the software.  (Of course, a truism in this field is that it is often easier to follow the process from the beginning and rewrite the software rather than try to up-level existing code.)

Wednesday, August 15, 2007
 
 
It can go up to 50/50. That's not one comment per line; that counts the headers and the external documentation.

So for 1000 lines of code, we want to see 500-1000 lines of documentation.
Meghraj Reddy
Wednesday, August 15, 2007
 
 
I think Jeffrey Dutky has the closest estimate, and it's probably an under-estimate.  You'll need, guaranteed, at least 1 line of comments per line of code if they want you to document the code base for the purpose of them taking the source code and understanding it.  That's not saying that every line will have a comment; it's saying that some lines will have 0 lines of comments and others will have 7 lines, for instance.

Even if you're not documenting the code in-line, you'll at least be documenting modules, subroutines, and (hopefully) conditionals, and documentation for things like that can be a page long if written well.

Yes, it sounds like a ridiculous number (1:1), but I assure you that even if the estimate isn't correct, it's far, far better than something like 200,000 lines of comments per million lines of code like someone else mentioned.

You can expect probably a 1:2 or greater ratio if the person doing the documenting has never seen the code before.
moof
Wednesday, August 15, 2007
 
 
To Gene Wirchenko:
yes, I know. Nevertheless that's my question. I assure you this is taken from real life. My job is to find an answer that to all practical intents and purposes is meaningless and useless. You must have experienced the same kind of situation if you're in software development...

To TheDavid:
great, thank you very much for taking the time to answer this. I can use your answer.

To Jeffrey Dutky:
great, likewise thank you very much for taking the time to answer this. I can use your answer, especially the lines: "if a client came to me with such a piss-poor requirement ("how much will it take to document 1 MLOC, sight unseen?") it would be irresponsible for me to give him anything other than a gross overestimate." That's pretty close to the mark.

To moof:
wow, such estimates would give our company work for the next 3 years :) Thanks anyway, your response illustrates nicely the impossibility of the task.

Everybody, thanks for responding.
Stephen Muires
Thursday, August 16, 2007
 
 
42

No, just kiddin'

I think only complex algorithms should need inline comments. All other code should be written expressively, using appropriate names for instances and parameters. Then the code documents itself.

Each class should have a short description of its role and responsibilities in the whole. Methods would preferably not need such a description, but pre-conditions and expected exceptions are very useful.

Certainly no one needs:

// gets the length
public int getLength()...

which you see all too often!
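By contrast, here is a minimal sketch of the kind of documentation described above: a short class description covering role and responsibilities, plus pre-conditions and expected exceptions on the methods that have them. It is written in Python rather than the Java-style fragment above, and the `Buffer` class is hypothetical, invented only for illustration.

```python
class Buffer:
    """Fixed-capacity byte buffer (hypothetical class for illustration).

    Role: holds at most `capacity` bytes for a caller that appends data
    in chunks and needs to know how much has been stored so far.
    """

    def __init__(self, capacity):
        """Pre-condition: capacity is a positive integer.

        Raises:
            ValueError: if capacity is less than 1.
        """
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._data = bytearray()

    def append(self, chunk):
        """Append chunk to the stored data.

        Pre-condition: len(chunk) <= capacity - length().

        Raises:
            OverflowError: if chunk does not fit in the remaining space.
        """
        if len(chunk) > self._capacity - len(self._data):
            raise OverflowError("chunk exceeds remaining capacity")
        self._data.extend(chunk)

    def length(self):
        return len(self._data)
```

Note that `length()` carries no comment at all: its name already says what it does, so the documentation effort goes to the constraints a caller cannot read off the signature.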

Documentation that I find missing in most cases is a somewhat higher level architectural/code design overview. I don't mean class diagrams (everybody's got those). I mean abstract diagrams showing component interfaces (coupling), dependencies and deployment (system/sub-system kind of stuff).
Marc Jacobi
Thursday, August 16, 2007
 
 
"To Gene Wirchenko:
yes, I know. Nevertheless that's my question. I assure you this is taken from real life. My job is to find an answer that to all practical intents and purposes is meaningless and useless. You must have experienced the same kind of situation if you're in software development..."

Yes, and I refuse to buy into such garbage.  Individual cases have too much variation.  You do not even get into how much detail is wanted!  There is a difference between brief notes and full detail: one will be rather longer than the other.

Sincerely,

Gene Wirchenko
Gene Wirchenko
Thursday, August 16, 2007
 
 
The question is meaningless.

A better way to find out if you have enough is to:

1) Run your code through a source code documentation tool such as Doxygen (it's free, so there's no capital outlay for this).
2) Fix all the warnings it produces (there will be lots of these)
3) Give the resultant output to new members of your team and ask them if it helps them to understand your code.

If you fail on any of the above, the answer is almost certainly "not enough".
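Doxygen is the tool named above. Purely as a rough analogue of step 2, and assuming a Python code base rather than whatever language the original poster's code is in, the sketch below uses the standard `ast` module to list functions and classes that have no docstring. The helper name `undocumented` and the sample source are made up for illustration; this is not how Doxygen itself works.

```python
import ast

def undocumented(source):
    """Return the sorted names of functions and classes lacking a docstring."""
    tree = ast.parse(source)
    missing = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if ast.get_docstring(node) is None:
                missing.append(node.name)
    return sorted(missing)

# Hypothetical source to check: one documented class, one bare method,
# and one documented function.
sample = '''
class Parser:
    """Parses input records."""
    def parse(self, text):
        return text.split(",")

def helper():
    """Documented helper."""
'''

print(undocumented(sample))  # ['parse']
```

An empty list only means every unit has some documentation; step 3, asking new team members whether it actually helps, is still the real test.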

In my experience it is crucial to get your devs into a mindset of documenting code (and writing the tests, for that matter) as they write it. Anything else, and you're fighting the "but we haven't got time!" battle - and you'll *always* lose that one.
Anna-Jayne Metcalfe
Wednesday, August 22, 2007
 
 
Damn typos. Where's the edit button? :doh:
Anna-Jayne Metcalfe
Wednesday, August 22, 2007
 
 

This topic is archived. No further replies will be accepted.
