I just finished a one-month project, working on my own for my employer. It was part of a web app with a db tier, front end, and back end in Java, totaling around 1000 LOC of Java and 300 LOC of HTML/JavaScript. My total bug count after QA testing and all was around 6, with 3 being trivial UI issues. My company wants to do metrics based on bugs/LOC. Does anybody use these metrics, and is it a good idea to do so? Personally, I don't think so.
Anon Thursday, September 23, 2004
What's the significance of a bug count? Imagine a space shuttle driven completely by very complicated software. Let's say that two teams implemented their own versions of the software. Let's say that Team A's version had 5,000 bugs while Team B's version had only 1. Which product is of higher quality? Oh as a sidenote, Team A's bugs were related to UI ticks and out-of-sync voice messages. Team B's single bug crashed the shuttle and killed everybody.
OK, so here's a blatant cut-and-paste from a fairly recent discussion that was along the same lines... Here is a recent, relevant, and not too long thread on the topic appearing in the comp.lang.c++.moderated newsgroup: http://tinyurl.com/4xs39 This points at Bjarne Stroustrup's poignant entry, but also of note is James Kanze's post, part of which reads: "The productivity of a programmer can't really be measured by any simple measure. If I had someone like Stroustrup, Sutter or Meyers on my team, I wouldn't have them writing very much code; they main responsibility would be coaching the others. Unless the others were already exceptionally good, that would improve the total productivity of the team more than if they were coding isolated in a corner." The thread doesn't mention so much about bug counts, but furthers the cause of debunking the LOC myth.
I keep a count of total bugs, critical bugs, and critical-or-medium-severity bugs. (When reported, each bug gets a number from 1 to 25 according to its severity and priority.) Why? Well, it's fun :) Seriously, reducing critical-or-medium-severity bugs is an issue, and if you have the numbers, well, it helps. But the numbers are just a summary of the database. If you filter all the important bugs, you can actually think about what caused each of them. Now, if you "want" to try doing something similar for your interests, it might be a good idea. If your company wants to do it, it's probably not so good. If you do the numbers for yourself you won't lie, but if the company starts looking at who makes more bugs, very strange things will happen to bugs before they're entered to the database and many will be lost in the way. Oh but Joel explained it better in an article.
99% of programmers aren't Stroustrups, and most bugs can be clearly and immediately categorized as 'shuttle crashers' versus cosmetic issues. In any case you can't just pay people IT-sized salaries and hope for the best; you have to measure something. IMHO bug count is at least arguably reasonable; LOC is a lot dodgier.
I recommend the following 2 books. Fenton's book is the one that most classes on the subject would use. http://www.amazon.com/exec/obidos/tg/detail/-/0534954251/ http://www.amazon.com/exec/obidos/tg/detail/-/0201729156/ LOC is not too useful as a metric. But if it is the ONLY thing you will be measuring, then it will be better than nothing. And usually, folks use metrics to estimate/predict bug counts, or to predict which modules will be difficult to maintain or test. If only bug counts and LOC are being measured, then most likely they are being gathered to whip you around, or at the least to try to determine who is productive and who is not. If your goal is quality improvement, then there are many different things to measure in the code. You want to be able to say things like "Based on the metrics, module A is 95% likely to have 3 or fewer bugs, while Module B is 95% likely to have 50 or more bugs." Which module will you spend more time reviewing and testing? So, the bottom line question is: What are you trying to do?
Peter Thursday, September 23, 2004
I don't think it is a good idea to do metrics based on a ratio of LOC to bugs. Why?

    For x = 1 to 1000
        debug.print "There are no bugs in this loop."
        debug.print "This line makes my metrics better."
        debug.print "This one does too."
        debug.print "So does this one."
        debug.print "I deserve a raise."
    Next x

Code review would prevent something as blatant as this (I would hope). However, the concept is still there. You would be rewarded for doing things in the most lines of code, regardless of any logical reason to do so.
I am Jack's muddled multiline metrics Thursday, September 23, 2004
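To put rough numbers on the padding problem described above, here is a minimal sketch. The bug and LOC figures come from the opening post; the 500 padded lines and the function name `defects_per_kloc` are made up for illustration.

```python
def defects_per_kloc(bugs: int, loc: int) -> float:
    """Return defects per thousand lines of code."""
    if loc <= 0:
        raise ValueError("loc must be positive")
    return bugs / loc * 1000


# Figures from the opening post: 6 bugs in roughly 1000 + 300 lines.
honest = defects_per_kloc(6, 1300)

# Hypothetical padding: 500 extra lines of do-nothing debug prints.
padded = defects_per_kloc(6, 1300 + 500)

print(f"honest: {honest:.2f} bugs/KLOC")
print(f"padded: {padded:.2f} bugs/KLOC")
```

The code did not get any better, but the ratio did, which is exactly the incentive the post warns about.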
There are problems with metrics of this kind.

1. How do you measure them? You must have some kind of style guide on curly-bracket placement, number of statements per line, number of statements per Function/Method, and commenting to ensure code-counts are comparable across projects.

2. You need a 'bug' standard. Humphrey's PSP (Personal Software Process) has a START at a good 'bug' definition structure -- but it's only a start.

3. What is considered 'good'? If your company is using six-sigma metrics (from manufacturing) you may be held to an impossible standard (applied to software development).

4. What are the metrics going to be used for? People are going to 'game' the numbers, by changing their behavior in the direction the METRICS indicate are 'good'. You want to select metrics to ensure that when people do this, the actual product quality is enhanced. Enhancing true product quality (on-time and on-budget) are the goals here, after all. The metrics merely serve as an indicator of quality. It's your customer's experience and trust in the software that leads them to buy more, and enhance your company's bottom line.

5. Are the metrics going to be used for retention/firing? This is really-really tempting, and also really-really BAD. People in this circumstance will pay little attention to true product quality, instead improving their metrics. For instance, if you measure bugs fixed, then introducing lots of bugs early on, so you can have lots to fix later, gives the person better metrics. If you measure bugs detected, and low is better, then the person has little incentive to detect bugs. If you have Person A detect Person B's bugs, then you introduce political back-scratching or back-biting.

6. The best use of metrics is for each individual to use them to measure their own performance, and to determine, when they use different approaches to coding, which one is better. The person is almost NEVER the problem; the processes, tools, and procedures almost always are.

Note most people (including management) don't think this way very often. Unfortunately, it is very tempting for Management to use metrics in non-helpful ways, and very difficult to prevent them from doing so.
AllanL5 Thursday, September 23, 2004
This thread pops up every few weeks. I'm always surprised and somewhat dismayed over many of the objections raised about using metrics. But before I pick on anything in particular, let me state my personal view:

1. Metrics should be put in place to serve a goal, not just to gather metrics.
2. The metrics should be normalized (e.g. spacing, bracing, function size) via enforced coding standards.
3. The metrics should be calibrated within an organization, and not used across organizations.
4. Metrics should (in most cases) be anonymous.
5. Metrics should not (in most cases) be used punitively, but as a means to process improvement.
6. Most metrics cannot be looked at in isolation.

That said, my dismay has to do with the general resistance to performance measurements exhibited by many software professionals. Yes, LOC is not a perfect measure of productivity. But temperature isn't a perfect measure of how nice it is outside either. Nonetheless there *is* a correlation. Counts of bug fixes are not a perfect indicator of quality (especially if they are not qualified by severity) but they *are* an indicator. Etc. Fear of metrics being misused, or fear of them being manipulated by those being measured, is not really a valid reason to reject the use of metrics -- such fears simply indicate bigger problems of trust and professionalism within the group. I think when we, as a profession, constantly resist these types of measures rather than trying to use them or improve upon them, we come across as prima donnas who just want to be given carte blanche.
But Jeff, temperature has a meaningful interpretation. It's a statistical description of the average kinetic energy of ambient molecules in the air (under a classical interpretation anyway). This interpretation tells us where the temperature quantity is relevant. What's the interpretation of LOC? Maybe you can say that the change in LOC says something about the "temperature" of a project. If there's a lot of change then the project is very active. If the change is negative, then the project is undergoing structural changes. If it's positive, the project is being extended in some way. If the magnitude of the changes is small, it might mean it's a mature project, or it might mean that the source base has turned into a tar pit. In that sense, maybe it's a useful metric. But what many people object to is the use of this metric as an indicator of code quality (see the title of this thread) and especially the quality of the contributions of individual programmers. And the unfortunate fact about this business is that there are *a lot* of managers out there who know nothing about the fundamentals of the business and will completely misuse metrics like this. The way that the average manager uses LOC is like, to use your analogy, using temperature to divine the direction and speed of individual molecules.
Managers:

If you know how to code: you don't need metrics. Review the code that people write. Peek at random pieces of code. Check it, say, monthly. You'll know instantly if they're good programmers. If they aren't, fire them.

If you don't: cross your fingers and hope things will work well. Or hire a "development manager" and trust him or her to assess and help improve the quality of the team and each of its members.

In none of the two cases should you use simple metrics, especially if you don't understand them, and especially if they are bugs/LOC. I defended bug counts before but just to track the evolution of a _product_. Don't individualize the bug counts trying to assign them to one specific programmer.

In the former case, however, there are some tools that will help you do your assessment. I said help, not do it for you. For example, there are tools for most languages which will show you which are the most complex functions in your classes. In particular they'll pinpoint those functions whose cyclomatic complexity is so high that they're considered "untestable". This can help. If the code before the programmer took it (say, a copy three months ago) had twelve of these functions, and now it has none, that sure means something. If it had none (as it should be, of course) and now it has twenty, it's bad. (But, of course, also read the actual functions to confirm that they're a real mess.)

If you're an excellent manager, the team will already be compact and have good practices and new people won't write production code until they've learnt how things work there, and their bad code will break the nightly tests, but hey, in that case, email me :) (Oh I forgot I removed my email because I didn't like the envelope)
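For a feel of what such a complexity tool does, here is a minimal sketch using Python's standard `ast` module. It only approximates cyclomatic complexity (one plus a count of decision points), and the choice of which node types count as a decision is an assumption of this sketch, not the rule any particular tool uses. The names `approx_complexity`, `rank_functions`, and `SAMPLE` are invented for the example.

```python
import ast

# Node types counted as one extra decision point each (an assumption of this sketch).
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approx_complexity(func: ast.FunctionDef) -> int:
    """Approximate cyclomatic complexity: 1 + number of decision points."""
    score = 1
    for node in ast.walk(func):
        if isinstance(node, _BRANCH_NODES):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra short-circuit branches.
            score += len(node.values) - 1
    return score


def rank_functions(source: str) -> list[tuple[str, int]]:
    """Return (function name, score) pairs, most complex first."""
    tree = ast.parse(source)
    scores = [
        (node.name, approx_complexity(node))
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


SAMPLE = '''
def simple(x):
    return x + 1

def tangled(a, b):
    if a and b:
        for i in range(a):
            if i % 2:
                b += i
    elif a:
        b = 0
    return b
'''

for name, score in rank_functions(SAMPLE):
    print(f"{name}: {score}")
```

Comparing such a ranking for a three-month-old copy of the code against the current copy gives the before/after picture described above, though reading the flagged functions is still needed to confirm they are a real mess.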
It's so cute to see you guys start to talk about grown-up topics instead of language-bashing. How are you supposed to gauge improvement or degradation of a process over time? If I go into a Toyota plant, you can bet that the QA department knows what % of the Celicas that roll off the line will have a defect in the radiator. If the software engineers (real engineers measure their products) put in place a small number of metrics that are useful, real inferences can be drawn and process improvements can be prototyped.
cheeto Thursday, September 23, 2004
There are points to collecting bug metrics based on LOC, although you need to have a large sample before you can draw a lot of use out of them. In particular, if you can come up with an average defect ratio, you can start making estimates as to the number of resultant bugs for a given development effort. This is not a guarantee that you won't have more or fewer bugs, but it is representative of the general number of detectable bugs that normally escape your process. Along with this information, you want to be collecting data on which phase of development the bugs were introduced in so that you can determine where the real problems are. Are the bugs just caused by typos in the implementation, or are they really a result of poor design or misunderstood requirements? There are useful applications for these numbers, but you need to collect them uniformly with a foreknowledge of what they are being used for. Without that, it becomes pointless.
! Thursday, September 23, 2004
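The average defect ratio estimate described above can be sketched as follows. The project history is entirely hypothetical, as are the names `HISTORY`, `average_defect_ratio`, and `expected_bugs`; a real estimate would need the large, uniformly collected sample the post asks for.

```python
# Hypothetical history: (lines of code, bugs found) per finished project.
HISTORY = [(1200, 6), (4000, 18), (800, 4), (10000, 52)]


def average_defect_ratio(history):
    """Pooled bugs per line of code across all past projects."""
    total_loc = sum(loc for loc, _ in history)
    total_bugs = sum(bugs for _, bugs in history)
    return total_bugs / total_loc


def expected_bugs(planned_loc, history):
    """Rough bug estimate for a planned effort; not a guarantee."""
    return planned_loc * average_defect_ratio(history)


ratio = average_defect_ratio(HISTORY)
estimate = expected_bugs(3000, HISTORY)
print(f"{ratio * 1000:.1f} bugs/KLOC, about {estimate:.0f} bugs expected in 3000 LOC")
```

Pooling totals rather than averaging per-project ratios keeps one tiny project from dominating the figure; that is a design choice of this sketch, not something the post prescribes.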
Kalani, I'm going to give an example of points 3 and 6 of Jeff (but both you and AllanL5 are correct: if there is a wrong way to use metrics, (mis)managers will quickly discover and (mis)use it):

My office is located at 8000 feet (2400 meters) above sea level. Most afternoons this summer, the temperature was around 70 degrees Fahrenheit (21 Celsius). Humidity is usually under 30%.

If it was sunny, and not windy, it was short sleeve weather. Felt like 70F/21C.
If it was sunny and windy, a long sleeve shirt was indicated. Felt like 60F/15C.
If it was overcast, but not windy, a light jacket was needed. Felt like 50F/10C.
If it was overcast and windy, you needed to wear a coat. Felt like 40F/4C, even though the thermometer said 70F/21C.

Whether it rained or not made no difference in the overcast temperatures. Since I walk 2 miles to the bus stop, knowing the weather and appropriate attire is very important. Oh yes, we had snow yesterday.

Moral: no single measure is perfect. But a single measure is far superior to zero measures. To improve things, you need to start measuring. Why did this bridge fall down? Why did this airplane crash? Why are all these people getting cholera? As you start measuring things, you find that some measures are useless (the best example of a useless measure is phrenology), and some are more relevant than first thought.

LOC gets disparaged a lot, and for pretty good reasons. There are a lot of other measures one can make. A whole bunch of metrics try to measure "how big is that program?" Some try to measure "how complicated is that program?" I think we can all take as axiomatic that a large complicated program is more likely to be buggy than a small, simple program.

Some measures of size include things like: lines of code (which, as the above posters mentioned, can be manipulated for fun and profit), number of variables, how much memory is used, function points.

Some measures of complexity can include: how tight/loose is the coupling between modules, how many paths through the code are there (if there is 1 path, that should be far easier to test than one with 20 if/then/else statements which could run to about 2^20 ~ 1 million possible paths), how complicated is the coupling of procedures.

Oh, to mock myself in the example I gave above:

    if ( weather.windy( ) ) clothing++;
    if ( weather.overcast( ) ) clothing++;
    if ( weather.winter( ) ) BuyAFreakinCarOrElse( );

There are a lot of things that one can measure. Please do not assume that the only possible measure is "lines of code."
Peter Thursday, September 23, 2004
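The path-count arithmetic in the post above can be checked with a short sketch. It assumes the if statements are independent and run one after another, so each one doubles the number of paths; `path_count` and `clothing_layers` are names invented here, and the winter branch of the joke is left out.

```python
from itertools import product


def path_count(independent_branches: int) -> int:
    """Paths through a run of sequential, independent if statements."""
    return 2 ** independent_branches


def clothing_layers(windy: bool, overcast: bool) -> int:
    """The two-branch weather example: each condition adds a layer."""
    layers = 0
    if windy:
        layers += 1
    if overcast:
        layers += 1
    return layers


# Enumerate every path through the two-branch example.
paths = {
    (windy, overcast): clothing_layers(windy, overcast)
    for windy, overcast in product([False, True], repeat=2)
}

print(len(paths), "paths for 2 branches")
print(path_count(20), "paths for 20 branches")
```

Two branches can be tested exhaustively with four cases; twenty branches give 1,048,576, which is the "about 1 million" in the post.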
Staying out of the argument as to whether metrics are useful or not, I just want to give two pointers. If you're developing in Java, then Metrics Reloaded is a great plugin for IntelliJ IDEA - lets you choose from hundreds of different metrics, and record and compare them over time. And I found the NASA Software Measurements Guidebook at http://sel.gsfc.nasa.gov/website/documents/online-doc/94-102.pdf to be an interesting read.
Metrics generally suffer from a Heisenberg-like problem: what you're really interested in is what the metrics would be if you weren't concerned about the metrics. I'd encourage software developers to track a bunch of the more popular metrics, _for their own use only_. (On-task time, LOC, and bug count are good ones to get started with.) This can help you understand and improve your own process. If that's the goal, you won't do much to affect the metrics, cause you'd just be defeating your own improvement efforts. On the other hand, if the goal is demonstrating to management which developers are really good, or more or less productive, or error-prone, or whatever, the metrics will be manipulated to the point that they're useless.
My manager recently asked me to do some preliminary investigations re: the use of metrics. What I came up with (below) drew heavily on various threads discussing the topic here on JoS (which were the source of most of the supporting quotes):

1) Metrics are very difficult to do well.
- In the context of software engineering, "quality" and "productivity" are very hard to objectively quantify.
In the words of Carnegie-Mellon's SEI: "Unfortunately, most of the metrics defined have lacked one or both of two important characteristics: a sound conceptual, theoretical basis; and statistically significant experimental validation"
As one programmer put it, "... varying projects have wildly differing levels of difficulty. If my colleague spends two weeks writing a reusable, documented thread pooling class, and I spend two weeks dropping controls on forms, he may end up with 200 lines of code, and 5 bugs, I may end up with 2000 lines of code and no bugs. Really, he's the hero and I'm average. But how will metrics explain this?"

2) Unless done well, metrics do more harm than good.
- Beware the 'law of unintended consequences' - accidentally creating incentives/disincentives for the wrong thing(s).
Example: if checkins are used as a measure of productivity, incentive is created to check in more often (e.g. at the end of every workday) rather than when it makes sense to do so (e.g. when coding + unit test are complete).
"What ever you decide to measure is what you are going to get... And you'll get NOTHING else. These sorts of extrinsic measurements (and rewards based thereon) cause you to be less focused on your work and more on the extrinsic measurements. You'll be thinking "how many hours did I bill today" instead of "what button name is going to be clearest to the customer -resulting in fewer tech support calls, happier customers, and higher net revenue".

3) Lines of code is not a reliable indicator of quality or productivity.
- Implicit assumption that more code = better, when in software precisely the opposite is often true.
Example: if lines of code are used to measure productivity, then there's incentive to cut and paste duplicate blocks of code (the larger the better) instead of creating re-usable functions, so as to artificially inflate 'output'.
"... the best programmer is generally the one who takes the most code away, not the one who adds the most code ..."
"The best developers ... spend the bulk of their time analyzing the problem, and a small portion cranking out compact, clean code."
"... a one-line PERL program can be much harder to understand (and therefore more complex) than a 10 line one that does the same thing..."

4) Metrics should NEVER be used to rate individuals, e.g. for performance evaluations and/or to determine compensation.
- Effort will be expended (often successfully) to "game" the system.
Example: if developers are rewarded for fixing bugs, there is incentive to intentionally introduce bugs (even if presumably easy-to-repair ones) to increase opportunities to garner rewards. Alternatively, if number of reported bugs in a developer's code is a factor in their appraisal, there is a strong DISincentive for QA to report bugs - either the bug tracking system will be bypassed, or bugs will simply go unreported. Other examples are as cited in 2) and 3) above.

5) If correctly designed, aggregated metrics CAN be useful for measuring the productivity of a team.
Collecting metrics for an entire project over time can mitigate some of the local variability that leads to the weaknesses described above. But they must be collected consistently over time until a meaningful body of history is accumulated, and even then the limitations of the metrics so gathered must be understood and acknowledged.
We recently added metrics that measure and log the test coverage rates, as well as the cyclomatic complexity of existing code. Of the two, the coverage information seems to have more immediate value. The complexity analysis could certainly be used to identify the "coding horrors" folks like McConnell speak of in "Code Complete." Still, there is no substitute for regular code reviews, but this can help focus them somewhat, I think.
LOC is quite similar to using a count of words in a novel. You cannot evaluate an author based on the number of words in a book, and the LOC is similar to this. I agree with Daniel Daranas and others; Just like you need a competent person to read the book to evaluate the quality, you need a competent person to read the code to evaluate the quality. The number of bugs depends very much on the quality of the developer and the complexity of the program (ie. a more competent developer may have more bugs if s/he is assigned a more complex task).
AI Friday, September 24, 2004
> In none of the two cases should you use simple metrics I used to do code reviews on the software written by all new hires working on "my" software ... and I would stop reviewing a person's code, after the metrics (i.e. # bugs detected during code inspection and during subsequent testing) showed that that person was able to write bug-free software.
> My company wants to do metrics based on bugs/LOC I tracked bugs/person/month. I personally didn't care about LOC (given a team of people writing shrink-wrap software over the long term, I thought that the quality of each developer's contribution was more important or visible than the quantity).
It may be useful to classify errors by *type* (for example UI (usability); threading; things that go wrong when something 'unexpectedly' disappears (someone powers off the machine, or unplugs the network cable); 'integration' problems due to several people working simultaneously; missing features/requirements; etc.). The metric would then be "# errors / type of error" rather than "# errors / LOC", because knowing the source of errors lets you address the cause, and change people's training, or the development process, or the tools, or the inspection checklist, etc.
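A "# errors / type of error" tally like the one suggested above needs very little machinery. This is a minimal sketch with made-up bug records; the ids, the type labels, and the name `errors_by_type` are all hypothetical.

```python
from collections import Counter

# Hypothetical bug records: (bug id, error type).
BUGS = [
    (101, "ui"),
    (102, "threading"),
    (103, "ui"),
    (104, "integration"),
    (105, "missing requirement"),
    (106, "ui"),
]


def errors_by_type(bugs):
    """Count bugs per error type."""
    return Counter(kind for _, kind in bugs)


tally = errors_by_type(BUGS)
for kind, count in tally.most_common():
    print(f"{kind}: {count}")
```

The most frequent type is where a change to training, process, tools, or the inspection checklist is most likely to pay off.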
If LOC is a stupid measure (it is), it can be made even more stupid by using the word "bug" without defining it properly. What counts as a bug? When developing, I know that there is a bug in my code but I'm working on it, it's not fixed yet, and I'm tired and have to go to sleep and I'll continue tomorrow. Is that a bug or not? I think it's more like an unfinished part of the program. A bad decision that will prove wrong only after hours of usage, is that a bug? And what is a UI bug? A misspelled word? So if I just design and write no code, but have two incorrectly spelled labels on the window, is that two bugs per no LOC? Pretty damn bad percentage! IMO, one thing is to set the goals first, and then as a metric measure the reality against the goals. In other words, write a (very) detailed development plan, and then check every day or two how you are doing. If you start getting behind (or ahead of) where you should be, then you've either planned wrong or are programming less than you should be. So which is bad? I don't know, but at least you will see if there is something that should be done. And when it's all done, just check how far off the mark you landed and hopla! There's the (IMO) only important metric. Of course I'm assuming a reasonably solid end product (no cutting corners to make the deadline or just to cheat the measuring metric). In any case, metrics should always be decided on based on what they gain. The most important thing is typically to produce good quality in scheduled time, so a metric that doesn't help point out where those may not be met is rather pointless. How many LOC? What the hell difference does that make, if it works as it should and is done on time? (Btw: My in-house project just hit 100.000 LOC last Friday. Wopee! How many bugs? I hope less than 100.000, otherwise I'll have to fire myself ;-))
