September 2007
Monthly Archive
Code quality’s singular metric
Several excellent articles and posts have appeared around the internet recently in response to a question asked on LinkedIn about metrics and code quality. Specifically, the question asked was:
What are the useful Metrics for Code Quality?
The user goes on to state that
The quality of any software application will depend mostly on its code base and it’s important to know what might be the key metrics that help us to evaluate the stability and quality of the code base.
Many of the answers bring up excellent points, including different ways of viewing the question, such as taking the time to understand what attributes of code are related back to quality. For example, Michael Bolton aptly suggests you ask if the code is:
- testable
- supportable
- maintainable
- portable
- localizable
What’s more, further answers suggest coverage, code size, and other metrics as discrete measurements that can help gauge quality. These are all excellent answers; however, the answer turns out to be quite simple.
We have found time and time again that there is one metric that most appropriately relates to code quality: cyclomatic complexity. If your code base has highly localized pockets (i.e. methods) of cyclomatic complexity (or CC), your code will eventually have issues (undoubtedly affecting quality, however you define it).
In fact, CC affects arguably every attribute listed above (testable through localizable). Think about it for a minute: a method that has 27 different paths is next to impossible to adequately test, which means you’re going to have a doozy of a time supporting it because it isn’t easy to maintain. Code that is littered with high CC is a blast to port as well (hopefully you’ve got deep pockets and customers that absolutely love you). Good luck localizing it too.

It turns out that the other metrics mentioned (such as code size) tend to correlate with each other; in fact, it seems that all complexity-like metrics point back to CC. Classes that have a lot of dependencies are usually big, big classes usually have big methods, and big methods usually have lots of conditionals. Lots of conditionals mean a high CC value (CC measures paths through a method, such as those created by an if/else chain).
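To make the "conditionals drive CC" point concrete, here is a minimal sketch that approximates CC for Python functions using the standard `ast` module. It is a simplified counting rule (one plus the number of decision points), not a full implementation of McCabe's metric, and the `shipping_cost` sample is a hypothetical method invented for illustration.

```python
import ast

# Node types that each add one decision point. This is a simplified
# approximation of cyclomatic complexity, chosen for illustration.
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source):
    """Return {function_name: approximate CC} for every function in `source`.

    Decision points inside a nested function also count toward the
    function that encloses it.
    """
    results = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            complexity = 1  # straight-line code has exactly one path
            for child in ast.walk(node):
                if isinstance(child, _DECISION_NODES):
                    complexity += 1
                elif isinstance(child, ast.BoolOp):
                    # `a and b and c` adds two short-circuit branches
                    complexity += len(child.values) - 1
            results[node.name] = complexity
    return results


# Hypothetical method with an if/elif/else chain and a compound condition.
SAMPLE = '''
def shipping_cost(weight, express, international):
    if weight > 20:
        cost = 30
    elif weight > 5:
        cost = 15
    else:
        cost = 5
    if express and international:
        cost *= 3
    return cost
'''

print(cyclomatic_complexity(SAMPLE))  # {'shipping_cost': 5}
```

Each `if`, the `elif`, and the `and` add one to the base value of 1, which is why a short method like this already scores 5.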
Code coverage is an excellent metric for ascertaining what code isn’t touched by tests, and it happens to relate directly to CC: CC counts the linearly independent paths through a method, so it bounds the number of tests you need to reach 100% branch coverage (i.e. if a method has a CC of 27, you may need as many as 27 tests to reach full coverage). Plus, coverage on its own can be unfortunately misleading and can provide a false sense of security.
The beauty of CC is that it’s one metric. One number is all you need to understand risk. You can then apply it in many ways. For example, we provide development teams with ratios related to CC (because CC precisely delineates complex methods it’s often helpful to relate it to other normalized metrics) that enable them to gauge quickly the overall health of a code base. When the ratios grow, things are getting worse and when they decrease, happiness ensues.
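As one illustration of such a ratio, the sketch below computes the fraction of methods whose CC exceeds a threshold. The specific ratio, the threshold of 10, and the method names are assumptions made for this example; the ratios we actually provide to teams are not spelled out here.

```python
def complexity_ratio(cc_by_method, threshold=10):
    """Fraction of methods whose CC exceeds `threshold`.

    `threshold=10` is an assumed cut-off for this example, not a rule.
    """
    if not cc_by_method:
        return 0.0
    risky = sum(1 for cc in cc_by_method.values() if cc > threshold)
    return risky / len(cc_by_method)


# Hypothetical CC values for the same four methods, one week apart.
last_week = {"parse": 4, "validate": 12, "render": 7, "save": 3}
this_week = {"parse": 4, "validate": 15, "render": 11, "save": 3}

print(complexity_ratio(last_week))  # 0.25
print(complexity_ratio(this_week))  # 0.5
```

A ratio that doubles from one week to the next is the kind of "things are getting worse" signal described above.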
The definition of quality (and its associated attributes) as it relates to software has traditionally been quite hard to nail down (regardless of whether you are a customer or a developer); however, one thing is certain: complex code is a house of cards that will eventually collapse (via attrition, bankruptcy, ossification, etc). Finding complexity and proactively reducing it will lead to software that is more testable, maintainable, and supportable. And by the way, that happens to be the kind of software customers like.
Developer Testing, 25 Sep 2007 12:16 pm
To xUnit.net or not?
The brains that brought you NUnit have decided to create a new and improved testing framework dubbed xUnit.net. Their goal is to learn from the lessons of NUnit, thus creating a more usable framework (for example, they realized you don’t need to attribute a class if you’ve used an attribute on a method!).
Their website offers a nice comparison of xUnit.net versus NUnit, MBUnit, and MSTest that’s worth studying.
The question still remains though: will you drop NUnit, which is arguably the de facto testing framework for the .NET platform, to use xUnit.net?
Consequences of Pipeline Structure
There are two general ways in which you can structure a build pipeline: one cuts across the product and the other across validation.
I was recently asked about breaking up the build by having a build hierarchy that mirrored the product’s package dependency tree. I have seen this done, and done it myself, in a very coarse way, dividing along lines strongly marked both by organization and architecture. There is even a well-known pattern for this: Named Stable Bases. It should be made clear that this is a form of deferred integration. Yes, yes, the ill consequences of deferred integration are the very things that continuous integration is working to alleviate. But the fact that deferred integration has been abused in the past is no good reason not to wield it wisely now. So let’s take a look at how it alters the dynamics of a build system. To show this I have created a loop diagram; see this post for more details on loop diagrams, and remember that "s" means varies the same and "o" means varies the opposite.
![CropperCapture[16]](http://www.testearly.com/wp-content/uploads/2007/09/CropperCapture%5B16%5D_thumb1.png)
The entire product is not built, or even compiled, all at one time; it is spread out over time. Often there can be a significant delay between when a dependency is built and when a dependent consumes that new build. Breaking up the build decreases the size of the Codeline, which ultimately has the effect of increasing the rate of change to the Codeline. This is a short-sighted view; remember that we are building a whole product and that we have deferred some integration. The key here is some, not all. So the question is: will the things affected positively outweigh the deferred integration? I have illustrated the forces at play here with a well-known pattern, or archetype, in Systems Thinking named Fixes That Fail.
Fixes That Fail
The Fixes That Fail structure consists of a balancing loop and a reinforcing loop. These two loops interact in such a way that the desired result initially produced by the balancing loop is, after some delay, offset by the actions of the reinforcing loop.

The internal balancing loop operates in the standard balancing loop fashion. The Action that influences the migration of the Current State also influences, after some delay, some Unintended Consequences. These Unintended Consequences subsequently impede the migration of the Current State in the intended direction.
One of the dangers I see with this approach is how it shifts the focus away from the product and the overall build system to the product components and their smaller builds. It is more difficult to see the big picture and foresee the repercussions of actions in the small on the large. This type of build pipeline shifts the delay of feedback from the duration of an individual integration to the push or pull between individual builds. For example, there is an uncomfortable delay in getting a developer feedback when the build of an entire product takes 45 minutes. One might feel a good solution is to break up this single build into 3 builds: one for the common libs, one for the server, and one for the client. Now when a developer of the client submits to the build they get feedback in 8 minutes. Here is the deception of this model: the overwhelming majority of the time the client build is working against old versions of the common libs and server. The delay is now in getting the build output of the common libs and server integrated into the client build.
In my experience most people start down the path of splitting a build in this fashion by first seeing waste in recompiling packages that don’t need to be recompiled. This is a valid observation and a fruitful optimization, but I think that in most cases dividing the build process to realize this benefit yields a net negative in consequences. I have only seen this structure work well as a means to compensate for organizational issues. For example, when the web front end team has little contact with the back end service team, the web front end team may experience interruptions due to changes in the back end that impede progress. One means to compensate for this is to accept regular stable deliveries of the back end. This allows for uninterrupted progress, flow, and regular planned integrations to the new version of the back end.
I have experienced great success with division of the build across validation as opposed to across the product. In the past I have referred to this as creating developer, release, tester, and customer facing builds. Let’s generalize that a little to: role facing builds. Before I explain, remember that the desired state is rapid feedback and the gap is perceived delay in feedback. When we take into account the different roles on the team we realize that they each have different desires. The developers are accepting of less comprehensive feedback in exchange for quicker execution times. The testers, in contrast, are accepting of longer execution times in exchange for more comprehensive feedback. The developers are not interested in the creation of an installer, whereas everyone else is extremely interested in an installer. If you continue down this path of inquiry you will have a set of values that will help you shape a build pipeline. I bet that you would find only the developers perceive a delay in feedback. Creating role facing builds can be quite liberating in what you can do and what you don’t need to do in a build. It is the don’t need to do that has the greatest effect on the developer facing build. You don’t need to do a clean build; an incremental or dirty build can be very quick. You don’t need to execute long running tests. You don’t need to create an installer.
This is not a perfect solution; there is no such thing. So what about the negative consequences? A successful build from a developer facing build can give a developer the illusion that all is well. They need to remain engaged and interested in the results of the cascaded builds. I have not seen any other generic issues with this approach, but here is an example of an issue with how it is implemented: it is important that the developer be able to easily execute all parts of any build in the system that can cause a failure and/or create build artifacts. If, for example, a developer cannot run a test that is causing a build to fail, they will not be very likely to jump right on it. There will be negatives in whatever implementation you conceive of. You need to be alert to them, as you will not be able to foresee them all. There are many ways in which to mitigate and/or alleviate their impact.
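The idea of role facing builds can be sketched as a small table of build profiles. The step names and the `BuildProfile` type are hypothetical, and the choice to make the tester facing build a clean one is an assumption of this example; only the developer facing omissions come straight from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildProfile:
    clean: bool               # wipe previous outputs before compiling
    long_running_tests: bool  # run the slow, comprehensive test suites
    installer: bool           # package an installer at the end


# Hypothetical profiles. The developer build drops everything it
# "doesn't need to do"; the tester build trades time for coverage.
PROFILES = {
    "developer": BuildProfile(clean=False, long_running_tests=False, installer=False),
    "tester": BuildProfile(clean=True, long_running_tests=True, installer=True),
}


def steps_for(role):
    """Return the ordered list of build steps for a given role."""
    profile = PROFILES[role]
    steps = []
    if profile.clean:
        steps.append("clean")
    steps.append("compile")      # incremental unless "clean" ran first
    steps.append("fast tests")
    if profile.long_running_tests:
        steps.append("long-running tests")
    if profile.installer:
        steps.append("installer")
    return steps


print(steps_for("developer"))  # ['compile', 'fast tests']
print(steps_for("tester"))
```

Keeping the profiles in one table makes it easy to see what each role gives up, and to add a release or customer facing profile later.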
Creating a loop diagram of your build system can be a tremendous help in this regard. Here is a picture of a generic build system to help illustrate the value:
![CropperCapture[17]](http://www.testearly.com/wp-content/uploads/2007/09/CropperCapture%5B17%5D1.png)
Here is a link to the free software that I drew this with and the diagram itself:
http://www.simtegra.com
No, you are not alone… The Single Command Build
In “Am I Alone Here?”, Tim Goodwin has a nice blog post commenting on his desire to run a single command build using only assets from the version control repository. I could not agree more. As Martin Fowler mentions in his Continuous Integration article “…anyone should be able to bring in a virgin machine, check the sources out of the repository, issue a single command, and have a running system on their machine.”
Tim points out some of the challenges he experienced while working in a Microsoft environment. It is imperative that vendors give us the capability to run everything from the command line with minimal effort. GUI tools are fine as long as you still have the capability to run the same thing from the command line or an API. Wizards or other GUI tools provide a level of automation (the first time) but not when we need to repeat a series of activities, in particular when we need to set up an automated Continuous Integration server.
I am fond of saying if you can’t run (a tool: server, program, etc.) from the command line, it doesn’t exist (as a tool).
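A single command build can be as simple as one script, checked into the repository, that runs every step in order and stops at the first failure. The sketch below is a minimal illustration; the step names and the placeholder commands are hypothetical stand-ins for whatever compiler, test runner, and packager your project actually uses.

```python
import subprocess
import sys


def run_build(steps):
    """Run each (name, argv) step in order; stop at the first failure.

    Returns 0 if every step succeeded, otherwise the failing exit code.
    """
    for name, argv in steps:
        print(f"==> {name}")
        result = subprocess.run(argv)
        if result.returncode != 0:
            print(f"Step '{name}' failed with exit code {result.returncode}")
            return result.returncode
    return 0


# Placeholder steps: each one just prints a message. Replace the argv
# lists with the real command-line invocations for your project.
STEPS = [
    ("compile", [sys.executable, "-c", "print('compiling')"]),
    ("test", [sys.executable, "-c", "print('testing')"]),
]
```

A wrapper script would end with `sys.exit(run_build(STEPS))` so that a CI server sees a non-zero exit code when any step fails. Note that this only works if every tool in the list can itself be driven from the command line.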
P.S. When is Microsoft going to eliminate MSTest’s dependency on Visual Studio? As it stands, I must install an IDE on any machine on which I wish to build software when using MSTest… not an acceptable option to me.
European CI love-fest this October
If getting together with a bunch of like-minded individuals to discuss the pros and cons of BDD versus TDD or to debate the relative merits of a two phased CI commit model sounds like a blast, then you need to make plans to attend the 2007 European CITCON in the wonderful city of Brussels, October 19th and 20th.
I had the pleasure of attending the 2007 North American CITCON, where I had the opportunity of interacting with a fabulous group of people– it was like hanging out with a large family (without all the arguments). I learned quite a few things and really enjoyed sharing war stories and hearing how other people are solving some of the same issues I run into all the time.
Plus, with all the various Belgian beers one can choose from when in Belgium, this promises to be a fun filled few days, especially given the fact that the conference is free– just think of the money you can now spend on beer (or on copies of the CI book)!
Expanding definitions
When you see (or hear) the phrase “software defect” what does that mean to you? Other than the obvious fact that defects are bad, what are they? If you are a fan of Wikipedia, then maybe you liken a defect to:
an error, flaw, mistake, failure, or fault in a computer program that prevents it from behaving as intended
I certainly agree with this definition; however, the latter part of it (“prevents it from behaving”) tends to focus on a running application. Don’t get me wrong, defects (as we’re used to defining them) usually manifest themselves when an application is run (e.g. the nefarious Blue Screen of Death); nevertheless, if you subscribe to the notion that code should be, in essence, mistake proof through arduous early testing, then I have found that expanding the definition of what a defect is has the benefit of improving a software development process by highlighting how one’s actions from day one can have long-term implications.
For instance, if your definition of a defect is limited to application behavior (say, in user terms), you may end up delaying your own ability to actually find them. The time delta between when code is cut and when someone can exercise some sort of acceptance test is directly related to an increase in actual costs (as everyone already knows). This is why, of course, Agile methodologies espouse short iterations as opposed to big bang waterfall-like cycles.
Alternatively, if you broaden your definition of a defect, then you can, more often than not, find them sooner. Assuming your team is actually writing developer tests, then a defect could be that a test failed (note that 9 times out of 10, a developer test is executed against a portion of a running system). Ideally, if tests exist and you’ve got an automated build system, then you can run those tests often (like every time your SCM system changes via Continuous Integration). Already, in this scenario, you can find defects early.
What if you take it a step further– say, if a failing developer test is considered a defect, then perhaps another form of a defect could be the lack of a developer test? If you define “legacy code” as code that doesn’t have a unit test and you’ve committed to stop writing legacy code, then by definition, code that is modified or new and doesn’t have a test is therefore legacy code (which is bad); hence, the addition of legacy code into an SCM should be a defect!
But wait! Developer tests (i.e. unit tests) require that people actually write them (in some cases they can be generated). And as I noted earlier, tests are usually executed against a running system. What if you further expanded your definition of a defect to include variations in code metrics, such as complexity? For example, if you are able to measure complexity (such as cyclomatic complexity) and you tend to eschew it, then if complexity increases, you can surely consider that a defect, can’t you?
You can even combine various measurable aspects of developmental activities to further exploit finer grained defects. For example, if you have developer tests, then you can surely obtain code coverage numbers. Once you’ve obtained coverage values, if they drop, you can infer a few things:
- New code was added that doesn’t have any tests for it (legacy code == defect)
- Someone deleted some tests
Either way, code coverage dropped, so why not consider that a defect?
What’s more, why not combine measurements further– if code coverage drops for a particular section of code AND the complexity of that code increased, bingo! Defect. Of course, the combinatorics go on, but hopefully, the point is evident: if you broaden what a defect is to you, you can then do things to find them early.
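The combined rule is easy to automate once both measurements are available per section of code. Here is a minimal sketch; the `Snapshot` type, the section names, and the numbers are all hypothetical, and in practice the values would come from your coverage and complexity tools.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    coverage: float  # percent of the section exercised by tests (0-100)
    complexity: int  # cyclomatic complexity of the section


def find_metric_defects(before, after):
    """Return sections where coverage dropped AND complexity increased.

    `before` and `after` map a section name to its Snapshot. Sections
    that exist only in `after` are skipped because there is no baseline.
    """
    defects = []
    for name, new in after.items():
        old = before.get(name)
        if old is None:
            continue
        if new.coverage < old.coverage and new.complexity > old.complexity:
            defects.append(name)
    return defects


# Hypothetical measurements from two consecutive builds.
previous_build = {
    "billing": Snapshot(coverage=85.0, complexity=9),
    "reports": Snapshot(coverage=70.0, complexity=14),
}
current_build = {
    "billing": Snapshot(coverage=78.0, complexity=13),  # worse on both
    "reports": Snapshot(coverage=70.0, complexity=15),  # coverage held
}

print(find_metric_defects(previous_build, current_build))  # ['billing']
```

Run on every build by a Continuous Integration server, a check like this flags the defect as soon as the offending change is committed.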
Your definition will be different based upon business needs as well, but rest assured that if you want to produce reliable code quickly, you’ve got to find defects as early as possible and to do that you’ve got to expand your definition of a defect.