Java is not the best of languages. There are plenty of languages better for particular niches or uses, and it’s littered with annoyances and prone to abuses. So are C, COBOL and Fortran. But it’s good enough almost always, and the environment that has grown up around it has made it a useful language for building reasonably performant web-facing server products. One thing that is a standout though is the ease with which Java can reflect on itself and examine itself at runtime.
This has opened the door for a number of community-led tools that allow us to declare quality standards, and automatically monitor and control adherence to those standards. These are powerful ideas: coders can relax and focus on the task at hand, secure in the knowledge that the surrounding infrastructure will maintain the quality of the code. It’s like a writer with a word processor and a good editor: spelling errors will get sorted out immediately, and somewhere down the track the grammar and prose will get beaten into shape.
There is now a good mix of static and dynamic analysis frameworks out there, and I’ve settled on Findbugs, Checkstyle and Jacoco as the core. PMD is in the mix as well, but more as a backstop for the other tools. The thing that appeals to me about these three is that the analysis they will do, and the standards they mandate, can be declared via the same Maven POM as the rest of the build definition – and in the IDE as well – so that quality control is baked in at the lowest level of development activity.
Because these are declared quality standards, our Jenkins CI tool can use the same declaration to pass or fail a build – code that does not meet required standards cannot progress out of development, and Jenkins provides visibility of the current level of code quality. Jenkins is not so good, though, at showing longer-term trends, which is where Sonar comes in. I was delighted to discover that Sonar had become freely available as SonarQube, as it’s a fantastic tool for seeing at a glance if there are quality trends that need to be addressed, and for expressing complex code quality issues in a cogent fashion.
The tool chain, then, is trivially simple for the developer to use. Maven and the IDE on the desktop tell her immediately if there are code quality issues to address before committing. On commit, the Jenkins CI build is a gatekeeper that will not allow code that does not meet certain basic criteria to pass. Finally, Sonar gets to look at the code and see how it is progressing over time.
I am pleased with this tool chain for two reasons. First, code quality is an integral part of the developer’s daily experience, rather than something bolted on that happens later and is somebody else’s problem. Quality becomes a habit. Second, the process is entirely transparent and visible. The hard code quality metrics are right there for all to see (for certain values of “all”, since they do require authentication to examine) and are visibly impartial and objective, not subjective. If I commit something dumb, it’s not a person telling me he thinks I’m wrong. The quality of my work is not only my responsibility; I also have objective benchmarks to measure it against.
This sort of toolchain exemplifies, in my mind, a mature approach to technology: automate standard procedures, and automate whatever does not need human intervention. It’s madness to repeat by hand, more than once or twice, any process that can be automated, and the time and cost saving of automated quality control compared to manual quality control is enormous. The drawback is that setting up – and to some extent maintaining – the tool chain is non-trivial, and there is a risk that the cost of this setup and maintenance can deter enhancement or rectification of flaws in the toolchain. An interesting implication of this is that the elements of this tool chain – Jenkins, Sonar and so forth – should be treated as production environments, even though they are used to support development. This is a distinction frequently lost: this stuff needs to be backed up and cared for with as much love and attention as any other production infrastructure.
Now, not everyone appreciates the dogmatism and rather strong opinions about style implicit in the toolchain, particularly arising from Checkstyle. Part of the point of Checkstyle, Findbugs and PMD is that, like it or not, they do express the generally accepted best practices that have arisen from somewhat over 15 years of community work on and with Java. They’re not my rules, they’re the emergent rules from the zeitgeist. There are really two responses if these tools persistently complain about that one thing you habitually do in code. You can relax or modify the rules and build in local variations. Or you can stop and think, and acknowledge that maybe, just maybe, your way of doing things is not the best.
They are, after all, fallible automated rules expressed through fallible software. They are not always going to get it right. But the point of the alerts and warnings from these tools is not to force the coder to do something, but to encourage her to notice the things they are pointing out, encourage her to think about what she is doing, encourage her to think about quality as part of her day-to-day hammering on the keyboard. I’d rather see fewer, more beautiful lines of code, than lots of lines of code. It’s not a race.
I find it interesting that being able to objectively measure code quality has tended to improve code quality. Observation changed the thing being observed (is that where heisenbugs arise?). There’s not a direct relationship between the measuring tools and the code quality. Rather, what seems to have happened is that once the toolchain specified certain fixed metrics that the code must attain in order to ‘pass’ and be built into release artefacts, the code changes made to attain those metrics have tended to push the code towards being cleaner, simpler and more maintainable. I am aware that there are still knots of complexity, and knots of less than beautiful architecture, both of which I hope to clean up over the next year, but the point is not that those problem areas exist, but that they are visible and there’s going to be an objective indication of when they’ve been eradicated.
There seems to be a lower rate of defects reaching the QA team as well, although I don’t have a good handle on that numerically – when I first started noticing it, I neglected to come up with a way of measuring it, and now it’s going to be hard to work it out from the Jira records. (The lesson of course being: measure early, measure often.) In general the defects that seem to be showing up now are functional and design problems rather than simply buggy code, or else the sorts of performance or concurrency problems that really only show up under production-like load, which are difficult and expensive to test for as a matter of day-to-day development.
There is a big caveat attached to this toolchain though. I’m a fan of an approach that can be loosely hand-waved as design-by-contract. There’s value in expressing exposed functional end-points – at whatever level of the code or system you pick – in terms of statements about what input will be accepted, what the relationship between input and output is, what side-effects the invocation has, and so forth. Black box coding. As an approach it fits neatly against TDD and encourages loose coupling and separation of concerns. All very good things. In practical terms, however, it depends on two things: trust that the documentation is correct and the contract matches the implementation, and trust that the implementation has been tested and verified against the contract. If those two things can be trusted, then the developer can just use the implementation as a black box, and need neither delve into the implementation code nor build redundant data sanitisation or error handling. At the moment, there’s no automated means to perform this sort of contract validation. The best option at this point seems to be peer code reviews, and a piece of 2×4 with nails in it (1), but that’s expensive and resource intensive.
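To make the idea of a contract a little more concrete, here is a minimal sketch in Java of what I mean by an end-point that states what it accepts, how output relates to input, and what side-effects it has. Everything in it is hypothetical and invented for illustration (the DiscountCalculator class, the applyDiscount method, the MAX_PRICE_IN_CENTS limit); it is not taken from any real code base, and it is only one of several ways such a contract might be written down and enforced at the boundary.

```java
/**
 * Illustrative sketch only: a hypothetical black-box end-point whose
 * contract is stated in the Javadoc and checked at the boundary.
 */
final class DiscountCalculator {

    /** Hypothetical upper bound, chosen so price * percent cannot overflow a long. */
    static final long MAX_PRICE_IN_CENTS = Long.MAX_VALUE / 100;

    private DiscountCalculator() {
        // static utility, not meant to be instantiated
    }

    /**
     * Applies a percentage discount to a price expressed in cents.
     *
     * Accepted input: 0 <= priceInCents <= MAX_PRICE_IN_CENTS and 0 <= percent <= 100.
     * Input/output relationship: the result is priceInCents minus percent
     * per cent of it (the discount is rounded down), so 0 <= result <= priceInCents.
     * Side effects: none.
     *
     * @throws IllegalArgumentException if an argument is outside the accepted range
     */
    static long applyDiscount(long priceInCents, int percent) {
        // Precondition checks: reject bad input here so that callers
        // do not need to sanitise it a second time.
        if (priceInCents < 0 || priceInCents > MAX_PRICE_IN_CENTS) {
            throw new IllegalArgumentException("priceInCents out of range: " + priceInCents);
        }
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent out of range: " + percent);
        }

        long discount = (priceInCents * percent) / 100;
        long result = priceInCents - discount;

        // Postcondition check: fail loudly if the implementation ever
        // drifts away from the documented contract.
        if (result < 0 || result > priceInCents) {
            throw new IllegalStateException("contract violated, result = " + result);
        }
        return result;
    }

    public static void main(String[] args) {
        // 15 per cent off 100.00, expressed in cents
        System.out.println(applyDiscount(10_000L, 15));
    }
}
```

The checks only confirm that the code agrees with the contract as written in the comment. Whether that comment says the right thing is still a question for a human reviewer, which is exactly the gap described above.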
The bottom line reason for investing in a tool chain like this – and make no mistake, it’s potentially expensive to set up and maintain – is that if you have a typical kind of team structure, it’s easy for the developers to overwhelm the QA team with stuff to be tested. The higher your code quality, and the more dumb-ass errors you can trap at the development stage, the less likely it is that defects will get past your harried QA guys.
(1) It’s like I always say, you get more with a kind word and a two-by-four than with just a kind word. – Marcus Cole