On Testing

I really should do a write-up about the CI and code quality infrastructure that I’ve set up, as in recent months it’s really started to pay off for the considerable effort it’s cost. But that’s not what’s on my mind today.

Rather I am struck by how easy it is to really stuff up unit tests, and how hard it is to get them right. I’m not so much concerned with simple things like the proportion of code that is covered by tests, important though that is, as with the difficulty of testing what code should do rather than what it does do. This is not simply an artefact of TDD either, although one of the problems I have with TDD is that it can lead to beautiful tests accurately exercising code that does not actually meet requirements. It worries me that Agile is often treated as an excuse not to define or identify requirements in much depth.

Two examples that I’ve seen recently – and have been equally guilty of – stand out.

First is falling into the trap of inadvertently testing the behaviour of a complex mock object rather than the behaviour of the real object. I’ve been on a warpath across the code for this one, as in retrospect it reveals bad code smells that I really should have picked up earlier.

Second is testing around the desired behaviour – for instance a method that transforms some value into a String, which has tests for the failure case of a bad input and tests that the returned String is not blank or null, but no tests that verify that the output for a given known input is the expected output.
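
To make that concrete, here’s a minimal sketch of the difference (the formatter and its expected output are invented purely for illustration):

// tests "around" the behaviour - these pass for almost any implementation
@Test
public void formattedAmountIsNotBlank() {
  String result = formatter.formatAmount(1234.5);
  assertNotNull("should not be null", result);
  assertFalse("should not be blank", result.trim().isEmpty());
}

// a test of the actual contract - a known input must give the expected output
@Test
public void formatsKnownAmountAsExpected() {
  assertEquals("known input should give known output", "1,234.50", formatter.formatAmount(1234.5));
}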

In both cases it feels like we’re looking too closely at the implementation of the method, rather than stepping back and looking at the contract the method has.

Testing. Hard it is.

Deserialising Lists in Jersey Part II

or “Type Erasure is not your friend”

The solution I outlined in my previous post has one big drawback (well, two, actually): it does not work.

The trouble is that the approach I suggested of having a common generic function to invoke the request with a GenericType resulted in the nested type being erased at run time. The code compiled, and a List was returned when the response was deserialised, but Jersey constructed a List of HashMap rather than a List of the declared type.
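
To make the erasure problem concrete, the difference is roughly the following (the method names here are mine, not from the real client):

// inside a generic helper the type parameter T is erased, so at run time
// Jersey only sees List and Jackson falls back to binding each element to a HashMap
private <T> List<T> fetchList(final WebTarget target) {
  return target.request(MediaType.APPLICATION_JSON)
      .get(new GenericType<List<T>>() {});
}

// with a concrete type argument the anonymous GenericType subclass captures
// List<Thing>, and the elements are deserialised as Thing instances
private List<Thing> fetchThings(final WebTarget target) {
  return target.request(MediaType.APPLICATION_JSON)
      .get(new GenericType<List<Thing>>() {});
}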

This is extremely puzzling, as I expected that this would collapse at run time with type errors, but it didn’t. My initial thought when this rose up and bit me – and consumed a lot of time that I could ill afford – was that there was a difference in behaviour between deployed run time and running with the Jersey test framework. I was wrong – when I modified my test to examine the content of the returned List, it showed up as a fault immediately.

A diversion: this shows how very easy it is to stuff up a unit test. My initial test looked something like:

List<Thing> result = client.fetchList();
assertNotNull("should not be null", result);
assertFalse("should not be empty", result.isEmpty());

which seems pretty reasonable, right? We got a List back, and it had stuff in it, all as expected. I did not bother to examine the deserialised objects, because I was doing that on a different method that simply returned a Thing rather than a List – that’s sufficient, right? We know that deserialisation of the JSON body is working, right?

Extending the test to something that you would not automatically think to test showed up with a failure immediately:

List<Thing> result = client.fetchList();
assertNotNull("should not be null", result);
assertFalse("should not be empty", result.isEmpty());
assertTrue("Should be a Thing", TypeUtils.isInstance(result.get(0), Thing.class));

The List did not actually contain Thing instances.

A quick solution was obvious, although it resulted in duplicating some boilerplate code for handling errors – drop the common generic method, and modify the calling methods to invoke the Jersey get() using a GenericType constructed with a specific List declaration.

This did highlight an annoying inconsistency in the Jersey client design though. For the simple cases of methods like

Thing getThing() throws BusinessException;

then the plain Jersey get() which returns a Response can be used. Make a call, look at the Response status, and either deserialise the body as a Thing if there’s no error, or as our declared exception type on error and throw the exception. Simple, clean and pretty intuitive.
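
Something like this, roughly (parseException stands in for whatever turns an error Response into our BusinessException):

public final Thing getThing(final int thingId) throws BusinessException {
  Response response = baseTarget.path("thing")
      .path(Integer.toString(thingId))
      .request(MediaType.APPLICATION_JSON)
      .get();
  if (response.getStatusInfo().getFamily() == Response.Status.Family.SUCCESSFUL) {
    return response.readEntity(Thing.class);
  }
  throw parseException(response);
}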

In the case of the get(GenericType) form of the calls though, you get back the declared type, not a Response. Instead you need to trap for a bunch of particular exceptions that can come out of Jersey – particularly ResponseProcessingException – and then obtain the raw Response from the exception. It works, but it’s definitely clunkier than I would prefer:

public final List<Thing> getThings() throws BusinessException {
  try {
    List<Thing> result = baseTarget.path(PATH_OF_RESOURCE)
        .request(MediaType.APPLICATION_JSON)
        .get(new GenericType<List<Thing>>() {});
    return result;
  } catch (ResponseProcessingException rpe) {
    // the server sent back an error status; turn its response into our exception
    throw parseException(rpe.getResponse());
  } catch (WebApplicationException | ProcessingException pe) {
    throw new BusinessException("Bad request", pe);
  }
}

Note that we get either WebApplicationException or ProcessingException if there is a problem client-side, and so we don’t have a response to deserialise back to our BusinessException, whereas we get a ResponseProcessingException whenever the server returns a non-200 (or to be precise anything outside the 200-299 range) status.

Of course, all of this is slightly skewed by our use-case. Realistically most RESTful services have a pretty small set of end-points, so the amount of boiler plate repeated code in the client is limited. In our case we have a single data abstraction service sitting between the database(s) and the business code, and necessarily that has a very broad interface, resulting in a client with lots of methods. It ain’t pretty but it works, and currently there’s a reasonable balance between elegant code and readable code with repeated boiler-plate bits.

Deserialising Lists with Jersey

I very much like the way in which Jackson and Jersey interact to make building a RESTful interface with objects transported as JSON really, really simple.

As an example, if we have on the server side a class like this:

@Path("/things")
public final class ThingService {
  @GET
  @Path("/thing")
  @Produces(MediaType.APPLICATION_JSON)
  public final Thing getThing(@PathParam("thingId") final int thingId) {
    return dataLayer.fetchThingById(thingId);
  }
}

then consuming the service is joyfully simple (note that this is a slightly fudged example, and in reality more sophisticated construction of the Client instances would be recommended):

public final class ThingClient {
  private final transient Client client;
  private final transient WebTarget baseTarget;

  public ThingClient(final String serviceUrl) {
    ClientConfig cc = new ClientConfig().register(new JacksonFeature());
    client = ClientBuilder.newClient(cc);
    baseTarget = client.target(serviceUrl);
  }

  public final Thing getThing(final int thingId) {
    return baseTarget.path("thing")
        .path(Integer.toString(thingId))
        .request(MediaType.APPLICATION_JSON)
        .get()
        .readEntity(Thing.class);
  }
}

Of course, it’s very likely your service will have a bunch of different end points, so you’ll want to pull some of the repeated boiler plate out into separate methods, perhaps something like this (where goodStatus checks that we’ve got some 2xx response from the server, and parseResponse constructs a suitable exception to throw if we got an error response):

public final class ThingClient {
  private final transient Client client;
  private final transient WebTarget baseTarget;

  public ThingClient(final String serviceUrl) {
    ClientConfig cc = new ClientConfig().register(new JacksonFeature());
    client = ClientBuilder.newClient(cc);
    baseTarget = client.target(serviceUrl);
  }

  public final Thing getThing(final int thingId) {
    WebTarget target = baseTarget
    .path("thing")
    .path(Integer.toString(thingId));
    return fetchObjectFromTarget(Thing.class, target);
  }

  public final OtherThing getOtherThing(final int otherId) {
    WebTarget target = baseTarget
    .path("otherThing")
    .path(Integer.toString(otherId));
    return fetchObjectFromTarget(OtherThing.class, target);
  }

  private <T> T fetchObjectFromTarget(final Class<T> returnType, final WebTarget target) {
    Response response = fetchResponse(target);
    if (goodStatus(response)) {
      return response.readEntity(returnType);
    } else {
      throw parseResponse(response);
    }
  }

  private Response fetchResponse(final WebTarget target) {
    return target.request(MediaType.APPLICATION_JSON).get();
  }
}
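
For completeness, goodStatus and parseResponse might look something like this (a sketch only; what the real parseResponse builds would depend on what the service puts in its error bodies):

  private boolean goodStatus(final Response response) {
    // treat any 2xx status as success
    return response.getStatusInfo().getFamily() == Response.Status.Family.SUCCESSFUL;
  }

  private RuntimeException parseResponse(final Response response) {
    // wrap the error response in an exception for the caller to throw
    return new WebApplicationException(response);
  }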

This allows us to have a nice consonance between the client and the server, and you can even muck about and ensure the two are kept in line by deriving them from the same interface or base classes.
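
As a rough sketch of that idea (ThingOperations is a made-up name; the resource class and the client would both implement it):

// both ThingService and ThingClient can implement this, so the compiler
// complains if the two ever drift apart
public interface ThingOperations {
  Thing getThing(int thingId);
  OtherThing getOtherThing(int otherId);
}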

The one annoyance in this picture is really a matter of documentation. How do you consume a collection of objects?

Declaring the collection service is equally trivial

@Path("/things")
public final class ThingService {
  @GET
  @Path("/thing")
  @Produces(MediaType.APPLICATION_JSON)
  public final Thing getThing(@PathParam("thingId") final int thingId) {
    return dataLayer.fetchThingById(thingId);
  }

  @GET
  @Path("/all")
  @Produces(MediaType.APPLICATION_JSON)
  public final List<Thing> getAllThings() {
    return dataLayer.fetchThings();
  }
}

however the Jersey documentation is… opaque… when it comes to consuming this on the client side. It turns out that this is where the GenericType comes into play (see fetchListFromTarget below)

public final class ThingClient {
  private final transient Client client;
  private final transient WebTarget baseTarget;

  public ThingClient(final String serviceUrl) {
    ClientConfig cc = new ClientConfig().register(new JacksonFeature());
    client = ClientBuilder.newClient(cc);
    baseTarget = client.target(serviceUrl);
  }

  public final Thing getThing(final int thingId) {
    WebTarget target = baseTarget.path("thing").path(Integer.toString(thingId));
    return fetchObjectFromTarget(Thing.class, target);
  }

  public final List<Thing> getThings() {
    WebTarget target = baseTarget.path("all");
    return fetchListFromTarget(Thing.class, target);
  }

  private <T> List<T> fetchListFromTarget(final Class<T> returnType, final WebTarget target)  {
    Response response = fetchResponse(target);
    if (goodStatus(response)) {
      return response.readEntity(new GenericType<List<T>>() {});
    } else {
      throw parseResponse(response);
    }
  }

  private <T> T fetchObjectFromTarget(final Class<T> returnType,
    final WebTarget target) {
    Response response = fetchResponse(target);
    if (goodStatus(response)) {
      return response.readEntity(returnType);
    } else {
      throw parseResponse(response);
    }
  }

  private Response fetchResponse(final WebTarget target) {
    return target.request(MediaType.APPLICATION_JSON).get();
  }
}

The documentation for GenericType is not great, but essentially it indicates an automagic wrap and unwrap of the collection.

(By the way, a tip of the hat to John Yeary for identifying this solution a few years ago).

DynamoDB Local, Maven and JUnit

or, “how I fell into a deep morass of version discrepancies”

Something I am working on at the moment is rolling in use of DynamoDB Local for integration tests. Now one thing I’ve noticed is that Amazon aren’t really drinking the Maven kool-aid. This generally isn’t a huge problem, as mostly it’s just a matter of getting artifacts into our repository and setting them up as dependencies. DynamoDB Local is a bit different though.

At its heart, DynamoDB Local is a wrapper around an SQLite instance, with the DynamoDB API bolted on in front of it. This is a good thing, as it means that using DynamoDB Local instead of some other mocking framework (specifically Alternator) gives a better guarantee that the mocked target behaves the way the real DynamoDB will. Don’t get me wrong, Alternator is a good solution, and has nice light-weight semantics that should serve pretty much anyone well, but I did find at least one test that was passing with Alternator that would not pass when pointed at DynamoDB itself. Fortunately it was not in code that was in production use yet, but…

The difficulty with DynamoDB Local is that, as a wrapper around SQLite, it needs to be running while the tests are running, which is not entirely simple in Maven. Fortunately someone has already rolled out a nice little Maven plugin to take care of starting and stopping the instance (a big thank you to Yegor Bugayenko for some nice work). Getting this working from the project documentation was fairly straightforward, however the intention of that documentation was to run DynamoDB Local for integration tests, rather than unit tests. That’s fair enough, but I’m trying to reduce the number of integration tests in favour of unit tests, as that makes them easier to measure and monitor using JaCoCo (and to a lesser extent Sonar). To that end, I’ve modified his instructions to get it working for unit testing.

<?xml version="1.0"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>com.XXXX</groupId>
    <artifactId>YYYY</artifactId>
    <version>1.14.2</version>
  </parent>

  <artifactId>ZZZZ</artifactId>
  <version>1.8.0-SNAPSHOT</version>
  <name>ZZZZ</name>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <dynamodblocal.tgz>${com.jcabi:DynamoDBLocal:tgz}</dynamodblocal.tgz>
  </properties>

  <dependencies>
    <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk</artifactId>
      <version>1.6.1</version>
    </dependency>

    <!-- ref http://www.jcabi.com/jcabi-dynamodb-maven-plugin/usage.html -->
    <dependency>
      <groupId>com.jcabi</groupId>
      <artifactId>DynamoDBLocal</artifactId>
      <version>2013-09-12</version>
      <type>tgz</type>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.jacoco</groupId>
        <artifactId>jacoco-maven-plugin</artifactId>
      </plugin>

      <!-- allocate a port for dynamodblocal -->
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>build-helper-maven-plugin</artifactId>
        <version>1.8</version>
        <executions>
          <execution>
            <phase>generate-test-resources</phase>
            <goals>
              <goal>reserve-network-port</goal>
            </goals>
            <configuration>
              <portNames>
                <portName>dynamodblocal.port</portName>
              </portNames>
            </configuration>
          </execution>
        </executions>
      </plugin>

      <!-- force the dynamodb TGZ to be obtained -->
      <plugin>
        <artifactId>maven-dependency-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>properties</goal>
            </goals>
          </execution>
        </executions>
      </plugin>

      <plugin>
        <groupId>com.jcabi</groupId>
        <artifactId>jcabi-dynamodb-maven-plugin</artifactId>
        <version>0.2</version>
        <configuration>
          <port>${dynamodblocal.port}</port>
          <tgz>${dynamodblocal.tgz}</tgz>
        </configuration>
        <executions>
          <!-- start just before unit tests -->
          <execution>
            <id>beforetests</id>
            <phase>generate-test-resources</phase>
            <goals>
              <goal>start</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>

    <pluginManagement>
      <plugins>
        <plugin>
          <!-- make the port available during unit tests -->
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-surefire-plugin</artifactId>
          <configuration>
            <systemPropertyVariables>
              <dynamodb.port>${dynamodblocal.port}</dynamodb.port>
            </systemPropertyVariables>
          </configuration>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>

  <profiles>
    <profile>
      <id>DEV</id>
      <activation>
        <activeByDefault>true</activeByDefault>
      </activation>
      <build>
        <plugins>
          <plugin>
            <groupId>com.jcabi</groupId>
            <artifactId>jcabi-dynamodb-maven-plugin</artifactId>
            <version>0.2</version>
            <configuration>
              <port>${dynamodblocal.port}</port>
              <tgz>${dynamodblocal.tgz}</tgz>
            </configuration>
            <executions>
              <!-- stop after unit tests - this requires us to at least do a verify -->
              <execution>
                <id>aftertests</id>
                <phase>package</phase>
                <goals>
                  <goal>stop</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
        </plugins>
      </build>
    </profile>
    <profile>
      <id>JENKINS</id>
      <build>
        <plugins>
          <plugin>
            <groupId>com.jcabi</groupId>
            <artifactId>jcabi-dynamodb-maven-plugin</artifactId>
            <version>0.2</version>
            <configuration>
              <port>${dynamodblocal.port}</port>
              <tgz>${dynamodblocal.tgz}</tgz>
            </configuration>
            <executions>
              <!-- stop after unit tests - this requires us to at least do a verify -->
              <execution>
                <id>aftertests</id>
                <phase>deploy</phase>
                <goals>
                  <goal>stop</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
        </plugins>
      </build>
    </profile>
  </profiles>
</project>

There’s one additional weirdness about the solution above that needs explaining. We are using Jenkins for CI builds, and the target run by CI is slightly different to the target run by developers on their desktop. By mucking about with profiles, I can issue the ‘stop’ directive to halt the DynamoDB Local instance at different points in the lifecycle. Note that if the developer does not run mvn verify (which is the mandated minimum requirement before check-in), then the DynamoDB Local instance keeps running after Maven terminates and needs to be manually clobbered. On Jenkins, I issue the stop directive at a fairly arbitrary point after the integration tests. The Jenkins build never actually fires this ‘stop’ directive, however the way in which Jenkins runs tests means that Jenkins itself clobbers the DynamoDB instance when the build terminates.
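
For completeness, here’s roughly how a unit test then picks up the reserved port (a sketch; the test class is mine, and DynamoDB Local does not validate the dummy credentials):

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
import org.junit.Before;

public class ThingRepositoryTest {
  private AmazonDynamoDBClient dynamo;

  @Before
  public void connectToLocalDynamo() {
    // surefire passes the port reserved by build-helper in as a system property
    final String port = System.getProperty("dynamodb.port");
    dynamo = new AmazonDynamoDBClient(new BasicAWSCredentials("dummy", "dummy"));
    dynamo.setEndpoint("http://localhost:" + port);
  }
}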

Some caveats:

  1. This works for my particular requirements; your mileage may vary.
  2. The plugin throws a warning message and stack trace when ‘stop’ is invoked; this is a known issue, and the instance does stop correctly.
  3. As a result of that issue, a new version of the plugin has been built, and there could be other required changes to the POM.

When Typography Goes Bad

So I’ve just spent a very confused 15 minutes trying to figure out why something that cannot possibly go wrong was breaking. I’m working to get DynamoDBLocal up and running.

It all looks very simple: download the tar ball, unpack it, and execute the Java invocation:
java –Djava.library.path=. -jar DynamoDBLocal.jar

Hmm. That’s odd:
Error: Could not find or load main class –Djava.library.path=.
Looking at the blog comments, a good number of folk have found the same problem, and come up with suspiciously complex solutions. Let’s see what happens if I re-type the command rather than cutting-and-pasting:
java -Djava.library.path=. -jar DynamoDBLocal.jar

Success:
2013-10-15 10:14:47.024:INFO:oejs.Server:jetty-8.y.z-SNAPSHOT
2013-10-15 10:14:47.220:INFO:oejs.AbstractConnector:Started SelectChannelConnector@0.0.0.0:8000

Can you see it? No? I missed it as well: the ‘–’ in front of the D is a typographic dash, not a minus sign (i.e. ASCII character 45).

Domesticating Talend

We’ve started working with Talend, and specifically with the ‘big data’ point-and-drag IDE. I’m reasonably happy with it; it does pretty well what it says on the box, but the ability to integrate its output with our product and approach is not great. The intention of the product appears to be mainly to run the ETL jobs from within the IDE, but there’s an ‘export job’ facility that dumps a ZIP file containing shell and batch scripts, some generated JARs, and all the dependencies, all bundled up for execution from the command line.

The trouble is that our use case does not match up well with this approach – we need to embed the Talend-generated code inside our service, which for us then means getting the generated JARs into our service project using Maven. The nasty bit then is immediately obvious – how do we version and deploy the Talend-generated JAR files?

My first tentative approach is going to be as follows. The first step is to use the Talend job export facility to export the ZIP to a standard location with a standard name. The second step is to use Maven with the following pom.xml, and invoke a standard mvn release:prepare release:perform to get a single unified JAR into our Maven repository:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <parent>
    <groupId>com.somoglobal</groupId>
    <artifactId>Apptimiser</artifactId>
    <version>1.14.2</version>
    <relativePath/>
  </parent>

  <groupId>com.somoglobal.talend</groupId>
  <artifactId>PostAttribution</artifactId>
  <packaging>jar</packaging>
  <version>1.0.7-SNAPSHOT</version>
  <name>PostAttribution</name>

  <description>
    The Talend PostAttribution project packaged as a jar.
  </description>

  <scm>
    <connection>scm:svn:https://svn.somodigital.com/mobfusion/talend/PostAttribution/trunk</connection>
    <developerConnection>scm:svn:https://svn.somodigital.com/mobfusion/talend/PostAttribution/trunk</developerConnection>
    <url>https://svn.somodigital.com/mobfusion/talend/PostAttribution/trunk</url>
  </scm>
  
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  
  <build>
    <plugins>
      <plugin>
        <artifactId>maven-clean-plugin</artifactId>
        <version>2.5</version>
        <configuration>
          <filesets>
            <fileset>
              <directory>temp/unzip</directory>
              <includes>
                <include>**</include>
              </includes>
              <followSymlinks>false</followSymlinks>
            </fileset>
          </filesets>
        </configuration>
      </plugin>
      
      <plugin>
        <!-- http://evgeny-goldin.com/wiki/Copy-maven-plugin -->
        <groupId>com.github.goldin</groupId>
        <artifactId>copy-maven-plugin</artifactId>
        <version>0.2.5</version>
        <executions>
          <execution>
            <id>obtain-jars</id>
            <phase>prepare-package</phase>
            <goals>
              <goal>copy</goal>
            </goals>
            <configuration>
              <resources>
                <!-- unpack the zip during a normal build (not release:perform) -->
                <resource>
                  <runIf>{{ new File( project.basedir, 'temp' ).isDirectory() }}</runIf>
                  <description>Unpacking Talend export</description>
                  <targetPath>${project.build.outputDirectory}</targetPath>
                  <file>temp/newExportFolder.zip</file>
                  <zipEntries>
                    <zipEntry>**/*_0_1.jar</zipEntry>
                  </zipEntries>
                  <unpack>true</unpack>
                </resource>
                
                <!-- unpack the zip when doing release:perform -->
                <resource>
                  <runIf>{{ !(new File( project.basedir, 'temp' ).isDirectory()) }}</runIf>
                  <description>Unpacking Talend export</description>
                  <targetPath>${project.build.outputDirectory}</targetPath>
                  <file>../../temp/newExportFolder.zip</file>
                  <zipEntries>
                    <zipEntry>**/*_0_1.jar</zipEntry>
                  </zipEntries>
                  <unpack>true</unpack>
                </resource>
                
                <!-- unpack the jars -->
                <resource>
                  <description>Unpacking jar files</description>
                  <targetPath>${project.build.outputDirectory}</targetPath>
                  <directory>${project.build.outputDirectory}</directory>
                  <includes>
                    <include>*.jar</include>
                  </includes>
                  <unpack>true</unpack>
                </resource>
                
                <!-- discard the jars -->
                <resource>
                  <description>cleaning jar files</description>
                  <targetPath>${project.build.outputDirectory}</targetPath>
                  <directory>${project.build.outputDirectory}</directory>
                  <includes>
                    <include>*.jar</include>
                  </includes>
                  <clean>true</clean>
                </resource>
              </resources>
            </configuration>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

Big shout out to Evgeny Goldin for his copy-maven plugin that makes this easier.

Toward a vision of Sustainable Server Programming

For a number of years I’ve been thinking that I should write down some of the ideas I’ve had and lessons I’ve learned over way too many years banging on a keyboard. In my head this has always been centered around the vague label “sustainable server development”. Let me try to peel away some layers of that, and make the label a little less vague.

We will begin with “server”. A substantial amount of what I’ve written over the past decade or so I would handwave label as “server side”. But what do I mean by that? For me a “server” is a program intended to be mainly headless, running unattended for considerable periods of time, probably on remote hardware, and providing a well defined service. By preference for me a server runs under some form of Unix. Really the landscape has collapsed to three platforms now: some form of Unix, Windows, and the widely variable set of options for software embedded in specialist hardware (although these days, that is very frequently a Unix variant as well). I’ve done a little bit against Windows, and always found the experience frustratingly complex and ultimately unrewarding. Unix was built from the ground up to provide many of the facilities needed for “server side” coding (I’m particularly thinking of a robust security model, abstracted hardware and networking facilities, and sophisticated multi-processing facilities), and provides the coder with access to multiple layers of the stack between her code and the hardware in ways that Windows makes difficult. Bringing that back to a statement: for me a “server” is a headless, service-oriented piece of code running under Unix, and required to be robust, performant and reliable.

So. “sustainable”. Like any coin, this has two sides (I know, technically all coins have at least three faces): “sustainable server” and “sustainable development”. I believe the two are really linked, and hope over a series of articles to illustrate this. When I talk about a “sustainable server”, I mean something that has been built to minimise hassle and surprise for administrators and for code maintainers. When I talk about “sustainable development” I mean approaches that make building and maintaining robust, reliable and performant code a pleasant and simple 9-to-5 job, rather than a heroic nightmare of late nights, pizza and caffeine.

I am not a fan of heroic coding. There is plenty of verified clinical evidence that amply demonstrates that a tired and stressed coder is a bad coder: some clinical studies suggest that a few days’ disturbed sleep has the same effects on cognition as being seriously inebriated. We have a culture that is proving very hard to break, where a mad hack over a sleepless week resulting in partially completed, un-documented, un-maintainable code is an effort to be applauded (and repeated) rather than treated as an unwelcome and undesired exception. While the coder is subject to a variety of lunacies from project managers and product owners, we are our own worst enemies if we keep committing to unhealthy and irrational death marches. A calm and rational approach to developing server side services should make such heroics unnecessary: most of the problems to be solved in this space have been solved before, and we’ve got a lot of historical precedents to call on. Most of the time, none of this has to be considered hard or complex, so please just go take a cold shower and a walk around the block, and calm down.

Let me point to an example outside the coding world. Watch a carpenter, or a blacksmith, at work. There’s no sense of rush or panic or urgency. The craftsman knows how long each part of the process takes, has learned from the past, and is happy to re-use established patterns. She gives herself time to deal with the hard parts of the problem by knocking away the simple parts efficiently. And most relevantly: if a project manager rushes in and says “this needs to be done in half the time”, the response is not “oh, in that case we’d better order pizzas because it will be a long night.”

The key elements of what I would classify as ‘good’ server software are as follows:

1) Clarity of purpose. The software does one thing, provides one well defined service;
2) Predictability. The software behaves in a well defined and documented fashion;
3) Robustness. The server should be resilient and gracefully adapt to environmental changes, cope with malformed or malicious requests, and not collapse under extreme load;
4) Manageability. Administrators should be able to monitor and configure the service with ease, and should be able to programatically manage the service state;
5) Performance. Requests should be responded to within 50ms – 100ms or better under all conditions. In particular, performance should not degrade unpredictably under load.

In my experience a lot of coders – and managers of coders – have the idea that setting these goals as a minimum base requirement is unrealistic and expensive. Twaddle and nonsense. Let me point to two exemplars, both available as FOSS and both initially built out by small teams: Varnish and the core Apache server. In both cases, these are not part of the base operating system, they are services run on a server. In both cases, all the goals above are amply met. And in both cases, there is a development and maintenance infrastructure around the code which is palpably sustainable and effective.

Varnish is a particularly fine example. There were no surprises or complexities in installing it and running it. It worked as expected ‘out of the box’ without intensive configuration. It’s very easy to monitor and manage. It does one thing, extremely well, and does it in the fashion described, documented and expected. And most importantly it just runs and runs and runs without intervention or alarm.

Let’s make all server software that good, and knock off work at 5pm, enjoy our weekends, take up hobbies and stop these panicked head-long rushes into the night. Our partners, family, waistlines and hearts will thank us for it.

Amazon SWF, aspectj-maven-plugin and JaCoCo

Which could be subtitled as “6+ hours of my life I will never get back”.

I’m leaving this here in case someone else finds it useful. The short story is thus: I’m working on a product using the Amazon Flow SDK, writing in Eclipse under Java 7. We use Maven and JaCoCo, and believe in TDD, high levels of code coverage, and code that doesn’t suck. It turns out that getting all these bits to work together is far more complex than it has any reason to be.

There were several conflicting problems: the AspectJ AOP tool was not successfully dealing with the @Asynchronous annotation on my Workflow implementation, which meant that tests were failing. Various attempts to get that working resulted in the aspectj-maven-plugin failing in various horrible ways, JaCoCo instrumentation failing in various horrible ways, or both.
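
For context, the sort of method the weaving has to deal with looks roughly like this (a simplified sketch, not our actual workflow code):

import com.amazonaws.services.simpleworkflow.flow.annotations.Asynchronous;
import com.amazonaws.services.simpleworkflow.flow.core.Promise;

public class ThingWorkflowImpl {
  // the method body is deferred until the Promise argument is ready;
  // without successful weaving it runs immediately and result.get() blows up
  @Asynchronous
  void handleResult(final Promise<String> result) {
    System.out.println("activity returned " + result.get());
  }
}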

Here’s an edited version of the pom.xml to show how I sorted it in the end (which reminds me that I need a better way of adding code snippets here)

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <parent>
    <!-- snip: the parent pom contains base definitions for JaCoCo -->
  </parent>

  <artifactId>XXXX</artifactId>
  <version>1.0.0-SNAPSHOT</version>
  <packaging>jar</packaging>

  <name>XXX</name>
  <url>https://...</url>

  <description>
  </description>

  <scm>
    <!-- snip -->
  </scm>

  <properties>
    <!-- snip -->
  </properties>

  <dependencies>
    <!-- snip -->
    <dependency>
      <groupId>org.aspectj</groupId>
      <artifactId>aspectjrt</artifactId>
      <version>1.7.3</version>
    </dependency>

    <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk-flow-build-tools</artifactId>
      <version>1.5.2</version>
    </dependency>

    <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk</artifactId>
      <version>1.5.2</version>
    </dependency>

    <dependency>
      <groupId>org.freemarker</groupId>
      <artifactId>freemarker</artifactId>
      <version>2.3.20</version>
    </dependency>
  </dependencies>

  <build>
    <resources>
      <resource>
        <directory>src/main/resources</directory>
        <filtering>true</filtering>
      </resource>
    </resources>

  <plugins>
    <plugin>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>aspectj-maven-plugin</artifactId>
      <version>1.4</version>
      <configuration>
        <showWeaveInfo>true</showWeaveInfo>
        <source>1.7</source>
        <target>1.7</target>
        <Xlint>ignore</Xlint>
        <complianceLevel>1.7</complianceLevel>
        <encoding>UTF-8</encoding>
        <verbose>true</verbose>
        <aspectLibraries>
          <aspectLibrary>
            <groupId>com.amazonaws</groupId>
            <artifactId>aws-java-sdk</artifactId>
          </aspectLibrary>
        </aspectLibraries>
        <sources>
          <basedir>src/main/java</basedir>
          <includes>
            <include>com/xxx/yyy/workflow/*.java</include>
            <include>com/xxx/yyy/workflow/activities/*.java</include>
          </includes>
        </sources>
      </configuration>

      <executions>
        <execution>
          <goals>
            <goal>compile</goal>
            <goal>test-compile</goal>
          </goals>
        </execution>
      </executions>

      <dependencies>
        <dependency>
          <groupId>org.aspectj</groupId>
          <artifactId>aspectjrt</artifactId>
          <version>1.7.3</version>
        </dependency>
        <dependency>
          <groupId>org.aspectj</groupId>
          <artifactId>aspectjtools</artifactId>
          <version>1.7.3</version>
        </dependency>
      </dependencies>
    </plugin>

    <plugin>
      <groupId>org.jacoco</groupId>
      <artifactId>jacoco-maven-plugin</artifactId>
      <executions>
        <execution>
          <id>prepare-agent</id>
          <goals>
            <goal>prepare-agent</goal>
          </goals>
          <configuration>
            <excludes>
              <exclude>**/aspectj/*</exclude>
            </excludes>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
  </build>
</project>

The key bits are using the correct versions of AspectJ and the maven plugin, correctly specifying Java 1.7 everywhere possible, and then telling the JaCoCo plugin which classes to exclude when attempting to instrument. This solution is not perfect, and I don’t expect it to be the final solution, as it results in JaCoCo cheerfully reporting that classes generated by the Flow SDK annotation processor have no coverage, but it’s better than a poke in the eye with a decaying ferret.

Eclipse, how I loathe thee

For something which is theoretically the industry standard, Eclipse remains profoundly buggy and unstable. Of course, the argument is that the reason the user experience is a nightmare of crashes and weird behaviour is that it’s the plugins that are broken, but that’s kind of like saying a car is substandard only because it has a lousy engine and bald tires.

My latest adventure was trying to configure the Checkstyle plugin. As soon as I tried to configure it, it would present me with the wonderfully meaningful error message:

Unhandled event loop exception – No more handles

Now, if you google for the checkstyle plugin with that message, you will see several thousand developers have had the same problem. The wonderfully obvious solution (can you detect the sarcasm here?) for 64-bit Ubuntu 13.04 was to do two things: install libwebkitgtk, and hack the eclipse.ini.

sudo apt-get install libwebkitgtk-1.0-0

and then in eclipse.ini add the following beneath the -vmargs invocation:

-Dorg.eclipse.swt.browser.DefaultType=webkit

For reference, the whole eclipse.ini now looks like this for me

-startup
plugins/org.eclipse.equinox.launcher_1.3.0.v20130327-1440.jar
--launcher.library
plugins/org.eclipse.equinox.launcher.gtk.linux.x86_64_1.1.200.v20130521-0416
-product
org.eclipse.epp.package.jee.product
--launcher.defaultAction
openFile
-showsplash
org.eclipse.platform
--launcher.XXMaxPermSize
256m
--launcher.defaultAction
openFile
--launcher.appendVmargs
-vmargs
-Dosgi.requiredJavaVersion=1.7
-Dorg.eclipse.swt.browser.DefaultType=webkit
-Xms3072m
-Xmx3072m
-XX:+UseParallelGC
-XX:PermSize=256M
-XX:MaxPermSize=512m

The triumph of open source.

Jetty and JNDI and Atomikos.

Yes yet another “hey I just found this out, so you need to know it too”. And by the way, if you do find this useful, I would really like you to comment on this post so that other people will find it too.

I need to set up Jetty 8 to use a JTA provider. Quick googling around turned up a few documentation resources that I needed to take into account.

The kicker being of course that those documentation resources are out of date. Pause here for a quick rant: please, so that you do not go to a special hell reserved for people who cause huge numbers of lives to be wasted, if you are releasing code and documentation into the wild: update the frakking documentation! Sadly, most Google searches for anything about Jetty go to those elderly Codehaus pages, rather than the more up-to-date but obscure pages at http://wiki.eclipse.org/Jetty

In order to get the combination of Jetty 8.x and Atomikos 3.8.x and JNDI working, you need to look in the right place, and put the right class path in your Jetty JNDI definition:


<New id="tx" class="org.eclipse.jetty.plus.jndi.Transaction">
<Arg>
<New class="com.atomikos.icatch.jta.UserTransactionImp"/>
</Arg>
</New>

and to copy the following from the Atomikos lib/ directory into the Jetty lib/ext/ directory:


geronimo-j2ee-connector_1.5_spec.jar
geronimo-jms_1.1_spec.jar
geronimo-jta_1.0.1B_spec.jar

and finally to copy all the JARs from the Atomikos dist/ directory into the Jetty lib/ext/ directory.

Note that I have not yet tested this exhaustively, and suspect that not all of these JARs are needed.
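
Assuming the setup above holds together, application code inside the webapp should then be able to get hold of the transaction through the standard JNDI name. A minimal sketch (untested, and the class name is mine):

import javax.naming.InitialContext;
import javax.transaction.UserTransaction;

public final class TransactionalWorker {
  public void doWork() throws Exception {
    // Jetty binds the Atomikos UserTransaction under the standard JNDI name
    UserTransaction tx = (UserTransaction)
        new InitialContext().lookup("java:comp/UserTransaction");
    tx.begin();
    // ... work against XA-aware resources goes here ...
    tx.commit();
  }
}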