By an interesting timing twist, Vincent just posted a lengthy entry about build
systems where he explains how to set up an infrastructure where builds never
break:

The general principle is to catch the commit data before they get committed
to the SCM, to perform a build and to perform the actual commit only if the
build is successful.
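The principle quoted above can be sketched in a few lines. This is only an illustration: `gated_commit`, `build` and `commit` are hypothetical names standing in for whatever the hook script, the build machines and the SCM actually do, none of which Vincent's entry is quoted as specifying here.

```python
from typing import Callable, List


def gated_commit(change: List[str],
                 build: Callable[[List[str]], bool],
                 commit: Callable[[List[str]], None]) -> bool:
    """Commit `change` only if `build` succeeds on it; report whether it landed."""
    if not build(change):
        return False  # the build failed, so the SCM never sees the change
    commit(change)
    return True


# Hypothetical stand-ins: a list plays the role of the repository.
repository = []
landed = gated_commit(["Foo.java"], build=lambda c: True, commit=repository.append)
rejected = gated_commit(["Broken.java"], build=lambda c: False, commit=repository.append)
print(landed, rejected, repository)  # True False [['Foo.java']]
```

Everything that follows is about what hides behind the `build` callback: which projects it has to build, how long it takes, and what happens when many of these calls run at once.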

While the idea is interesting, I believe it won’t work for technical and
philosophical reasons.  Let’s go through the technical reasons first:

The committed data are intercepted using a pre-commit hook script (all modern
SCM support this). This script is in charge of doing 2 things:

  • Finding out the list of projects to be built.

In my experience, this is just plain impossible as soon as your project
becomes moderately big.  The problem is almost unsolvable for disparate code
bases (a mix of Java, C++, Python, you name it), but it’s surprisingly
difficult even on a 100% pure Java code base.  In the past, I have even been
confronted with build systems that forced you to enumerate every single file
involved in your build instead of using directories or wildcards (a technique
that has pros and cons but which I tend to like).  Even then, we were never
able to determine all the dependencies involved with 100% accuracy, resulting
in mysterious "symbols not found" errors that cost new hires endless hours of
frustration and more senior people a lot of wasted time explaining and
diagnosing them.

In the end, we always ended up recommending a complete "clean" to solve
these problems, which worked 99% of the time but still failed once in a while
for reasons we never figured out (in which case, the only remaining solution
was to recreate a client from scratch).
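To make the difficulty concrete, here is a minimal sketch of what "finding out the list of projects to be built" amounts to when the dependency graph is known: a walk over reverse dependencies. The project names and the `depends_on` mapping are invented for illustration, and the sketch assumes the declared graph is all the script has to go on.

```python
from collections import deque


def projects_to_build(changed, depends_on):
    """Return every project to rebuild when the `changed` projects change.

    `depends_on` maps a project to the projects it directly depends on.
    """
    # Invert the graph so we can ask "who depends on this project?"
    dependents = {}
    for project, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(project)

    # Breadth-first walk over the reverse dependencies.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical projects: "batch" really uses "core" in both cases.
complete = {"web": ["core"], "batch": ["core"], "core": ["util"], "util": []}
incomplete = {"web": ["core"], "batch": [], "core": ["util"], "util": []}

print(sorted(projects_to_build(["util"], complete)))    # ['batch', 'core', 'util', 'web']
print(sorted(projects_to_build(["util"], incomplete)))  # ['core', 'util', 'web']
```

The walk itself is trivial. The second call shows where things go wrong: one undeclared dependency and `batch` is silently left out of the build, which is exactly the kind of gap that later surfaces as a "symbols not found" error.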

  • We need build machines to perform the actual build.

Unfortunately, this problem cannot be solved by simply throwing more machines
into the pool.  As builds from dozens of developers run in parallel, their
change lists need to be reconciled before final approval is given to any one
committer.  In other words, validating multiple builds is not a process that is
easy to parallelize.
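A toy model shows why extra machines do not help much. It rests on a deliberately strict assumption that is mine, not Vincent's: a validation only counts if it ran against the current head, so every time one change list lands, all the others must be rebuilt on top of it. The function name is hypothetical.

```python
from __future__ import annotations


def validate_in_parallel(num_changes: int) -> tuple[int, int]:
    """Return (rounds, total_builds) for a strict pre-commit gate.

    Assumes enough machines to build every pending change list at once.
    """
    pending = num_changes
    rounds = 0
    total_builds = 0
    while pending > 0:
        rounds += 1
        total_builds += pending  # all pending change lists build in parallel
        pending -= 1             # one lands; the rest are now stale
    return rounds, total_builds


print(validate_in_parallel(5))  # (5, 15)
```

Even with unlimited machines, five change lists still need five sequential rounds in this model, and the pool performs fifteen builds to approve them. A real system could relax the assumption by rebuilding only the change lists that overlap, but that requires knowing precisely which ones overlap, which brings back the dependency problem described earlier.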

When the build is finished (or if an error occurs)

This is another problem:  the builds can be arbitrarily long, forcing
developers to either resume their work on a change list that has not been
approved yet, or create a new client workspace, sync it and start working there.
Either way, it’s not the friendliest environment for developers.

Among the advantages of his approach, Vincent lists:

Forces atomic commits!

Which, of course, should be a feature of the underlying SCM, and not of your
build infrastructure.  Any SCM system that doesn’t feature atomic commits
(such as CVS) should not be used for large-scale software.  Obviously,
Vincent has been using CVS too much :-)

You need to ensure your build is taking as little time as possible. I think
5-10 minutes should be ok.

This one really made me laugh.  The only projects I have worked on that built
in less than 5-10 minutes are my own open-source projects.  At my past
employers, build times of one to three hours were more the norm.  And that’s
just for building:  check-in tests usually run in under ten minutes, but it’s
common for functional tests to take an entire night to run.  Vincent’s system
clearly does not scale to such numbers, and I can’t imagine developers being
productive if they need to wait at least an hour before their commit receives
approval.  They will work around this limitation by holding on to their
submissions for as long as possible, resulting in huge change lists and a whole
new set of problems which I won’t discuss here.

Despite my skepticism, I find Vincent’s approach interesting because its
philosophy contrasts with mine.  Maintaining a good build infrastructure is a
very difficult task, and he chooses to put the burden on the developers instead
of the release managers.  I don’t believe this is the right way to approach
this problem, and the ideas I put forth in yesterday’s entry use the opposite
approach:  I want to make the build infrastructure as transparent for
developers as possible.  But not more.

By "not more", I mean that developers still need to be aware that breaking
builds is not acceptable and that their submission will be automatically
rolled back if that happens.  The approach I recommend solves this problem not
by making build breaks impossible, but by making them harmless to as many
people as possible.
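For contrast with the pre-commit gate, here is a rough sketch of the rollback side of this idea. It is a simplified model under stated assumptions, not a description of any real tool: `integrate` and `builds_ok` are hypothetical names, submissions are plain strings, and the build is checked after each submission lands rather than before.

```python
def integrate(head, submissions, builds_ok):
    """Apply submissions in order and roll back any that break the build.

    Returns (head, rolled_back).
    """
    rolled_back = []
    for submission in submissions:
        candidate = head + [submission]     # the commit lands right away
        if builds_ok(candidate):
            head = candidate                # build is green, keep it
        else:
            rolled_back.append(submission)  # automatic rollback, head unchanged
    return head, rolled_back


# Hypothetical build check: any submission tagged "broken" fails the build.
def builds_ok(tree):
    return not any("broken" in s for s in tree)


print(integrate([], ["a", "b-broken", "c"], builds_ok))  # (['a', 'c'], ['b-broken'])
```

Nobody waits for approval before committing. The breaking submission is backed out, and the submissions that arrive after it proceed on top of the last good state.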