Otaku, Cedric's weblog: December 2004 Archives

December 30, 2004

Build philosophies

By an interesting timing twist, Vincent just posted a lengthy entry about build systems where he explains how to set up an infrastructure where builds never break:

The general principle is to catch the commit data before they get committed to the SCM, to perform a build and to perform the actual commit only if the build is successful.

While the idea is interesting, I believe it won't work for technical and philosophical reasons.  Let's go through the technical reasons first:

The committed data are intercepted using a pre-commit hook script (all modern SCM support this). This script is in charge of doing 2 things:
  • Finding out the list of projects to be built.

In my experience, this is just plain impossible as soon as your project becomes moderately big.  This problem is almost impossible to solve for disparate code bases (mix of Java, C++, Python, you name it) but it's surprisingly difficult to achieve even on a 100% pure Java code base.  In the past, I have been confronted with build systems that forced you to enumerate every single file involved in your build instead of using directories or wildcards (a technique that has pros and cons but which I tend to like).  Even then, we were never able to determine all the dependencies with 100% accuracy, which resulted in mysterious "symbols not found" errors, endless hours of frustration for new hires, and a lot of wasted hours for more senior people trying to explain and diagnose these errors.

In the end, we always ended up recommending a complete "clean" to solve these problems, which worked 99% of the time but still failed once in a while for reasons we never figured out (in which case, the only remaining solution is to recreate a client from scratch).

  • We need build machines to perform the actual build.

Unfortunately, this problem cannot be solved by simply throwing more machines into the pool.  Since builds triggered by dozens of developers are in progress at the same time, their change lists need to be reconciled before final approval is given to any one committer.  In other words, validating multiple builds is not a process that is easy to parallelize.
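To make the reconciliation problem concrete, here is a minimal Java sketch of a commit gate.  It is only a model under my own assumptions, not Vincent's implementation: the class `GatedCommitSketch`, its `Validation` type and the starting revision are all hypothetical.  It shows that a green build only proves something about the head revision it was run against, so validations that ran in parallel go stale as soon as one of them is committed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a pre-commit gate; every name here is illustrative. */
final class GatedCommitSketch {

    /** A pending change, validated against the head revision it was built on. */
    static final class Validation {
        final String author;
        final int baseRevision;    // head revision the validation build ran against
        final boolean buildPassed;

        Validation(String author, int baseRevision, boolean buildPassed) {
            this.author = author;
            this.baseRevision = baseRevision;
            this.buildPassed = buildPassed;
        }
    }

    private int head = 100;  // arbitrary starting revision (assumption)
    private final List<String> log = new ArrayList<>();

    int head() { return head; }

    List<String> log() { return log; }

    /** Accepts the change only if it built cleanly against the current head. */
    boolean tryCommit(Validation v) {
        if (!v.buildPassed) {
            log.add(v.author + ": rejected (build failed)");
            return false;
        }
        if (v.baseRevision != head) {
            // Somebody else was committed in the meantime, so this green
            // build says nothing about the combined result: build again.
            log.add(v.author + ": stale (built against " + v.baseRevision
                    + ", head is now " + head + ")");
            return false;
        }
        head++;
        log.add(v.author + ": committed as revision " + head);
        return true;
    }

    public static void main(String[] args) {
        GatedCommitSketch gate = new GatedCommitSketch();
        // Three developers start validation builds in parallel, all against revision 100.
        gate.tryCommit(new Validation("alice", 100, true));
        gate.tryCommit(new Validation("bob", 100, true));
        gate.tryCommit(new Validation("carol", 100, false));
        gate.log().forEach(System.out::println);
    }
}
```

In this model, only the first committer gets through and everyone else has to pay for another full build, no matter how many build machines are available.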

    When the build is finished (or if an error occurs)

    This is another problem:  the builds can be arbitrarily long, forcing developers to either resume their work on a change list that has not been approved yet, or create a new client workspace, sync it and start working there.  Either way, it's not the friendliest environment for developers.

    Among the advantages of his approach, Vincent lists:

    Forces atomic commits!

    Which, of course, should be a feature of the underlying SCM, and not of your build infrastructure.  Any SCM system that doesn't feature atomic commits (such as CVS) should not be used for large-scale software.  Obviously, Vincent has been using CVS too much :-)

    You need to ensure your build is taking as little time as possible. I think 5-10 minutes should be ok.

    This one really made me laugh.  The only projects I have worked on that built in less than 5-10 minutes are my own open-source projects.  At my past employers, build times of one to three hours were more the norm.  And that's just for building:  check-in tests usually run in under ten minutes but it's common for functional tests to take an entire night to run.  Vincent's system clearly does not scale to such numbers, and I can't imagine developers being productive if they need to wait at least an hour before their commit receives approval.  They will work around this limitation by holding on to their submission for as long as possible, resulting in huge submission lists and a whole new set of problems which I won't discuss here.

    Despite my skepticism, I find Vincent's approach interesting because its philosophy contrasts with mine.  Setting up a good build infrastructure is a very difficult task, and he chooses to put the burden on the developers instead of the release managers.  I don't believe this is the right way to approach this problem, and the ideas I put forth in yesterday's entry use the opposite approach:  I want to make the build infrastructure as transparent for developers as possible.  But not more.

    By "not more", I mean that developers still need to be aware that breaking builds is not acceptable and that their submission will be automatically rolled back if it happens.  The approach I recommend solves this problem not by making build breaks impossible, but by making them harmless to as many people as possible.

     

    Posted by cedric at 09:29 AM | Comments (6)

    December 29, 2004

    Developers should never build

    This entry makes an interesting analysis of various ways that a build can break.  As far as build philosophy is concerned, I have a very simple motto:

    Developers should never build.

    Never.  Period.

    Over the past ten years, I have worked at companies that manipulate huge code bases on a daily basis, with build systems so complex that teams of several people dedicated to running them are pretty common.  All these companies are experts at building software, but it's amazing how few of them really understand how much time is wasted every time a developer needs to build the entire product to get their job done.

    Typically, developers will only be working on a very small fraction of the code base, and they should only have to build this portion and nothing else.  All the classes and external libraries that this code depends on to build successfully should be downloadable in a binary form.

    These "clean" snapshots should be generated by your continuous build system (CruiseControl or similar) and can have several variations.  The two most important types of snapshots in my opinion are:

    • Clean build.  The entire product built successfully but the tests have not been run, so some of them might fail.
       
    • Clean tests.  The entire product built successfully and passed all the required tests.

    Typically, the label for a clean build will advance faster than that of a clean test, therefore providing a more recent view of the product for those developers who need the most up-to-date clean version of the build.  Also, these two top categories can be broken down into further subcategories (clean check-in tests, clean functional tests, clean partial build, etc.).

    If you manage to set up such an infrastructure, a broken build becomes much less harmful to the entire organization since there are very few instances where a developer absolutely needs to synchronize to HEAD, which is the only change list that can potentially be broken.  If developers only sync to a clean label, they become completely shielded from occasional build breaks.
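Here is a rough Java sketch of how a continuous build could compute these two labels.  It is an illustration rather than a description of any particular tool: the `CleanLabels` class, the `BuildResult` type and the change list numbers are all made up for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

/** Hypothetical sketch of how a continuous build could advance "clean" labels. */
final class CleanLabels {

    /** Outcome of the continuous build for one change list. */
    static final class BuildResult {
        final int changeList;
        final boolean built;        // the entire product built successfully
        final boolean testsPassed;  // all the required tests passed

        BuildResult(int changeList, boolean built, boolean testsPassed) {
            this.changeList = changeList;
            this.built = built;
            this.testsPassed = testsPassed;
        }
    }

    /** "Clean build": the most recent change list for which the product built. */
    static OptionalInt latestCleanBuild(List<BuildResult> history) {
        return history.stream()
                .filter(r -> r.built)
                .mapToInt(r -> r.changeList)
                .max();
    }

    /** "Clean tests": the most recent change list that built and passed its tests. */
    static OptionalInt latestCleanTests(List<BuildResult> history) {
        return history.stream()
                .filter(r -> r.built && r.testsPassed)
                .mapToInt(r -> r.changeList)
                .max();
    }

    public static void main(String[] args) {
        List<BuildResult> history = Arrays.asList(
                new BuildResult(1201, true, true),
                new BuildResult(1202, true, false),    // builds, but a test fails
                new BuildResult(1203, false, false));  // HEAD is broken
        System.out.println("clean build label: " + latestCleanBuild(history));
        System.out.println("clean tests label: " + latestCleanTests(history));
    }
}
```

With this sample history, HEAD is broken but neither label points at it, so a developer who syncs to either label never sees the break.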

    That being said, build breaks should be treated with the utmost urgency by release engineers, and I increasingly like the idea that submissions that break the build (and possibly, the tests) should be immediately and automatically rolled back.  It might be a bit harsh, but it makes developers more aware and more careful before submitting their code, because undoing a rollback can sometimes be painful, depending on the source-control system you are using (it's trivial with Perforce, not necessarily so with others).

    Once such an infrastructure is in place, the daily routine of a developer becomes:

    • About once a day, sync to a clean label and download the corresponding binaries.
    • Several times a day, sync only the subset of the project you are interested in if you need the latest bits (a step that's most of the time even optional).

    No more build break syndrome.

     

    Posted by cedric at 07:10 AM | Comments (7)

    December 22, 2004

    More on reusability

    My previous entry has generated a few interesting comments which deserve further thought...

    Pascal writes:

    Cedric, while you're basically right, one should still differentiate about COM and CORBA (which is probably what you meant with IDL):

    Actually no, I was really talking about COM when I mentioned IDL.  Both COM and CORBA use IDL to define interfaces, and although Microsoft's version has a few Windows-specific flavors, they are basically identical and serve the same purpose.  And you're right, CORBA fits in category 2/3, and if I didn't mention it, it's simply because it hasn't succeeded as well as COM.

    Thanks for the NetKernel information, I didn't know about it, I'll check it out.

    Jeff wonders:

    Cedric, why do you think that binaries are more reusable than plug-ins?

    I was careful to put these two approaches very close to each other and the only reason why I said that COM is more reusable than a plug-in approach is because it belongs in the operating system.  We have come a long way with OSGi but there is still no real standard plug-in API in Java.  Ideally, I'd like for such a thing to be shipped with J2SE so that not everybody who wants to make their application pluggable needs to download external packages (wouldn't it be nice if the knowledge you acquired to write Eclipse plug-ins could be readily reused to write an IntelliJ or JEdit plug-in?).

    I don't really agree on your definitions of plug-in and binary reusability.
    The main difference I see between plug-ins and binaries is that the plug-in invite you to extend it and enhance its functionalities while you can only use the binaries without affecting its own set of functions.

    This is a subtle point (also made by Rob Meyer later in the comments), but after thinking about it more, I realize you are both right.  Both approaches are a manifestation of code reuse but they are definitely serving different purposes.  However, maybe using the HTML renderer as an illustration of COM's power is misleading, and if you look at Outlook's COM interfaces, you realize that the line between "reusing core components" and "extending an existing core architecture" is indeed quite blurred.

    Reuters just published a very interesting article describing Office as a full-fledged platform.  Not only is this important to Microsoft's business, it's also a very effective way to make sure that open-source alternatives to Office and Windows remain irrelevant regardless of how good they are.

    At any rate, there is clearly a need for more thinking on this topic.

    Justin says:

    Having written an eclipse plugin, I think the two best thing the Eclipse group ever did were to have the bindings specified by XML and most of the application functionality implemented via plugins.

    I couldn't agree more, and I like this approach so much that I based TestNG on it.  A careful mix of XML and sound coding patterns is the ultimate reusability weapon.  Having said that, there are still a lot of pitfalls to avoid (too much XML or too much modularization, reusing in a flexible but overly complex architecture), and I suspect that we will be discovering a lot of new concepts in the coming years.

    Keep the comments coming, I am trying to figure this out as much as anybody else!

     

    Posted by cedric at 01:18 AM | Comments (7)

    December 21, 2004

    The four degrees of reusability

    When I write code, I am always trying to think in terms of reusability, and not just for me.  Ideally, I want developers to be able to leverage my work efficiently and with very little effort.

    I have identified four levels of software reusability:

    1. No reusability.  The least interesting of all.  It's typically a standalone application that was designed to serve a finite set of goals and not be extensible by anybody else but its creator.
       
    2. Binary reusability.  While the source is not available, binary reusability enables third parties to write extensions to the software.  Microsoft COM is the best example of such an approach, and it's been tremendously successful.  For example, even though you don't have access to the source of Internet Explorer's HTML renderer, you can still embed it in your applications and take advantage of all its power (Yahoo Messenger uses it but you would probably never be able to tell).
       
    3. Plug-in reusability.  This is very similar to the binary approach mentioned above but with the difference that the operating system is not involved in the connection between the plug-in and the core architecture.  Eclipse is a good example of such an approach, which can also be used in the absence of the source.
       
    4. Source reusability.  The software is shipped with its entire source code, making it possible -- in theory -- for everyone to extend it at will.

    From the least reusable to the most reusable, I would sort these approaches as follows:

    1 < 4 < 3 < 2

    First of all, the fallacy that all it takes to make software reusable is to open-source it needs to die.  You are truly deluded if you think developers will try to understand your thousands of lines of code, and that's a lesson that Netscape and most projects on SourceForge have learned the hard way.

    The reason why options 2 and 3 are the best way to make your software reusable is simple:  instead of overwhelming the developers with your entire code base, you are making an extra effort to identify parts of your software that are truly reusable and you expose them in a readable way both in terms of documentation (Javadoc and similar) and software patterns (interfaces, factories and dependency injection mostly).

    This is exactly what COM+IDL did with the resounding success that we know, and more recently, Eclipse, with its terrific plug-in API based on OSGi.

    While reading up on COM and Eclipse will be very useful if you intend to make your code reusable, you can start with simpler tasks such as using your own plug-in API yourself, inside your core product.  Start with the core services and build on top of them while keeping in mind that in the future, developers other than you might be writing similar code.

    A good hint that you're on the right track is to see a lot of interfaces in your code (and consequently, it is strongly discouraged to use "new" on classes that you intend to expose in your plug-in API).
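As a small illustration of the "interfaces instead of new" advice, here is a Java sketch in which callers only ever see an interface and obtain instances by name from a registry.  `Renderer`, `RendererRegistry` and `PluginDemo` are names invented for this example.  They are not part of COM, Eclipse or OSGi, and a real plug-in API would need much more (discovery, versioning, lifecycle).

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

/** The contract exposed to plug-in authors (hypothetical example). */
interface Renderer {
    String render(String text);
}

/** Central lookup point: callers ask for a Renderer by name instead of calling "new". */
final class RendererRegistry {
    private final Map<String, Supplier<Renderer>> factories = new LinkedHashMap<>();

    /** Used by the core product and by third-party plug-ins alike. */
    void register(String name, Supplier<Renderer> factory) {
        factories.put(name, factory);
    }

    Renderer create(String name) {
        Supplier<Renderer> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("No renderer registered under: " + name);
        }
        return factory.get();
    }
}

final class PluginDemo {

    /** The core product registers its own renderers through the same API as everyone else. */
    static RendererRegistry defaultRegistry() {
        RendererRegistry registry = new RendererRegistry();
        registry.register("plain", () -> text -> text);
        registry.register("shout", () -> text -> text.toUpperCase(Locale.ROOT) + "!");
        return registry;
    }

    public static void main(String[] args) {
        Renderer renderer = defaultRegistry().create("shout");
        System.out.println(renderer.render("hello"));
    }
}
```

Since no caller names a concrete class, an implementation can be swapped or added without touching the code that uses it.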

    I'll expand on these techniques in future entries.

    Posted by cedric at 07:41 AM | Comments (9)

    December 15, 2004

    Tim Bray at JavaPolis

    I am currently listening to Tim Bray's keynote, opening the JavaPolis conference.  His presentation is called "When not to program in Java" and is a comparison of C, Java and Python (with a short mention of C++, although he admits having never programmed in it).

    The presentation is overall interesting but not very innovative.  Tim compares the number of lines and characters used to solve a simple problem and draws a number of conclusions.  The one I disagree the most with is his assertion that Python is more object-oriented than Java.  "Almost irritatingly so", he says, arguing that sometimes, Python is "too" object-oriented for his taste.

    I am still scratching my head at this, and it's hard for me to understand how you can claim that Python is very object-oriented when:

    • You need to pass "self" as a parameter to all your methods...  Does this remind you of anything?  Right, that's how we "simulated" object-orientation in C, by passing the address of the current object as the first parameter.  If anything, this shows that Python is not object-oriented, and this flaw is a simple illustration of Python's old age (it was not object-oriented when it was created, more than fifteen years ago).
       
    • You can't pass messages to, say, strings, as you can in Java, Ruby or Groovy ("foo".size()).  Writing code in Python usually ends up being a mix of imperative and object-oriented calls that doesn't always follow any logic.

    Python is a fine language but overall, I am quite surprised by the number of misconceptions that people have about it, and Tim is certainly not the first one to fall under Python's spell.  I chatted with Tim after his talk and he admitted not being a very proficient Python programmer...  Mmmh.

     

     

    Posted by cedric at 02:33 AM | Comments (10)

    December 08, 2004

    JavaPolis 2004

    I am headed out to Antwerp, Belgium next week to attend JavaPolis.  I will be presenting TestNG on Wednesday at 4:55pm and my coworkers, Josh and Neal, will follow with their famous Tiger talk.  In case you have any doubts about attending these talks, know that we will be giving away cool freebies and, of course, we are always interested in talking to potential hires, so bring your résumé with you!

     

    Posted by cedric at 09:42 AM | Comments (1)

    December 06, 2004

    Announcing TestNG 2.0!

    I am very happy to announce the availability of TestNG 2.0!

    TestNG is a testing framework that fixes most of JUnit's deficiencies and innovates in a number of ways:  use of annotations, configurable dynamic invocation (no need to recompile), test method groups, dependent methods, external parameters, etc...

    There is only one new feature in this release, but it's quite a big one:  JDK 1.4 support.

    Thanks to the tireless efforts of Alexander "Mindstorm" Popescu, TestNG can now be run with JDK 1.4, using the familiar JavaDoc annotations.

    Here is an example:

    import com.beust.testng.annotations.*;
    
    public class SimpleTest {
    
      /**
       * @testng.configuration beforeTestClass = "true"
       */
      public void setUp() {
        // code that will be invoked when this test is instantiated
      }
    
      /**
       * @testng.test groups = "functest"
       */
      public void testItWorks() {
        // your test code
      }
    }

    Using TestNG with JDK 1.4 is straightforward and very similar to how you invoke it with JDK 1.5, which will make future migrations easy:

    • Include the TestNG jdk14 jar file in your classpath
    • Specify a srcdir attribute in the testng ant task, so TestNG can locate your annotated sources

    And that's all!  TestNG works exactly the same regardless of whether you use JavaDoc or JDK 1.5 annotations, and all the regression tests have been updated to make sure both versions cover the exact same features.  For the record, we are using the excellent QDox to parse annotations and Doug Lea's original concurrent utilities to implement parallel test runs.
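The reason a srcdir is needed is that JavaDoc comments only exist in source files, not in compiled classes, so the tags have to be read from the sources.  The following Java sketch gives an idea of what that extraction involves.  It is purely illustrative: TestNG relies on QDox for this, not on a regular expression, and the `JavadocTagSketch` class and its simplified tag format (one attribute per tag) are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: TestNG uses QDox to read these tags, not a regular expression. */
final class JavadocTagSketch {

    // Matches a simplified tag such as:  @testng.test groups = "functest"
    private static final Pattern TAG =
            Pattern.compile("@testng\\.(\\w+)\\s+(\\w+)\\s*=\\s*\"([^\"]*)\"");

    /** Returns "tag.attribute" -> value for every tag found, in source order. */
    static Map<String, String> parse(String source) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher matcher = TAG.matcher(source);
        while (matcher.find()) {
            result.put(matcher.group(1) + "." + matcher.group(2), matcher.group(3));
        }
        return result;
    }

    public static void main(String[] args) {
        String source =
                "/**\n * @testng.configuration beforeTestClass = \"true\"\n */\n"
              + "public void setUp() {}\n"
              + "/**\n * @testng.test groups = \"functest\"\n */\n"
              + "public void testItWorks() {}\n";
        parse(source).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

A real parser also has to attach each tag to the method that follows it and cope with several attributes per tag, which is why delegating to a proper source parser is the sensible choice.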

     

    Posted by cedric at 08:31 AM | Comments (1)

    December 03, 2004

    TiVo and the importance of Now

    In a comment to my TiVo entry, Noah makes the following observation:

    You can circumvent most of the advertising by waiting for the DVD Season box set to be released on Netflix. Sure it puts you behind a few months (years!? :), but you are not only rewarded with no advertisements, but you also get bonus footage surrounding the show.

    What you are discussing is only a problem plagued by those who value entertainment based on its timeliness.

    This is true but the "timeliness" factor is not just a fashionable whim shown by futile people, it's part of our society.  When a show comes out and the next day at work, you hear everybody talk about it, you want to be part of it.  This feeling goes even to the point where people watch shows they don't really enjoy that much just so they can be part of the discussions and/or controversies that will ensue at the water cooler or at their next dinner party.

    From that perspective, the DVD approach doesn't fly.  It can also be expensive, and you have no guarantee that the said show will ever come out on DVD, much less that Netflix will ever include it in their catalogue.

    One type of advertisement that TiVo cannot circumvent is product placement.

    Product placement is annoying but not very disruptive.  I didn't mind too much seeing a huge Budweiser ad on Times Square as Spiderman was swinging on his web threads over Fifth Avenue.  I would certainly mind if an animated banner popped up every five minutes at the bottom of the screen to convince me to buy a car.  You think it will never happen in theaters?  I'm not so sure...

     

    Posted by cedric at 09:14 AM

    December 01, 2004

    Obscure yet indispensable Windows apps

    We all use some obscure Windows utilities that we couldn't do without but that hardly anyone knows about...  Here are some of the programs that are installed on every single Windows machine I own:

    Xplorer2 

    The Windows File Explorer is laughably mediocre, starting with the fact that it doesn't even offer a two-pane view (not shown in the screenshot below) and makes it very hard to achieve the simplest things.  Xplorer2 to the rescue.  It's a free file explorer that allows lightning-fast file manipulations, especially if you like keyboard shortcuts.  Every little feature of this program shines with optimization and it's clear that the author is an efficiency nut.  My favorite features are:  hold ALT to execute the action in the other pane (such as ALT-click to open a folder in the other pane) and F8 to create a directory.  It comes with a "quick" HTML help file (which is still several pages long and is a must-read) and a real user manual.  It comes in a "lite" (free) and a "pro" (commercial) version.

    StartupMonitor

    With the increasing threat posed by AdWare and viruses, I have become a bit paranoid about what's happening with my computer.  StartupMonitor is a lightweight program that informs you whenever a program is trying to register itself at start-up and gives you the opportunity to deny the change.  Since this is a relatively obscure utility, it's quite likely that viruses and MalWare won't bother checking for its presence, so it makes me feel safer.  Note that SpyBot S&D contains a similar utility called TeaTimer.

    ProcessExplorer

    Written by the good people at Sysinternals, this is a souped-up task manager.  It not only lets you inspect which tasks are running and how many resources they consume, it is also more effective than the Task Manager at killing certain tasks, it shows you the launch hierarchy of your tasks and, most importantly, it allows you to determine which process locks a certain file (very handy when you are trying to delete such a file and Windows tells you it is being locked by "a" process).

    Finally, I am currently playing with Rock-It Launcher in a quest to find a universal keyboard-based application launcher.  The problem here is that most of the important applications I use have an icon in the QuickLaunch bar, but not all of them, and sometimes, I just forget where they are stored.  I need a launcher that will let me type a few letters of the application's name and will automatically find it and launch it for me.  Rock-It Launcher does a decent job at that but I once used a different program that had a friendlier GUI.  If you have suggestions, please let me know...

     

    Posted by cedric at 09:46 AM | Comments (23)