Archive for April, 2010

TestNG anniversary

I was updating the TestNG home page to include the information about the recent availability of version 5.12 in the Maven repository (check it out!) when I noticed that I created this page exactly six years and one day ago.

Wow… Six years.

I believe it took me a few months to implement v1.0, so TestNG is actually a bit older than that.

If you had asked me back then whether I thought I would still be working on TestNG six years later, I would have politely nodded while wondering which mental institution you had escaped from.

Big thanks to the TestNG community for accompanying me on this fantastic journey. I'm looking forward to more awesome testing, next generation style.

Improving exceptions

In a recent article called “Bruce Eckel is wrong”, Elliotte Rusty Harold wrote a counter-argument to Bruce Eckel’s critique of checked exceptions.

Developers who argue that checked exceptions are a “failed experiment” and that we should only use runtime exceptions are missing a fundamental point of error handling. Here’s why.

How did we get here?

I think the Java exception implementation is fine, but it was initially used by developers who didn’t really understand how it was supposed to work. As a result, a lot of improper code ended up in the JDK, and we now have to live with it. Quite a few APIs in the JDK use checked exceptions where runtime exceptions would have been appropriate, which gave many programmers negative feelings toward the concept and eventually led to absurdly extreme claims that checked exceptions should never be used, ever.

In reality, the idea of having two different kinds of exceptions makes a lot of sense, as I will show below.

Categorizing the failures

When an error occurs in a program, there are two things I want to know about it: 1) is it expected? and 2) is it recoverable?

Let’s take a closer look:

                Recoverable    Not recoverable
  Expected      (a)            (c)
  Unexpected    X              (b)

Let’s discuss each of these in turn:

  • X. I crossed out this option because I can’t come up with a scenario in which we can recover from an unexpected exception.
  • (a) Expected and recoverable. This is a pretty common scenario that is covered by Java’s checked exceptions. A typical example is FileNotFoundException: very often, you either want to let the user know that you couldn’t find the file or, if this is not information the user cares about, simply log it or ignore it.
  • (b) Unexpected and not recoverable. This scenario is covered by Java’s runtime exceptions. A typical example is NullPointerException. Of course, null pointers are pretty common in Java code, but you know this situation is not expected because if it were, the developer would have guarded against it ("if (p != null)"). Because it’s unexpected, your system is now in an unstable state and it’s likely that your application will either crash or behave erratically, so this situation is not recoverable.
  • (c) Expected but not recoverable. Interestingly, I have found very little documentation of this scenario. After all, if the exception is expected, shouldn’t we be able to recover from it? Well, not always. Read on.

Understanding scenario (c) requires realizing that the same exception can sometimes be recovered from and sometimes not, depending on when it happens.

For example, imagine an application that needs to open the same file in two different places. The first time, not finding the file means asking the user to provide a different one. But once this file has been supplied, the system can rightfully assume that it’s there, so the second time it opens the file, it expects to find it. What if something made the file disappear in between these two times (e.g. a hard drive failure)? Suddenly, our exception has become unrecoverable.
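A minimal sketch of both situations, assuming a hypothetical FileOpening helper (the class, method names and file names are illustrative, not from any library). The first access treats FileNotFoundException as expected and recoverable; the later access wraps the same checked exception in the JDK's unchecked UncheckedIOException (available since Java 8), because a file that vanished after validation is not something the caller can fix.

```java
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: the same exception is recoverable or not depending on when it happens.
class FileOpening {

    // First access: a missing file is expected and recoverable. The fallback
    // stands in for "ask the user to provide a different file".
    static Path firstOpen(Path requested, Path fallback) {
        try (Reader reader = new FileReader(requested.toFile())) {
            return requested;
        } catch (FileNotFoundException e) {
            return fallback;
        } catch (IOException e) {
            // Failing to close a file we just opened is not expected here.
            throw new UncheckedIOException(e);
        }
    }

    // Later access: the file was already validated, so its absence is
    // unrecoverable. Wrap the checked exception in an unchecked one.
    static Reader reopen(Path validated) {
        try {
            return new FileReader(validated.toFile());
        } catch (FileNotFoundException e) {
            throw new UncheckedIOException("File disappeared after validation: " + validated, e);
        }
    }

    public static void main(String[] args) {
        Path missing = Paths.get("no-such-preferences-file.txt");
        System.out.println(firstOpen(missing, Paths.get("defaults.txt")));
        try {
            reopen(missing);
        } catch (UncheckedIOException e) {
            System.out.println("unrecoverable: " + e.getCause());
        }
    }
}
```

The caller of reopen() is not forced to handle anything, which matches the idea that there is nothing sensible it could do.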

This seems to lead us down a new path: it appears that we need two different exceptions for FileNotFound: one that is checked, and one that is not.

Toward a solution

If I had a chance to redo Java’s exception mechanisms, there is actually very little I would change. First of all, I believe that we need to keep the checked/unchecked distinction, which maps very well to the real world, as I just demonstrated.

However, I would probably pick different names to make the difference between these two concepts clearer. The JDK uses the class Error for unrecoverable failures, but the difference between an Error and an Exception is quite subtle and often misunderstood. To make matters worse, the JDK uses the word “Exception” to name both checked and runtime exceptions. No wonder people are confused.

How about Panic, or maybe Bug, instead of Error? I know this sounds a bit childish, but it has the merit of being very clear: when someone unfamiliar with these concepts is writing new Java code, deciding whether to throw an Exception or a Bug should be easier.

Second, I would provide two versions of the most common exception classes: one that is an Exception and one that is a Bug. For example, we would have both a FileNotFoundException and a FileNotFoundBug:

  • You are asking the user to enter a number and they type in a letter? Your application should throw a NumberFormatException and catch it so the user can fix their error.

  • You are trying to read a preference file that you saved earlier and a number cannot be parsed correctly? It’s likely that the file got corrupted, and you probably want to throw a NumberFormatBug.
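To make the proposal concrete, here is a minimal sketch that models Bug as an unchecked base class. Bug, NumberFormatBug and the NumberParsing helper are hypothetical; the JDK has no such classes. Note also that in today's JDK, NumberFormatException is itself unchecked (it extends IllegalArgumentException), so the compiler does not force you to catch it in the user-input case; the proposal would presumably make it a checked exception.

```java
class NumberParsing {
    // User input: a typo is expected, so catch the exception and ask again.
    static String checkUserInput(String input) {
        try {
            return "OK: " + Integer.parseInt(input.trim());
        } catch (NumberFormatException e) {
            return "Please enter a number instead of \"" + input + "\"";
        }
    }

    // Preference file we wrote ourselves: a bad number means corruption.
    static int readSavedPreference(String rawValue) {
        try {
            return Integer.parseInt(rawValue.trim());
        } catch (NumberFormatException e) {
            throw new NumberFormatBug("Corrupted preference value: " + rawValue, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(checkUserInput("4x2"));
        System.out.println(readSavedPreference("42"));
    }
}

// Hypothetical unchecked base class for "expected but not recoverable" failures.
class Bug extends RuntimeException {
    Bug(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical Bug counterpart of NumberFormatException.
class NumberFormatBug extends Bug {
    NumberFormatBug(String message, Throwable cause) {
        super(message, cause);
    }
}
```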

Having said that, I’m not a fan of duplicating the entire exception hierarchy, so maybe we could push this idea a bit further and use generics to cut down on the number of classes:

  throw new FileNotFound<Bug>();

or

  throw new FileNotFound<Exception>();

Of course, the compiler would have to apply different rules depending on the Generic parameter, which I don’t believe is feasible today, but you get the idea.
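Java rules out the exact syntax above: the language forbids generic subclasses of Throwable, so a FileNotFound<Bug> class cannot be declared. A generic throws clause gets surprisingly close, though, because a type parameter can stand for the exception type. In this minimal sketch (the FileFinder interface and GenericThrows class are hypothetical), the same lookup is checked or unchecked depending only on how the type parameter is instantiated.

```java
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class GenericThrows {
    // Checked flavor: E = FileNotFoundException, so callers must handle it.
    static final FileFinder<FileNotFoundException> CHECKED = path -> {
        if (!Files.exists(path)) {
            throw new FileNotFoundException(path.toString());
        }
        return path;
    };

    // Unchecked flavor: E = RuntimeException, so callers may let it propagate.
    static final FileFinder<RuntimeException> UNCHECKED = path -> {
        if (!Files.exists(path)) {
            throw new UncheckedIOException(new FileNotFoundException(path.toString()));
        }
        return path;
    };

    static String checkedLookup(Path path) {
        try {
            return "found " + CHECKED.find(path);
        } catch (FileNotFoundException e) { // the compiler insists on this catch
            return "missing " + e.getMessage();
        }
    }

    static Path uncheckedLookup(Path path) {
        return UNCHECKED.find(path); // compiles without try/catch
    }

    public static void main(String[] args) {
        System.out.println(checkedLookup(Paths.get("no-such-file.txt")));
    }
}

// Hypothetical interface whose thrown exception type is a type parameter.
interface FileFinder<E extends Exception> {
    Path find(Path path) throws E;
}
```

The choice between the two flavors is made where each variable's type is declared, which is about as close as today's compiler gets to applying different rules depending on the generic parameter.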

Moving things forward

It’s a pity that the debate on exceptions has become so polarized, because I really think that even if Java’s dual approach to exception handling is not the best way to support failures in programs, it’s certainly a step in the right direction.

In the future, I’d like to see discussions on this topic focus more on where and when to use checked and runtime exceptions, rather than articles telling you that you should always do “this” and never do “that”.

Android vs. iPhone

Git for the nervous developer

I’ve been meaning to put together an article on Git for quite a while, but I’ve always ended up postponing it for various reasons, one of which was that there is already so much material describing Git inside and out that I wasn’t sure I could add anything to the subject.

After thinking about this a bit more, I decided to go ahead anyway, but to give this article a slightly different spin than what you might have read elsewhere. For this reason, you won’t find any elaborate discussion of Git’s branching model, command line switches or graphical tools in this post. At the end of this article, I do include a list of references covering all the technical aspects of Git that I’m glossing over today.

This article is meant for people who are interested in Git, either personally or as a corporation, but wondering what making the jump will really mean and what to expect.

What you won’t find in this post:

  • Fanboy opinions (“Git is so good that it cured my asthma”).
  • Hater opinions (“Git’s command line syntax is so arcane that it will make your brain leak through your ears”).

What you will find in this post is the perspective of someone who:

  • … was forced to switch to Git
  • … has become reasonably comfortable with it (although by no means an expert)
  • … but still remembers the pain it took to get there.

State of the union

It’s hard to argue with the fact that Git has gained a lot of momentum these past years and that it’s slowly eclipsing other, more established Version Control Systems (VCS), such as Subversion or Perforce. Just a few days ago, the Subversion team released their roadmap, which promptly generated discussions about whether Subversion was dying (I think that’s exaggerated, and the Subversion team is making the right decision by focusing on the non-distributed aspect of their VCS).

What to expect

Switching to Git can be very easy or absolutely traumatizing, depending on how you approach it. Individual users will usually have a pleasant experience since they can start small and expand their knowledge from there, but corporations have to take a big jump, retrain entire teams of developers, adjust their tools and their expectations and be ready to give up on a few benefits that they don’t think they can live without.

As a user

This is undoubtedly the best way to get started with Git. Begin by picking a hosting service, which can be either remote (e.g. GitHub) or local (your own machine), and from that point on, you can pretty much get by with just a few commands:

  • git add [file] to tell Git to track a new file.
  • git commit -a -m "Commit message" to commit the changes you have made to your local repository.
  • git push to push your changes to the “server”.
  • Additionally, you will probably want to learn about git clone, which downloads a repository to your machine, and git pull, which brings in the changes that others have pushed since.

You can go very far with these simple commands, which mimic almost exactly the workflow of a non-distributed VCS. This is the best way to get comfortable with Git until you get bored, at which point you can start exploring some of its advanced features.

As a team

Switching an entire team or organization from a regular VCS to Git is a complex project that will require managing not just the technical complexity but also the users’ expectations. I could write a lot of pages on this topic alone, but for now, I’ll just give you a few quick tips:

  • The transition will work much better if several of your employees are already familiar with Git and can help you evangelize the idea and provide support to reluctant users.
  • Be very clear about the features they will have to learn to live without (e.g. monotonic revision numbers, see below) and emphasize the personal benefits (e.g. branch juggling, see below as well). You won’t get a lot of credit by telling them that they can inspect each other’s repositories, but they will most likely appreciate the fact that moving between several lines of code is trivial.

Distribution

You will find plenty of definitions for what a “Distributed” Version Control System is, but let me just sum it up for you and share a dirty little secret at the same time:

  • A DVCS doesn’t have a central server.
  • I bet that most people using a DVCS are still using a central server.

Confused yet?

Let me rephrase this: I think the DVCS aspect of Git is not its main strength, but it does provide a lot of benefits which are really what get people hooked to Git.

The one thing you need to know when you are using a DVCS such as Git is that whenever you work on a project on your computer, you are carrying its entire repository with you. I’m simplifying this a little bit, but it’s the general idea. In contrast, centralized VCSs will usually let you create “views” or “clients” on your local machine, but the only place where the full repository is stored is the server (and hopefully, its backups).

Carrying a full repository on your machine sounds like a crazy idea, but it comes with a lot of benefits (“cheap branching” and “easy offline work” are two that developers will immediately appreciate) and, more importantly, it works, mostly because Git is extremely efficient in the way it stores the representation of that repository (the fact that hard drives are cheap doesn’t hurt, of course).

Since every user carries a full repository on their machine, everyone is a potential server, which means it’s very easy for developer A to “pull” a change from developer B in order to inspect it and then discard it or merge it in their own repository. This sounds very useful, but in practice, I think that most regular Git users will not take advantage of that feature, preferring to push to and pull from a central server. And by “regular Git user”, I mean “everyone except the Linux kernel team”.

Taming the beast

Git’s user interface is probably its greatest weakness. It’s inconsistent and counter-intuitive, and it often reappropriates verbs and nouns that have a clear meaning in the VCS world and makes them mean something else.

I’ve tried to come up with many ways to rationalize this bizarre and arcane user interface, but in the end, all I can tell you is: “Deal with it”.

I know, it’s sad, but there is really no other way. Here is an analogy that might help, though: to me, learning Git is very similar to learning a foreign language. Natural languages are notoriously hard for adults to learn because their organic growth has resulted in all kinds of inconsistencies and oddities. At the end of the day, the only way to learn a foreign language is to memorize, memorize and memorize. As the years go by, practice makes recall more automatic and using the language requires less and less conscious effort, but the learning curve is something that just can’t be avoided.

Git is very similar in this respect. Don’t try to understand why checkout suddenly means something completely different from what you were used to, or to explain the various switches of the reset command: memorize the recipes and just apply them. And don’t be shy about asking seasoned Git users; they are used to getting these questions and will give you instant answers that save you a lot of time hunting through the documentation.

However, there is hope for those of us who want to do more than regurgitate Git recipes: if you are serious about becoming comfortable with Git, you should definitely spend some time trying to understand its internals. I’m referring to the graphs, objects, commits, heads and branches that form the backbone of Git. Unlike its ugly user interface, Git’s underlying infrastructure is very solid and extremely well designed. On top of that, it’s actually not that hard to understand, and once you do, a whole new world will open up.

Once you understand these internals, you will be able to solve problems conceptually by simply visualizing the operations that need to be done on that underlying structure (moving the head there, branching here, rebasing this, merging that, etc…). Once you have solved that problem conceptually, applying the awkward Git user interface to actually do the work will look much easier, albeit still not very natural.
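As a small taste of those internals: Git stores file contents as "blob" objects and names every object by a hash of its content. For a blob in a classic SHA-1 repository, the name is the SHA-1 of the header "blob <size>" plus a NUL byte, followed by the raw bytes. The Java sketch below (the GitBlobId class is illustrative) recomputes such an ID using only the JDK; for a plain file it should agree with git hash-object. Because names depend only on content, developers can create objects independently without coordinating, which is also why there is no central counter handing out revision numbers.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative helper: recomputes the ID Git gives a blob (file content)
// in a classic SHA-1 repository.
class GitBlobId {
    static String blobId(byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            // Header: "blob", a space, the size in bytes, then a NUL byte.
            sha1.update(("blob " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
            sha1.update(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-1.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The empty blob: e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
        System.out.println(blobId(new byte[0]));
        System.out.println(blobId("hello world\n".getBytes(StandardCharsets.UTF_8)));
    }
}
```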

The end of monotony

Centralized VCS such as Subversion and Perforce offer a very popular feature: monotonic revisions. Whenever a new change list is committed to the main repository, it receives a unique number that’s guaranteed to be greater than that of the previous submission. This makes it very easy to compare files and commits chronologically and to determine which one is the most recent.

Unfortunately, you will have to give up this feature with a DVCS. Because of its distributed nature, it’s not possible (or rather, not practical) to offer such a monotonic number. There is no denying that this feels like a huge step backward, especially for large teams.

However, the absence of such a feature reflects a more fundamental reality: these numbers are not always accurate, and the bigger, more active and more branched out a project is, the more meaningless they become. When branch 2.1 receives commit #720 and branch 2.0.4 has #742, what do these numbers actually tell you about these change lists?
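In a graph of commits, "which change came first?" only has an answer when one commit is an ancestor of the other, which is the question Git answers with git merge-base --is-ancestor. The toy CommitGraph class below is hypothetical and ignores everything but parent links; it shows why two commits on separate maintenance branches cannot be ordered, whatever numbers they carry.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model: each commit id maps to the ids of its parents.
class CommitGraph {
    private final Map<String, List<String>> parents = new HashMap<>();

    void add(String id, String... parentIds) {
        parents.put(id, List.of(parentIds));
    }

    // True if 'ancestor' can be reached from 'descendant' by following parent
    // links (depth-first). In this sketch a commit counts as its own ancestor.
    boolean isAncestor(String ancestor, String descendant) {
        Deque<String> toVisit = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        toVisit.push(descendant);
        while (!toVisit.isEmpty()) {
            String current = toVisit.pop();
            if (current.equals(ancestor)) {
                return true;
            }
            if (seen.add(current)) {
                for (String parent : parents.getOrDefault(current, List.of())) {
                    toVisit.push(parent);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        CommitGraph graph = new CommitGraph();
        graph.add("release-2.0");
        graph.add("fix-on-2.0.4", "release-2.0");    // the "#742" change
        graph.add("feature-on-2.1", "release-2.0");  // the "#720" change
        System.out.println(graph.isAncestor("fix-on-2.0.4", "feature-on-2.1")); // false
        System.out.println(graph.isAncestor("feature-on-2.1", "fix-on-2.0.4")); // false
        System.out.println(graph.isAncestor("release-2.0", "feature-on-2.1"));  // true
    }
}
```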

The good news is that there are ways to make up for the absence of monotonic numbers, the main difference being that the tagging will be performed by humans, which is what a lot of organizations using a centralized VCS are already doing anyway.

Branch juggling

When you are trying to make the case for Git in front of a skeptical crowd, it’s important to provide anecdotes that developers can relate to. I promised that this post would avoid content that can be easily found elsewhere, but I have to talk a little bit about branching because for me, understanding this was the first time that I actually acknowledged a very tangible advantage of Git over other source control systems.

The daily routine of a software developer typically involves a lot of multitasking. As much as you want to focus your work on “feature A” and you don’t want to be interrupted, there are many reasons why you might be forced to put your work on hold and start working on “bug B” or “feature C” before being able to resume work on “feature A”.

Switching back and forth between tasks has always been painful with regular VCS but Git makes it very smooth. That’s all I’m going to say about this topic, but if you want to find out more, most of the links that I supply at the end of this article will show you how to achieve this goal very early on in the tutorial.
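Part of why switching is cheap is visible in Git's data model: a branch is little more than a named pointer to a commit, and HEAD records which branch you are on. The toy BranchPointers class below is hypothetical and deliberately ignores the working tree, the index and commit history (a real git checkout also updates your files); it only shows that creating a branch copies a pointer, and that committing moves just the current branch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of Git refs: branch name -> commit id, plus HEAD.
class BranchPointers {
    private final Map<String, String> branches = new HashMap<>();
    private String head;

    BranchPointers(String initialBranch, String initialCommit) {
        branches.put(initialBranch, initialCommit);
        head = initialBranch;
    }

    // Roughly "git branch <name>": a new pointer to the current commit.
    void createBranch(String name) {
        branches.put(name, branches.get(head));
    }

    // Roughly "git checkout <name>", minus updating the working tree.
    void switchTo(String name) {
        if (!branches.containsKey(name)) {
            throw new IllegalArgumentException("No such branch: " + name);
        }
        head = name;
    }

    // Recording a commit only advances the branch HEAD points to.
    void commit(String newCommitId) {
        branches.put(head, newCommitId);
    }

    String tip(String branch) {
        return branches.get(branch);
    }

    public static void main(String[] args) {
        BranchPointers repo = new BranchPointers("master", "c0");
        repo.createBranch("feature-a");
        repo.switchTo("feature-a");
        repo.commit("a1");                 // work on feature A...
        repo.switchTo("master");
        repo.createBranch("bug-b");        // ...interrupted by bug B
        repo.switchTo("bug-b");
        repo.commit("b1");
        repo.switchTo("feature-a");        // resume feature A where it was
        System.out.println(repo.tip("feature-a") + " " + repo.tip("bug-b") + " " + repo.tip("master")); // a1 b1 c0
    }
}
```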

Gerrit

I’d like to conclude this little overview by saying a few words about a tool that we, the Android team, are using a lot for our Git work: Gerrit.

Gerrit is a web-based application that allows you to perform code reviews for Git repositories. It’s basically a server that acts as a gatekeeper: developers push their Git changes to it, other developers review them, and when these changes are approved, Gerrit pushes them to the main repository on behalf of the original submitter. If you’re curious to see Gerrit in action, take a look at the Android open source project’s Gerrit instance.

The reason I mention Gerrit (besides the fact that I contribute to it in my 20% time) is that it represents an interesting compromise where a DVCS is used as a centralized VCS. You might wonder whether using a DVCS with a central server is a bit paradoxical, and in a certain way it is, but I see this as having the best of both worlds.

The Gerrit server acts as a canonical repository from which developers can clone their own repositories, but all the facilities offered by a DVCS (direct push/pull between developers) are still possible and fully compatible with this model. For example, you could pull a change from a fellow developer, make a few additional changes and then push that change to Gerrit so it gets committed to the canonical repository. After that, both developers can pull and merge in order to reconcile their local repositories with the central one.
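The gatekeeper flow can be pictured with a toy model. The ReviewGate class below is hypothetical and is not Gerrit's API or data model: pushed changes wait in a pending queue, and only approved ones are appended to the canonical history, in the order they were pushed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of a review gatekeeper (not Gerrit's actual API).
class ReviewGate {
    // Pending change id -> approved yet? Insertion order = push order.
    private final Map<String, Boolean> pending = new LinkedHashMap<>();
    private final List<String> canonical = new ArrayList<>();

    void push(String changeId) {
        pending.put(changeId, false);
    }

    void approve(String changeId) {
        if (!pending.containsKey(changeId)) {
            throw new IllegalArgumentException("Unknown change: " + changeId);
        }
        pending.put(changeId, true);
    }

    // Moves every approved change into the canonical history;
    // unapproved changes stay pending.
    List<String> submitApproved() {
        List<String> merged = new ArrayList<>();
        Iterator<Map.Entry<String, Boolean>> it = pending.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Boolean> entry = it.next();
            if (entry.getValue()) {
                merged.add(entry.getKey());
                it.remove();
            }
        }
        canonical.addAll(merged);
        return merged;
    }

    List<String> canonicalHistory() {
        return List.copyOf(canonical);
    }

    public static void main(String[] args) {
        ReviewGate gate = new ReviewGate();
        gate.push("change-1");
        gate.push("change-2");
        gate.approve("change-2");
        System.out.println(gate.submitApproved());   // [change-2]
        System.out.println(gate.canonicalHistory()); // [change-2]
    }
}
```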

Wrapping up

First of all, I’d like to apologize for the amorphous nature of this post. I tried to structure it in a way that would show some linearity and consistency, but this is basically a big brain dump of points that have been twirling in my head for a while.

Looking back, it took me a while to warm up to Git (quite a while), but now that I’m here, I really enjoy using it. As I hinted throughout this essay, the benefits will probably not be very apparent for very small teams that mostly commit and push (although you will probably find value in branch juggling even if you’re the only one on your project), but as the team, the number of committers and the number of releases grow, you will find that Git provides an undeniable increase in productivity.

Resources

  • Pro Git. This is a free PDF book written by Scott Chacon, a Git expert who also happens to be very good at explaining Git concepts clearly.
  • The Git community book. Another book, written by the entire Git community, so not as consistent as Chacon’s book but with quite a few useful tips as well.
  • Git user’s manual, written by Linus Torvalds and a few other authors.
  • Git magic. A small and digestible overview of the main concepts.
  • Git ready. A web site dedicated to providing Git recipes to solve common problems. A great time killer when you feel like you’d like to learn something about Git but you’re not sure what.
  • Git documentation. A misleading name that might lead you to think this is the authoritative documentation for Git, which it’s not. It’s more like a hub that points to other Git materials.
  • Embrace the Git index. A good overview of one of Git’s core components.
  • Smacking Git around. Another document by Scott Chacon, this time showing advanced tricks.
  • Git from the bottom up. Save this for the end and only if you want to become a Git expert. This book explains how Git stores its objects and dives deep into the Git internals. You don’t really need to know any of this to be a happy Git user.
  • Git manuals. UNIX manuals usually represent the worst possible way to document tools, but the Git manuals are a pleasant exception in this ocean of mediocrity because they do something right: they show you examples for every single command. See for example the manual for checkout (scroll to the bottom).