Archive for November, 2009

Hard core multicore with TestNG



I recently implemented a new feature in TestNG that took me down an interesting technical path that ended up mixing graph theory with concurrency.

Here are a few notes.

The problem

TestNG allows you to declare dependencies among your test methods. Here is a simple example:

@Test
public void a1() {}

@Test(dependsOnMethods = "a1")
public void b1() {}

@Test
public void x() {}

@Test
public void y() {}

In this example, b1() will not run until a1() has completed and passed. If a1() fails, b1() will be marked as “Skipped”. For the purpose of these articles, I call both method a1() and b1() “dependent” methods while x() and y() are “free” methods.

Things get more interesting when you want to run these four test methods in parallel. When you specify that these methods should be run in a pool of three threads, TestNG still needs to maintain the ordering of a1() and b1(). The way it accomplishes this is by running all the dependent methods in the same thread, guaranteeing that not only will they not overlap but also that the ordering will be strictly respected.

The current algorithm is therefore simple:

  • Break all the test methods into two categories: free and dependent.
  • Free methods are thrown into the thread pool and executed by the Executor, one method per worker, which guarantees full parallelism.
  • Dependent methods are sorted and run into an executor that contains just one thread.

This has been the scheduling algorithm for more than five years. It works great, but it’s not optimal.

Improvements

Dependent methods are a very popular feature of TestNG, especially in web testing frameworks such as Selenium, where the testing of pages is very dependent on the ordering in which operations are performed on these pages. These tests are typically made of a majority of dependent methods, which means that the current scheduling algorithm makes it very hard to leverage any parallelism at all in these situations.

For example, consider the following example:



Since all four methods are dependent, they will all be running in the same thread, regardless of the thread pool size. An obvious improvement would be to run a1() and b1() in one thread and a2() and b2() in a different thread.

But why not push thing further and see if we can’t just throw all these four methods in the main thread pool and still respect their ordering?

This thought led me to take a closer look at the concurrent primitives availables in the JDK, and more specifically, Executors.

My first question was whether it was possible to add workers to an Executor without necessarily having them ready to run right away, but this idea soon appeared to me as going against the principle of Executors, so I abandoned it.

The other idea was to see if it was possible to start with only a few workers and then add more workers to an Executor as it’s running, which turns out to be legal (or at least, not explicitly prohibited). Looking through the existing material, it seems to me that Executors typically do not modify their own set of workers. They get initialized and then external callers can add workers to them with the execute() method.

At this point, the solution was becoming fairly clear in my mind, but before I get into details, we need to take a closer look at sorting.

Topological sort

In the example shown at the beginning, I said that TestNG was sorting the methods before executing them but I didn’t explain exactly how this was happening. As it turns out, we need a slightly different sorting algorithm than the one you are used to.

Looking back at this first example, it should be obvious that there are more than one correct way to order the methods:

  • a1() b1() x() y()
  • x() a1() b1() y()
  • y() a1() x() b1()

In short, any ordering that executes a1() before b1() is legal. What we are doing here is trying to sort a set of elements that cannot all be compared to each other. In other words, if I pick two random methods “f” and “g” and I ask you to compare them, your answer will be either “f must run before g”, “g must run before f” or “I cannot compare these methods” (for example if I give you a1() and x()).

This is called a Topological sorting. This link will tell you all you need to know about topological sorting, but if you are lazy, suffice to say that there are basically two algorithms to achieve this kind of sort.

Let’s see the execution of a topological sort on a simple example.

The following graphs represent test methods and how they depend on
each other. Methods in green are “free methods”: they don’t depend on any other methods. Arrows represent dependencies and dotted arrows are dependencies that have been satisfied. Finally, grey nodes are methods that have been executed.



First iteration, we have four free methods. These four methods are ready to be run.

Result so far: { a1, a2, x, y }



The four methods have been run and they “freed” two new nodes, b1 and b2, which become eligible for the next wave of executions. Note that while one of d‘s dependencies has been satisfied (a1), d still depends on b1 so it’s not free yet.

Result so far: { a1, a2, x, y, b1, b2 }



b2 and b1 have run and they free three additional methods.



The last methods have run, we’re done.

Final result: { a1, a2, x, y, b1, b2, c1, c2, d }

Again, note that this is not the only valid topological sort for this example: you can reorder certain elements as long as the dependencies are respected. For example, a result that would start with {a2, a1} would be as correct as the one above, which starts with {a1, a2}.

This is a pretty static, academic example. In the case of TestNG, things are a lot more dynamic and the entire running environment needs to be re-evaluated each time a method completes. Another important aspect of this algorithm is that all the free methods need to be added to the thread pool as soon as they are ready to run, which means that the ExecutorService will have workers added to its pool even as it is running other workers.

For example, let’s go back to the following state:



At this stage, we have two methods that get added to the thread pool and run on two different threads: b1 and b2. We can then have two different situations depending on which one completes first:



b1 finishes first and frees both c1 and d.

Or…



b2 finishes first but doesn’t free any new node.

A new kind of Executor

Since the early days, TestNG’s model has always been very dynamic: what methods to run and when is being decided as the test run progresses. One of the improvements I have had on my mind for a while was to create a “Test Plan” early on. A Test Plan means that the engine would look at all the TestNG annotations inside the classes and it would come up with a master execution plan: a representation of all the methods to run, which I can then hand over to a runner that would take care of it.

Understanding the scenario above made me realize that the idea of a “Test Plan” was doomed to fail. Considering the dynamic aspect of TestNG, it’s just plain impossible to look at all the test classes during the initialization and come up with an execution plan, because as we saw above, the order in which the methods are run will change depending on which methods finish first. A Test Plan would basically make TestNG more static, while we need the exact opposite of this: we want to make it even more dynamic than it is right now.

The only way to effectively implement this scenario is basically to reassess the entire execution every time a test method completes. Luckily, Executors allow you to receive a callback each time a worker completes, so this is the perfect place for this. My next question was to wonder whether it was legal to add workers to an Executor when it’s already running (the answer is: yes).

Here is an overview of what the new Executor looks like.

The Executor receives a graph of test methods to run in its constructor and then simply revolves around two methods:

/**
* Create one worker per node and execute them.
*/
private void runNodes(Set<ITestNGMethod> nodes) {
  List<IMethodWorker> runnables = m_factory.createWorkers(m_xmlTest, nodes);
  for (IMethodWorker r : runnables) {
    m_activeRunnables.add(r);
    setStatus(r, Status.RUNNING);
    try {
      execute(r);
    }
    catch(Exception ex) {
      // ...
    }
  }
}

The second part is to reassess the state of the world every time a method completes:

@Override
public void afterExecute(Runnable r, Throwable t) {
  m_activeRunnables.remove(r);
  setStatus(r, Status.FINISHED);
  synchronized(m_graph) {
    if (m_graph.getNodeCount() == m_graph.getNodeCountWithStatus(Status.FINISHED)) {
      shutdown();
    } else {
      Set<ITestNGMethod> freeNodes = m_graph.getFreeNodes();
      runNodes(freeNodes);
    }
  }
}

When a worker finishes, the Executor updates its status in the graph. Then it checks if we have run all the nodes, and if we haven’t, it asks the graph the new list of free nodes and schedules these nodes for running.

Wrapping up

This is basically a description of the new TestNG scheduling engine. I tried to focus on general concepts and I glossed over a few TestNG specific features that made this implementation more complex than I just described, but overall, implementing this new engine turned out to be fairly straightforward thanks to TestNG’s layered architecture.

With this new implementation, TestNG is getting as close to possible to offering maximal parallelism for test running, and a few Selenium users have already reported tremendous gains in test execution (from an hour down to ten minutes).

When I ran the tests with this new engine, very few tests failed and the ones that did were exactly the ones that I wanted to fail (such as on that verified that dependent methods execute in the same thread, which is exactly what this new engine is fixing). Similarly, I added new tests to verify that dependent methods are now sharing the thread pool with free methods, which turned out to be trivial since I already have plenty of support for this kind of testing.

This new engine will be available in TestNG 5.11, which you can beta test here.

Update: I posted a follow up here.

Supporting unit testing natively in programming languages: follow-up

There were quite a few interesting comments on my previous entry about supporting unit tests directly inside a language, so I thought I’d address some of these observations.

Michael suggests to keep the unit tests near the method they test:

@Tests({
@Test(3, 4: 7),
@Test(-2, 0: -2)
})
public int add(int x, int y) { return x + y; }

My main hesitation about this approach is that the production code is now littered with test code, and even if you use an IDE that conveniently lets you fold these tests away, it’s still painful to read such code in non-IDE environments, such as web browsing, code reviews, terminal diff, etc… This is one of the reasons why I suggested to have the unit test section in one place.

There is also the problem that unit tests usually require more than just “parameters in, result out”: very often, you will need to initialize your objects or perform some arbitrary code to set things up.

Dan is worried about pulling test dependencies in your production code:

What about test-only dependencies? Your production code would depend on mock libraries, Hamcrest expressions, etc.

Yes, this is used to be my strongest argument when I was against the idea of mixing unit tests with code, but this objection disappears if you actually embed this functionality in the language. I argued in my post for a new keyword called unittest that would define a brand new section, so I don’t see why this section couldn’t include its own imports as well:

unittest {
import org.testng.Assert;
public void positiveNumbers() {
Sum sum = new Sum();
Assert.assertTrue(sum.add(3,4) == 7);
}
}

This code will only be compiled if the compiler is invoked with the option -unittest, and otherwise, it will be simply ignored.

Dan adds:

the main gotcha is standardizing on how to run unit tests

Exactly, and that’s my point: this aspect is very underestimated and it’s the reason why so much code is written every day without accompanying unit tests. If you move this functionality inside the language, you create a universal framework across all companies and all projects that make testing a very natural part of the coding process in that particular language.

When you receive a piece of code for review, you now expect the accompanying unit test to be part of that file.

A few people chimed in to advocate Maven, but regardless of my feelings toward this tool, Maven doesn’t solve the problem I’m trying to address here since it’s yet another tool that’s external to the language, which means that it won’t always be available wherever that language is used.

A reader called Mikew is using something similar to my suggestion in Java at his workplace. He and his teammates are using inner classes to put test code inside their production code, and it seems to be working for them, despite the downsides of such an approach. As I pointed out, a lot of these downsides would disappear if such a functionality was supported natively.

Python’s Doctest doesn’t seem very scalable to me, especially because it makes you write code in comments and that the scope of the tests that you can write is extremely limited, but I agree with the underlying intention.

Phil Goodwin brought Eiffel into this discussion:

This was an assertion framework, and so did not quite support the whole testing story. Nevertheless, it provides a substantial foundation for embedding test code into source.

Eiffel’s support for testing, and the whole idea of Design by Contract in general, has always struck me as a toy that only works in trivial cases. Eiffel supports preconditions, postconditions and invariants, but in my experience, these concepts only work for foundation and library classes (e.g. containers) but scale very poorly in the real world (good luck coming up with an invariant for a class that’s manipulating database records while managing transactions).

I wrote a lot of code in Eiffel a long time ago and I distinctly remember that writing either of these contracts became plain impossible very quickly.

Having said that, Eiffel is the only language (besides D) that I can think of that supports arbitrary blocks of code to be included in classes while having a special status attached to them (“this is test code”).

Overall, the concept of something similar to a unittest section in a programming language continues to be very appealing to me and I am betting that if a language supporting this feature emerges in the next few years, the overall quality of code written in that language will be significantly higher than what we are used to.

Should programming languages support unit testing natively?

I used to be strongly opposed to this idea but I started changing my mind recently. Here is what happened.

The bad

Production and test code can be integrated at various levels:

  1. Supported by the language.
  2. Not supported by the language but mixing production and test code in the same classes.
  3. Production and test code live in different classes but in the same directory.
  4. Production and test code live in different directories.

I have always thought that options 2) and 3) are a bad idea because they make it hard to read and review the code, they contribute to the creation of huge classes and they negatively impact your build infrastructure (which must now be able to strip out the test code when you want to create a shippable binary). We (Google) feel strongly about these points, so we are strictly enforcing option 4) (although we often put our tests in the same package as the production code).

I think this practice is the most common out there and it works very well.

With that in mind, wouldn’t a language that natively supports unit testing be the worst case scenario?

The epiphany

Not long ago, I reflected on my testing habits for the past ten years, and I made a couple of interesting observations:

  • I feel the need to write tests for the code that I write very often.
  • Just as often, that need is thwarted by environmental constraints, so I end up not writing these tests.

My experience is with large software, large teams and huge code bases. When you work in this kind of environment, it’s very common for the company to have developed its own testing infrastructure. Sure, the code remains code, but how you run it and how you deploy it will vary greatly from company to company (and sometimes even from project to project).

Typically, I code a feature, iterate over it a few times and I reach a point when I’m pretty happy with its shape: it’s looking decent, it gets the job done and while there is obviously more work to be done on it, it’s mature enough that writing tests for it at this point will not be a waste.

The code to write these tests is usually pretty obvious, so I can form it in my head pretty quickly and materialize it in code not long after that. Now I need to find a way to actually run this test and make it part of our bigger testing infrastructure, and this is where things usually get ugly. I typically find myself having to change or update my environment, invoke different tools, pull out various wiki/HTML pages to brush up on what’s required to integrate my tests to the big picture.

The worst part is that I will probably have to relearn everything from scratch when I switch to the next project or the next job. Again, I will write the test (which is pretty easy since it’s the same language I used to write the production code) and I will find myself having to learn a brand new environment to run that test.

The environmental hurdle is not easy to address, but if the language that I am coding in supported unit tests natively, I would probably be much more tempted to write these tests since 1) there is now an obvious location where they should go and 2) it’s very likely that the test infrastructure in place knows how to run these tests that I will be writing.

The main gain here is that the developer and the testing infrastructure now share a common knowledge: the developer knows how to write tests and the infrastructure knows how to access these tests. And since this mechanism is part of the language, it will remain the same regardless of the project or the company.

How do we do it?

So what would a language that natively supports unit tests look like?

I know first hand that writing a test framework is not easy, so it’s important to make sure that the test feature remains reasonably scoped and that it doesn’t impact the language complexity too much. You will notice that throughout this entire article, I make a point of saying “unit test” and not just “test”. As much as TestNG is focused on enabling the entire spectrum of testing, I think it’s important for a language to only support unit testing, or more precisely, to only make it easy to test the compilation unit that the test is found in.

Interestingly, very few modern languages support unit testing, and the only one that I found among the “recent” ones is D (I’m sure commenters will be happy to point me to more languages).

D’s approach is pretty minimal: you can declare a unittest section in your class. This keyword acts as a method and you simply write your tests inside:

//
// D
//
class Sum
{
int add(int x, int y) { return x + y; }
unittest
{
Sum sum = new Sum;
assert(sum.add(3,4) == 7);
assert(sum.add(-2,0) == -2);
}
}

This is as barebones as it can get. The advantage is that the impact on the language itself is minimal, but I’m wondering if I might want to be able to write different unit test methods instead of having just one that contains all my tests. And if we’re going down that path, why not make the unittest keyword be the equivalent of a class instead of just a method?

//
// Pseudo-Java
//
public class Sum {
public int add(int x, int y) { return x + y; }
}
unittest {
public void positiveNumbers() {
Sum sum = new Sum();
assert(sum.add(3,4) == 7);
}
public void negativeNumbers() {
Sum sum = new Sum();
assert(sum.add(-2,0) == -2);
}
}

As I said earlier, I think it’s very important for this feature to remain as simple as possible, so what features from sophisticated testing frameworks should we remove and which ones should we keep?

If we stick with D’s approach, there is probably little we can add, but if we go toward a class keyword, then there are probably two features that I think would be useful:

  • Method setUp/tearDown (which would already be useful in the example above, where we create a new Sum object in both test methods.
  • Exception testing.

At this point, I’m already feeling a bit uncomfortable with the extra complexity, so maybe D’s simplistic approach is where we should draw the line.

What do you think about native support for unit testing in programming languages? And what features would you like to see?

How to "Go Home" on your Verizon Droid (and Android in general)

As you probably know by now, Verizon’s Droid has been officially announced and it will be available in the US on November 6th.

It’s running Android 2.0 (“Eclair”), which is by far the most advanced release we have worked on. And in case you didn’t follow, this is the third official release in about a year (there has actually been one more that was never made official).

One of the features that has received the most coverage so far is our turn-by-turn navigation application, which turns your Droid into a speaking GPS device. While most articles I have read do a good job of covering its basic features, some articles deplore the absence of a “Go Home” function.

Well, actually, this function already exists.

It’s very simple really, all you need to do is create a home shortcut. Here is how you do it:


    



Long press on the Home screen, select Shortcuts and then Directions.




Make sure that “Turn by turn navigation” is checked.
Enter your home address (or any other),
pick a name for your shortcut and press “Save”.


    


Your shortcut is ready, tap it to start navigating.

Happy navigation on your Droid!