February 08, 2006

Distributed TestNG

And I naïvely thought that nobody would notice the mysterious "Distributed TestNG" mention in the TestNG 4.5 change log...  Well, quite a few people took notice and asked for details, so here goes.

First of all, I didn't advertise this feature more because it's still a work in progress: it's missing at least one feature that I think needs to be implemented before it can be released in beta form (see the bottom of this post).

Here is the email I sent to testng-dev last month when I requested some feedback from the developers.  If you are interested in this topic, please read the original thread and the responses it generated and let us know!

From: Cédric Beust
Date: Sun, 1 Jan 2006 09:30:29 -0800
Subject: Announcing Distributed TestNG

More and more people have asked me for this feature these past months and I finally found some time to take a serious shot at it, and I was quite surprised to see that it came along very well.  Here is how it works.

Please keep in mind that there are many open issues (listed below), and I'm emailing testng-dev first in order to gather some feedback on these issues before announcing it on testng-users, so a lot of this is still in flux, but the good news is:  the implementation works, and it's now just a matter of packaging it nicely.

Overview

The idea is to be able to distribute your tests across several slave machines in order to reduce the overall execution time.  To achieve this, you can now launch TestNG in "slave" mode on a remote machine by specifying a port:

java org.testng.TestNG -slave 5150
At which point, TestNG will just sit and wait for incoming connections.

Once you have launched all the slaves that you need, you declare them in a properties file:
# hosts.properties
testng.hosts=terra:5150 arkonis:5151
And you launch the "master" version of TestNG by passing it this host file:
java org.testng.TestNG -hostfile hosts.properties testng1.xml testng2.xml ...
The tests will then be dispatched randomly to the various hosts until they have all run.  All the results will be collected and presented in the usual HTML format (with the addition that these results will include the remote host they ran on).
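
To make the random dispatch concrete, here is a minimal sketch in plain Java. It is not the TestNG implementation: the class HostDispatchSketch and its methods are hypothetical names used only for illustration, and the only things taken from this post are the testng.hosts property and its space-separated host:port format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Random;

// Illustrative only: a hypothetical helper, not part of TestNG.
final class HostDispatchSketch {

  /** Reads the space-separated "host:port" entries of the testng.hosts property. */
  static List<String> parseHosts(String propertiesText) {
    Properties props = new Properties();
    try {
      props.load(new StringReader(propertiesText));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    String value = props.getProperty("testng.hosts", "").trim();
    List<String> hosts = new ArrayList<>();
    if (!value.isEmpty()) {
      hosts.addAll(Arrays.asList(value.split("\\s+")));
    }
    return hosts;
  }

  /** Assigns each suite file to one randomly chosen host. */
  static Map<String, String> assign(List<String> suites, List<String> hosts, Random random) {
    if (hosts.isEmpty()) {
      throw new IllegalArgumentException("no hosts declared in testng.hosts");
    }
    Map<String, String> assignment = new LinkedHashMap<>();
    for (String suite : suites) {
      assignment.put(suite, hosts.get(random.nextInt(hosts.size())));
    }
    return assignment;
  }

  public static void main(String[] args) {
    List<String> hosts =
        parseHosts("# hosts.properties\ntestng.hosts=terra:5150 arkonis:5151\n");
    Map<String, String> plan =
        assign(Arrays.asList("testng1.xml", "testng2.xml"), hosts, new Random());
    plan.forEach((suite, host) -> System.out.println(suite + " -> " + host));
  }
}
```

Running main prints one "suite -> host" line per suite file; which host each suite lands on changes from run to run because the choice is random.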

Right now, two dispatch strategies are supported:  per-test or per-suite (the default).

By default, each suite (each testng.xml file) will be sent to a remote host in its entirety.  If you need a finer granularity, you can add the following to your hosts.properties:
# hosts.properties
testng.strategy=test
In this case, TestNG will parse all your testng.xml files, collect all the <test> stanzas and send each of them individually to a remote host.  The results are returned to the master after each <test> has run.
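
As a rough illustration of what "collecting the <test> stanzas" means, the sketch below pulls the name of every <test> element out of a suite definition with the JDK's DOM parser. The class TestStanzaSketch is a hypothetical name, this is not how TestNG itself parses testng.xml, and the sample XML deliberately omits the DOCTYPE line so that the parser does not try to fetch a DTD.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative only: a hypothetical helper, not part of TestNG.
final class TestStanzaSketch {

  /** Returns the name attribute of every <test> element, in document order. */
  static List<String> testNames(String suiteXml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(suiteXml)));
      NodeList tests = doc.getElementsByTagName("test");
      List<String> names = new ArrayList<>();
      for (int i = 0; i < tests.getLength(); i++) {
        names.add(((Element) tests.item(i)).getAttribute("name"));
      }
      return names;
    } catch (Exception e) {
      throw new IllegalStateException("could not parse suite XML", e);
    }
  }

  public static void main(String[] args) {
    String suiteXml =
        "<suite name=\"nightly\">"
            + "<test name=\"fast\"><classes><class name=\"example.FastTest\"/></classes></test>"
            + "<test name=\"slow\"><classes><class name=\"example.SlowTest\"/></classes></test>"
            + "</suite>";
    // With testng.strategy=test, each of these would be a separate unit of dispatch.
    for (String name : testNames(suiteXml)) {
      System.out.println("dispatchable test: " + name);
    }
  }
}
```

With the per-suite default, the whole file would be a single unit of work; with the per-test strategy, each name printed here would be dispatched on its own.<test name=\"slow\"/></suite>";
java.util.List<String> names = TestStanzaSketch.testNames(xml);
if (!names.equals(java.util.Arrays.asList("fast", "slow"))) throw new AssertionError("names: " + names);
if (!TestStanzaSketch.testNames("<suite name=\"empty\"/>").isEmpty()) throw new AssertionError("expected no tests");
</test>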

Great, what do I need to do to use this?

If you want to take advantage of distribution, there is only one little gotcha you need to be aware of:  your tests need to be static-friendly.

Since each slave runs everything it receives in the same JVM for maximum performance, you need to make sure that the static state of your tests (if any) is correctly initialized before each run.  For example, if you run this test twice in a row remotely:
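import static org.testng.AssertJUnit.assertEquals;

import org.testng.annotations.Test;

public class T {
  // Static state survives between runs inside the same slave JVM
  public static int m_count = 0;

  @Test
  public void f() {
    m_count++;
    assertEquals(1, m_count);  // expected first, actual second
  }
}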
... the second time will fail because m_count will have kept its value after the first run.

Instead, you need to move the initialization into a @Configuration method:
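import static org.testng.AssertJUnit.assertEquals;

import org.testng.annotations.Configuration;
import org.testng.annotations.Test;

public class T {
  public static int m_count = 0;

  // Reset the static state before the tests run again
  @Configuration(beforeSuite = true) // or beforeTest, or after
  public void init() {
    m_count = 0;
  }

  @Test
  public void f() {
    m_count++;
    assertEquals(1, m_count);
  }
}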
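
To see why the reset matters without setting up a slave, here is a small stand-alone sketch that mimics the situation in plain Java, with no TestNG involved. StaticStateSketch is a hypothetical class: runOnce() stands in for the test method f(), and init() stands in for the @Configuration method.

```java
// Illustrative only: simulates two runs of the same test inside one JVM.
final class StaticStateSketch {
  static int count = 0;

  /** Mirrors the body of f(): passes only if count was 0 before the call. */
  static boolean runOnce() {
    count++;
    return count == 1;
  }

  /** Plays the role of the @Configuration method: resets the static state. */
  static void init() {
    count = 0;
  }

  public static void main(String[] args) {
    System.out.println("first run passes:   " + runOnce());  // true
    System.out.println("second run passes:  " + runOnce());  // false, count is now 2
    init();
    System.out.println("after reset passes: " + runOnce());  // true again
  }
}
```

The second call fails for the same reason the second remote run does: nothing reloaded the class, so the static field kept its value.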
The current code in CVS implements everything explained above.  You can find an example hosts.properties in test/ and also master.bat and slave.bat that are convenience scripts to launch the various instances of Distributed TestNG.  Please try it and let me know what you think.

Here are some of the open issues that I'd like to get some feedback on:
  • Terminology:  master / slaves?  Any other suggestion?  (not a big fan of these names right now)
     
  • Specify the list of slaves in a .properties file?  In testng.xml?  In another XML file?  Allow for an ISlaveProvider so that hosts can be dynamically added with a plug-in?
     
  • Slaves can only receive one connection right now, but I am thinking of moving to java.nio and allowing multiple connections so they can be shared by several developers.  The idea is for an entire team to share the same hosts.properties, so potentially, different masters could hit the same slave.
     
  • Follow-up question:  should multi-connection slaves be sequential or multi-threaded?  Should this be configurable per-slave?
     
  • A master could be restricted to certain slaves.  All masters share the same properties file but only access a subset of them (would minimize slave-thrashing because of developers on the same team running all the tests at the same time).  How could we specify this?  Regular expression?  ISlaveFilter?
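
To make the last question more concrete, here is one way the regular-expression idea could look. Everything in this sketch is hypothetical: ISlaveFilter exists only as a name floated in the list above, and the accept() method, the regexFilter() factory and the select() helper are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: none of these types exist in TestNG.
final class SlaveFilterSketch {

  /** Hypothetical plug-in point: decides whether this master may use a slave. */
  interface ISlaveFilter {
    boolean accept(String hostAndPort);
  }

  /** Builds a filter that accepts entries fully matching the given regular expression. */
  static ISlaveFilter regexFilter(String regex) {
    Pattern pattern = Pattern.compile(regex);
    return hostAndPort -> pattern.matcher(hostAndPort).matches();
  }

  /** Keeps only the hosts accepted by the filter, preserving their order. */
  static List<String> select(List<String> hosts, ISlaveFilter filter) {
    List<String> selected = new ArrayList<>();
    for (String host : hosts) {
      if (filter.accept(host)) {
        selected.add(host);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    List<String> shared = Arrays.asList("terra:5150", "arkonis:5151");
    // This master only talks to slaves running on terra.
    System.out.println(select(shared, regexFilter("terra:.*")));
  }
}
```

With this shape, the whole team could keep a single hosts.properties and each master would only add its own pattern, which is one possible answer to the "how could we specify this?" question.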

The current version of TestNG (4.5) contains a working implementation of these features.  It is still considered a work in progress because I haven't implemented classloading on the slaves, so the only way you can load a different set of classes in the slaves (for example after a new check-in) is by killing the process.  A simple strategy could be that when a slave becomes idle for more than a few minutes, it unloads all its classes.

Please let me know what you think and how useful you think this feature would be to you...

 

Posted by cedric at February 8, 2006 12:48 PM
Comments

From two-phase commit terminology, "coordinator" and "participant"?

This is a good idea! Slow test suites discourage testing often.

I think sequential, unless classloaders are used to prevent concurrency issues on statics. I think tests are often written with the unconscious assumption of sequential running.

It strikes me that something similar could be achieved by setting up tests to be run from the command line, and distributing it with conventional grid technology. Those products would handle any resource pool management issues, one would only have to worry about breaking up the request and aggregating the results.

Posted by: Andrew GJ Fung at February 8, 2006 01:37 PM

Looks great, but it is unclear how you marshal classes and other configuration to the remote runners.

If it is not there yet, then maybe it is worth looking at a continuous integration server. I believe Continuum has recently gained multi-server support, so it could be interesting to hook into it: all the code and required resources could then be transparently sent to the remote server, which is already running anyway, for execution.

Posted by: eu at February 8, 2006 02:16 PM

Right now, the only serialization that happens is done by TestNG. The test developer doesn't need to worry about it (no need to make their tests serializable) since only the class names and their results are exchanged between master and slaves.

Posted by: Cedric at February 8, 2006 02:26 PM

As for your other remark, yes, the hard work (making TestNG run remotely) is done, so it should be fairly easy to integrate with any continuous build server.

Posted by: Cedric at February 8, 2006 02:27 PM

Just curious: why did you implement it so that the master has to know its slaves instead of the other way around?

Posted by: Christian at February 8, 2006 02:40 PM

Ah, I *knew* this would come up :-)

No particular reason, I weighed the pros and cons of each and couldn't determine that an approach was blatantly better than the other one, so I just picked one.

The downside of having the clients declare themselves to the master is the risk of creating a "ping storm" on the master, but the advantage of that approach is that the master needs to do a lot less bookkeeping...

I can still be convinced either way... (or even support both models).

Posted by: Cedric at February 8, 2006 02:45 PM

Cedric, are you saying that classes and other dependencies are not delivered to the slaves? How is that supposed to work for more or less complicated and actively changing code?

Posted by: eu at February 8, 2006 03:18 PM

Only the class names (actually, the testng.xml) are transmitted to the slaves. The slaves then simply load these classes and run the tests in them (it is assumed that they have these classes in their classpath).

Posted by: Cedric at February 8, 2006 03:21 PM

This seems like a great place to support zero configuration. Apple's Bonjour or JINI both solve the detection and discovery problems, which means your users don't have to shuffle IP addresses.

Posted by: Jesse Wilson at February 8, 2006 05:10 PM

"The slaves then simply load these classes and run the tests in them (it is assumed that they have these classes in their classpath)"
That doesn't seem to be very distributed, couldn't you load the classes from the server?

Posted by: Geoff at February 8, 2006 05:57 PM

Yes, eventually, I'd like to be able to distribute classes from the server, but for now, the goal is simply to be able to distribute the CPU load...

Posted by: Cedric at February 8, 2006 06:04 PM

"master / slaves" is just fine and very well established and accepted in the computer science jargon. Granted you live in the US, but do you *really* feel you need to be politically correct on this? :)
Merry... holidays ;)

Posted by: at February 9, 2006 12:35 AM

How about this naming:

"slave" -> "testServer" (since that is what it really is)
"master" -> "testClient" or "testDirector"

Also, I would use UDP multicast for auto-discovery.
Each test server will register on a specific range of multicast groups, and your test client, when you run it, will send ONLY one UDP multicast call and wait for replies from the available test servers...


Posted by: Ruslan Zenin at February 9, 2006 08:03 AM

In addition, your test client might receive statistics from test server (e.g. current CPU load, number of connected clients, number of the tests to be executed on the queue, etc).

Based on this information you can make "smart" load-balancing decisions

Posted by: Ruslan Zenin at February 9, 2006 08:07 AM

I am surprised that, working at Google, you didn't look up the Google File System design before deciding whether the clients should identify themselves to the master or the other way around.

Another suggestion: explaining in your documentation that all the test slaves need the jar files and a JVM installed with the correct version dependencies would also help tremendously.

;-)

--Deva.

Posted by: Deva at February 9, 2006 11:48 AM

I have a "special" use case: I would like to have the same tests run on several machines, just to test several OSes and versions.

Example 1: I want to test the same 1.4.2 compiled classes, but on:
-> W$ / 1.4.2 VM
-> W$ / 1.5 VM
-> Solaris / 1.4.2
-> AIX / 1.4.2

Example 2: In fact, I now run my python tests integrated seamlessly into TestNG... and it would be sssssoooooo great to be able to run the python tests on all the OSes I use (W$ / Solaris / Tru64 / Linux / HP / AIX).

Just my 2 cents...

Posted by: Laurent Ploix at February 9, 2006 02:25 PM

How are changed classes distributed to the "slaves"? Is this handled by TestNG or do you manually have to do it with scp or networked file system or something as such?

If it's handled by TestNG I assume there's some form of remote class loader scheme in place. That makes me slightly scared as class loader juggling usually end in tears.

Posted by: Jon Tirsen at February 9, 2006 02:41 PM

Oh, I missed your last paragraph. Sorry. :-)

As I said, if you implement remote class loading be sure to make it optional.

Posted by: Jon Tirsen at February 9, 2006 02:43 PM

"If it's handled by TestNG I assume there's some form of remote class loader scheme in place. That makes me slightly scared as class loader juggling usually end in tears."

I'm not sure I understand your reasoning. If you use the more manual approach with NFS or SCP or whatever, you'll be in the same hell I think?

You've delivered new classes which you need to get into the slave. How do you get it to load those new classes in favour of the old ones? You can restart the slave JVM of course but it's not entirely desirable - what if someone else is running a test at the time? If you don't want to restart the slave that brutally you'll be into classloader games I suspect?

Posted by: Dan Creswell at February 13, 2006 10:07 AM

A couple of things. One, Laurent has a very good idea. Distributed testing can serve two purposes: (1) distributing load over many CPUs and (2) providing testing over a span of many different OS and hardware platforms. It looks like your approach solves 1 easily, but I don't think 2 is supported, since it seems the master is picking and choosing which tests to send to which slaves. You should have a mode that sends ALL tests to ALL slaves.

The second thing: perhaps use multicasting, as someone suggested, to help alleviate the need for prior knowledge of which slaves go with which master and/or vice versa. Slaves/masters need only know the addr/port of the multicast. You could use different addr/ports if you have different master/slave configurations, but in the typical case, both master and slave could default to a commonly agreed upon rendezvous point, and therefore no pre-configuration of IPs/hostnames would be required. The master merely belts out a call "who wants to take this test" and listens for a slave to reply. First one wins, or whatever strategy you want to use. That initial handshake is all you would need: once a slave calls in, it can send additional info (for example, which TCP/IP port the master can use to communicate directly with the slave).

Posted by: John at February 24, 2006 10:16 PM

I'd like to second John's comment on the need/value of being able to distribute tests to different platforms to validate software on different OS/hardware platforms. At the least you should include the ability to run ALL tests on all clients, not just divvy the tests up among them. An alternative system is STAF (http://staf.sourceforge.net/index.php), but it suffers greatly from difficult test development.

Regarding distributing the clients/slaves, this sounds like a good application for JXTA (http://jxta.org), the Java P2P system.

Posted by: Steve Nahm at September 7, 2007 07:02 PM

public class T {
  public static int m_count = 0;

  @Configuration(beforeSuite = true) // or beforeTest, or after
  public void init() {
    m_count = 0;
  }

  @Test
  public void f() {
    m_count++;
    assertEquals(1, m_count);
  }
}

What should I do if I have a singleton that applies some system-wide settings for a test case? When I run the next test case, the static initialization in my code is not executed again, because it is running in the same JVM.

I tried fork="yes" but I think it does not work with TestNG. Any ideas?

Posted by: Pankaj Arora at October 12, 2007 05:20 PM

Hello Cedric,

Can you kindly let us know how to use this feature with the latest version of TestNG (testng-5.8-jdk15.jar)? I tried launching the slave with the following command:

java -cp "testng-5.8-jdk15.jar" org.testng.TestNG -slave 5150

I get the following error,
[ERROR]:
Fail to initialize slave mode

What is missing here? Do we have to add some more jar files? Can you please respond to this soon? We are trying to decide whether TestNG can provide a distributed testing feature. Thanks a lot!

VS

Posted by: VS at May 26, 2009 12:53 PM

Hi VS,

Try running it like this:
java -cp "testng-5.8-jdk15.jar" org.testng.TestNG -slave slave.properties

In slave.properties file:
slave.port=5150
verbose=0

-Craig

Posted by: Craig at July 27, 2009 10:20 PM

When running the slave, how can I set the verbose option? I would like to see some output on the slave machine as well.

Posted by: Aleksandar at October 13, 2009 04:47 AM