Otaku, Cedric's weblog: April 2005 Archives

April 29, 2005

The definitive Python rant

Unsurprisingly, there have been quite a few reactions to my small Python rant earlier.  It's good to see people stand up for their language, that's what makes our profession so unique.

Here are a few comments and my reactions to them:

Complete object orientation was built in from the start

That's not true; see below for more on Python's history.

It's true that it allows you to mix object-oriented and script-style code but I really don't see what's wrong with that, since you get the best of both worlds.

Not always.  I think languages such as Ruby and Groovy definitely enable both types of programming and offer the best of both worlds, but Python has made a lot of compromises when adopting various styles of programming, and my experience with Python has been less about choosing between two great alternatives than picking the least of two evils.

By the way your code has 4 "it" and I'll let you count the different semantics to yourself.

I already did:  there are two.  One to declare the parameter and the other one to use it inside the closure.  And that's how it should be.

I think it's just a matter of what you're used to. For me as a Python and Java programmer, the Ruby code you show looks just as unnatural and weird as the Python code looks to you.  But maybe that's just because I'm not used to seeing Ruby, just as you're not used to seeing Python code.

Fair enough.  For the record, I have read a couple of Python books and I see a decent amount of Python code every day, so I'm certainly used to reading it (not as much writing it, though).  But my dislike for Python goes beyond the odd for syntax I commented on in my previous entry, so let me elaborate.

C offers a for loop similar to Python's (a bit less powerful actually) which is so flexible that pretty much any language that appeared after C provided a similar construct.  Having said that, I still think that C is more elegant than Python because it's more consistent.

The problem with Python is that it came out at a time that turned out to be pivotal in software history.  The software world was slowly realizing the power of several programming paradigms (imperative, functional, declarative, etc...) and set out to explore them all through different languages (C++, Modula, Eiffel, Haskell, Prolog, etc...).  Python started as a very basic script language that mimicked the already bare-bones syntax of C while doing away with most of its type safety (a layer that was pretty thin to start with).

Even though object-oriented concepts were slowly emerging as a must-have for industrial software programming, the idea of including them in Python made as much sense as it would to add templates to Ruby, so creating Python without any of these advanced features made perfect sense (Python didn't even have support for static initially!).

But things changed and some of these advanced features turned out to be not only necessary but essential for any language that claimed to be of industrial caliber.  So Python tried to adapt and started to incorporate a mishmash of features from various origins.

Unfortunately, Python wasn't designed to grow.  It didn't follow the recipes laid out by Guy Steele's seminal paper Growing a language, and as such, the inclusion of all these features took a heavy toll on Python's syntax (sometimes acceptable) and Python's consistency (much worse).

For example, static was retrofitted in Python and must now be achieved like this:

class X:
  message = "Hello"

  def say_hello(cls):
    print(cls.message)
  say_hello = classmethod(say_hello)
That's right:  not only do you make a method static by invoking a magic function on it, you actually need to reassign the method to the value returned by this magic function.

I'm sure there are very good technical reasons for this kind of wart, but that's exactly the problem I have with Python:  these reasons were obviously motivated by the complexity that Guido van Rossum had to face in order to incorporate these new features, and not really aimed at making the language simpler for Python users.  I can't imagine there would be any other reason (readability?) than Guido having a problem adding a keyword to his language, or coming up with another less awkward syntax (Ruby does it in a creative way, but I still prefer the static keyword approach).

And this was just to add static.  Retrofitting more complex object-oriented language features to Python has been even more problematic, and the ubiquitous and useless self keyword is just the tip of the iceberg (take a look at how Python implements private/protected or how accessors work).

The same can be said about Python's mixed support for features coming from the functional world.  Generators, lambdas, closures, continuations, etc...  are all implemented with strange restrictions that make guessing the right behavior or syntax almost impossible (to such an extent that some of these features are actually being considered for removal, which is probably the worst thing you can do to a language).

If you are interested in more details about my feelings about Python, here are a few older entries I wrote on this topic.

Anyway.

At the end of the day, elegance is something that just cannot be argued because everyone has different criteria to define it.  Throughout the years and after studying quite a few languages, I have reached a point where languages need to possess a certain set of features in terms of syntax and semantics to get my attention.  Over and over again, I have tried hard to like Python because, frankly speaking, its momentum is undeniable.  But it just never clicked, while my attraction for languages such as Lisp, Java, Ruby and more recently, Groovy, happened within a matter of hours of tinkering.

Wherever your preferences lie, keep your mind open and learn at least one new language every year.  It will make you a better developer.

Posted by cedric at 09:50 AM | Comments (18)

April 26, 2005

Python keeps rubbing me the wrong way

In response to Luke Hutteman's post on continuations, someone offered the following Python snippet to illustrate filtering:

values = range(10)
for nr in (nr for nr in values if nr%2==0):
    print nr
There are several things that bother me with this code:

  • I count four different meanings for the variable nr, each with different semantics.  You should not need more than two (one as a parameter to for and the other one used inside the body).
  • It mixes procedural and inverted styles and doesn't contain anything object-oriented.  Python does support some object orientation, so in more complex pieces of code, you will find yourself constantly mixing three different styles of programming.
  • It mixes filtering (nr % 2 == 0) and business logic (print), making it hard to parameterize either.

Here is a Ruby way of doing the same thing:

(0...10).find_all { |it|
  it % 2 == 0
}.each { |it|
  puts it
}

What I like about it:

  • It's object-oriented, which makes it regular:  each object receives a message and the result is then piped into the next step.
  • Filtering and business logic are clearly separated, so you can improve this example by substituting blocks (strategy pattern).
  • It minimizes the number of intermediate variables (only one, it, which is only used as a parameter and inside the body of the closure).
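For comparison, and purely as an illustration, here is a sketch of the same filter-then-act pipeline in Java.  It assumes java.util.stream, which only arrived with Java 8, long after this entry; the class name FilterThenAct and the process() helper are hypothetical.  The filter and the action are passed in as separate objects, so either one can be swapped without touching the other:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

public class FilterThenAct {
    // Filtering and business logic are supplied separately (strategy pattern)
    static void process(int upTo, IntPredicate filter, IntConsumer action) {
        IntStream.range(0, upTo)   // 0..upTo-1, like Python's range(10) or Ruby's (0...10)
                 .filter(filter)
                 .forEach(action);
    }

    public static void main(String[] args) {
        IntPredicate isEven = n -> n % 2 == 0;

        // Same filter, first action: print each value on its own line
        process(10, isEven, n -> System.out.println(n));

        // Same filter, different action: collect the values instead
        List<Integer> collected = new ArrayList<>();
        process(10, isEven, n -> collected.add(n));
        System.out.println(collected);   // [0, 2, 4, 6, 8]
    }
}
```

Only the lambdas change between the two calls; the iteration and filtering plumbing stays in one place.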

The latter solution feels much more natural and intuitive to me, whereas most of my time in Python is typically spent

  • Filtering out the omnipresent and totally useless keyword self.
  • Wondering if the object I am staring at responds to a method or if I need to call a global function on it (such as range in the example above).

I guess I'm too young for Python and spoiled by modern languages :-)

Posted by cedric at 03:22 PM | Comments (38)

April 25, 2005

More on continuations

Don Box offered a few answers to my questions in his latest post, but I am still unsatisfied.  Here's why.

First of all, the various comments on our respective blog entries taught me that C# doesn't really support continuations, which explains the awkward yield return syntax.  This particular C# feature is probably closer to generators, which are themselves a more restricted form of continuations (but still more useful than anything we have in Java).

Don's answer to the question "how useful are continuations?" is the following:

I've been programming in C# 2.0 for over a year now. I regularly find myself using the following two pre-defined delegates from mscorlib in my programs:
namespace System {
  public delegate bool Predicate<T>(T item);
  public delegate void Action<T>(T item);
}

I am quite puzzled by this code snippet which doesn't seem to have much to do with continuations.  Don's remaining text is a pretty convincing argument of the usefulness of delegates in C#, something I am in full agreement with.  Years ago, when Microsoft came up with its own JVM and IDE, my fondness for delegates was immediate.  They appeared to me as a typesafe and reasonably object-oriented way to provide method callbacks.

Back then, I spent hours on emails and discussions trying to convince Java developers around me (including at JavaSoft, and more particularly on the Swing team) that we should have delegates in Java.  But my words fell on deaf ears, creating cascades of flame wars that never accomplished anything.  The problem was, of course, that the reactions were clouded by politics and not driven by pragmatism.  So Java drifted away from delegates and will probably never recover.  This is quite sad and probably the cause of thousands of wasted objects every day as we create and discard anonymous classes for every click of a button.

Anyway.

Don shows a pretty good example of predicates and first-order logic that allows him to neatly isolate logic between callers and callees. It's a fairly common idiom that you also find in many places in Java (comparators, file filters, etc...).  If you are not familiar with this kind of trick, I strongly encourage you to read up on the STL (Standard Template Library) which, even though it is written in C++ and makes heavy use of advanced template techniques, contains a lot of very important concepts that you will undoubtedly find useful in your daily Java programming.
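Since Java has no delegates, the same caller/callee split is usually written with single-method interfaces and anonymous classes.  The sketch below mirrors the two C# delegates quoted above as hand-written Java interfaces; the names Predicate, Action, Delegates and forEachMatching are hypothetical and not part of any Java library:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Delegates {
    // Hand-written counterparts of the C# Predicate<T> and Action<T> delegates
    public interface Predicate<T> {
        boolean test(T item);
    }

    public interface Action<T> {
        void apply(T item);
    }

    // The caller decides which items and what to do; the callee owns the loop
    public static <T> void forEachMatching(List<T> items, Predicate<T> predicate, Action<T> action) {
        for (T item : items) {
            if (predicate.test(item)) {
                action.apply(item);
            }
        }
    }

    public static void main(String[] args) {
        final List<String> shortNames = new ArrayList<String>();
        forEachMatching(Arrays.asList("Don", "Sam", "Cedric"),
            new Predicate<String>() {            // one anonymous class per callback
                public boolean test(String name) { return name.length() <= 3; }
            },
            new Action<String>() {
                public void apply(String name) { shortNames.add(name); }
            });
        System.out.println(shortNames);   // [Don, Sam]
    }
}
```

Each callback costs a full anonymous class, which is exactly the verbosity that delegates avoid.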

So I'm back to my original question:  how useful are continuations, really?  I can't shake the idea that they are nothing more than a glorified goto.  A cleaner one, sure, since it remembers contexts and frames, but I am still looking for that one example where continuation-based code is more readable than a loop-based equivalent.

Will anyone take up the challenge?

 

Posted by cedric at 09:55 AM | Comments (15)

April 22, 2005

Are you saying you're lazy?

It's not very often that Scott Adams makes a factual mistake, so the opportunity is too good to pass up.

Here is today's cartoon:

Of course, the number of possible orderings of twenty-five numbers is not 25*25 but 25! (the factorial of 25).

A friend pointed out that 625 is the right number if you need to try these combinations in pairs, so I'll let Scott off easy this time.
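Both numbers are easy to check.  In this minimal Java sketch (the class name CountingCheck is just for illustration), 25! counts the possible orderings of 25 distinct numbers and needs BigInteger because it no longer fits in a long, while 25 * 25 = 625 counts ordered pairs drawn from the 25 numbers, where a number may be paired with itself:

```java
import java.math.BigInteger;

public class CountingCheck {
    // n! computed with BigInteger: 21! and beyond overflow a long
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        int n = 25;
        System.out.println("Orderings of 25 numbers: " + factorial(n));      // 15511210043330985984000000
        System.out.println("Ordered pairs (repetition allowed): " + n * n);  // 625
    }
}
```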

 

Posted by cedric at 07:46 AM | Comments (22)

April 19, 2005

The Return of the AOP Caching Challenge!

An interesting article on caching with Aspect-Oriented Programming was just published on TheServerSide, and while it does a decent job of benchmarking and describing the infrastructure, I have a few issues with some of the aspect-related material it covers.

Here are a few comments:

it's not easy to turn caching on or off dynamically when it's part of your business logic

It should be configurable externally.  You don't need AOP to branch conditionally and disable (or alter) your caching logic at runtime.  Most of the EJB and web containers that I know have been providing this kind of functionality in XML files for quite a while.

it's not easy to turn caching on or off dynamically when it's part of your business logic

True, so it's quite surprising that Srini's own solution still falls into this trap (see below).

The cached data is released from memory (purged), by implementing a pre-defined eviction policy, when the data is no longer needed.

I disagree with the "pre-defined" (sic) part.  Eviction policies should absolutely be configurable at runtime, even more so than caching activation itself.  Adjusting the eviction policy is a big part of fine-tuning and optimizing an application, and you need as much flexibility in terms of strategies (round-robin, last used first, timeouts, evict biggest first, etc...) as possible.
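A minimal sketch of what runtime-configurable eviction could look like, assuming a hand-rolled in-memory cache rather than any particular caching product.  ConfigurableCache, EvictionPolicy and the two sample policies are hypothetical names; only two of the strategies listed above are shown, and both the policy and the maximum size can be changed while the application is running:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;

public class ConfigurableCache<K, V> {
    /** Hypothetical strategy: picks a victim from keys ordered least to most recently used. */
    public interface EvictionPolicy {
        <T> T chooseVictim(List<T> keysLeastToMostRecent);
    }

    public static final EvictionPolicy LEAST_RECENTLY_USED = new EvictionPolicy() {
        public <T> T chooseVictim(List<T> keys) { return keys.get(0); }
    };

    public static final EvictionPolicy MOST_RECENTLY_USED = new EvictionPolicy() {
        public <T> T chooseVictim(List<T> keys) { return keys.get(keys.size() - 1); }
    };

    // accessOrder = true: iteration runs from least to most recently accessed entry
    private final LinkedHashMap<K, V> entries = new LinkedHashMap<K, V>(16, 0.75f, true);
    private int maxEntries;   // assumed to be at least 1
    private EvictionPolicy policy;

    public ConfigurableCache(int maxEntries, EvictionPolicy policy) {
        this.maxEntries = maxEntries;
        this.policy = policy;
    }

    public synchronized V get(K key) {
        return entries.get(key);
    }

    public synchronized void put(K key, V value) {
        // Make room before inserting a new key, so the policy only sees existing entries
        if (!entries.containsKey(key)) {
            trimTo(maxEntries - 1);
        }
        entries.put(key, value);
    }

    // Both knobs can be adjusted at runtime, e.g. from a management console
    public synchronized void setMaxEntries(int maxEntries) {
        this.maxEntries = maxEntries;
        trimTo(maxEntries);
    }

    public synchronized void setPolicy(EvictionPolicy policy) {
        this.policy = policy;
    }

    public synchronized boolean containsKey(K key) {
        return entries.containsKey(key);   // does not change the access order
    }

    public synchronized int size() {
        return entries.size();
    }

    private void trimTo(int limit) {
        while (entries.size() > Math.max(limit, 0)) {
            K victim = policy.chooseVictim(new ArrayList<K>(entries.keySet()));
            entries.remove(victim);
        }
    }
}
```

Copying the key set on every eviction keeps the sketch short; a production cache would track eviction order incrementally.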

Except for these points, Srini does a good job of framing the overall problem, and he makes a convincing case for using AOP for caching.  However, caching with AOP is very hard to get right, and a couple of years ago, I offered an AOP caching challenge that turned out to be much harder to solve than everybody (including myself) initially thought.

Srini's pointcut is the following:

List around(String productGroup) : getInterestRates(productGroup) {

The problem with this approach is that it explicitly references a method in the business code.

Not only is this dangerous because you are increasing the coupling in your code (and I'm assuming that refactoring will take care of modifying the aspect, should you decide to rename or modify the getInterestRates() method), but it's actually impossible to scale.  As the number of methods you want to cache increases, you need to remember to update the pointcut to include the newcomers, and this will clearly fall apart very quickly.

Srini is falling into the same trap as the people who tried to solve the AOP Caching Challenge:  not enough abstraction, too much coupling.

As Srini said himself above, caching is completely independent of domains, and this fact should be reflected in the pointcuts you use.  The above pointcut is not independent from the domain model it applies to.

You should be able to determine a trait that "methods that can be cached" share and use this as your pointcut.  I can think of two ways to solve this problem:

  • Decide that any method that takes a string as a key and returns a value can be cached (potentially dangerous since you could get false positives, but this could be alleviated with naming conventions).
  • Use annotations to indicate when a method can be cached.

I think the annotation-based solution is the best compromise in this case, since it makes you independent of naming conventions and doesn't require any modification of your pointcuts as your code base grows.  Also, the burden on developers is minimal since all they need to remember is to add an annotation whenever a method can be cached.

You can also imagine more annotation schemes that would allow for a better partitioning of your caching:

@Cacheable(category = "datasources")
public DataSource getDataSource(String driverName);

@Cacheable(category = "db.accounts")  // "use the cache for rows in table ACCOUNTS"
public Account findAccount(String customerName);
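To show the annotation-driven idea end to end without an AOP toolkit, the sketch below swaps in a JDK dynamic proxy (java.lang.reflect.Proxy) as a stand-in for the aspect; it is not AspectJ or AspectWerkz syntax.  The @Cacheable annotation, its category element and the AccountService interface are hypothetical.  Because the proxy sees the interface's Method object, the annotation is placed on the interface method, and any method without it is simply delegated:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CachingProxy {
    /** Hypothetical marker: results of annotated methods may be cached, grouped by category. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface Cacheable {
        String category() default "default";
    }

    /** Hypothetical business interface used for illustration. */
    public interface AccountService {
        @Cacheable(category = "db.accounts")
        String findAccount(String customerName);

        int countAccounts();   // not annotated, so never cached
    }

    // category -> (method + arguments) -> cached result
    private final Map<String, Map<List<Object>, Object>> caches =
        new HashMap<String, Map<List<Object>, Object>>();

    public <T> T wrap(final T target, Class<T> iface) {
        InvocationHandler handler = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                Cacheable cacheable = method.getAnnotation(Cacheable.class);
                if (cacheable == null) {
                    return call(target, method, args);   // no annotation: always delegate
                }
                List<Object> key = new ArrayList<Object>();
                key.add(method);
                if (args != null) {
                    key.addAll(Arrays.asList(args));
                }
                return lookupOrCall(cacheable.category(), key, target, method, args);
            }
        };
        return iface.cast(Proxy.newProxyInstance(
            iface.getClassLoader(), new Class<?>[] { iface }, handler));
    }

    // Coarse locking keeps the sketch simple
    private synchronized Object lookupOrCall(String category, List<Object> key,
            Object target, Method method, Object[] args) throws Throwable {
        Map<List<Object>, Object> cache = caches.get(category);
        if (cache == null) {
            cache = new HashMap<List<Object>, Object>();
            caches.put(category, cache);
        }
        if (!cache.containsKey(key)) {
            cache.put(key, call(target, method, args));
        }
        return cache.get(key);
    }

    private static Object call(Object target, Method method, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause();   // surface the business method's own exception
        }
    }
}
```

Note that the caching code never names findAccount(): adding @Cacheable to another interface method is enough to cache it, which is the decoupling argued for above.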

Jonas and Alex, from AspectWerkz, and Ramnivas Laddad, the author of "AspectJ in Action", have published a series of articles on annotation-based AOP with AspectJ/AspectWerkz which I strongly recommend.

Regardless, this is an interesting contribution to the problem of AOP-based caching in general, but it goes to prove -- again -- that even two years later, we still haven't quite figured out how to solve this problem optimally.

Posted by cedric at 10:38 AM | Comments (3)

April 18, 2005

Continuations: still not quite convinced

Sam Ruby and Don Box have posted a couple of interesting articles on continuations.  Sam's article is a good explanation of what continuations are for "old-timers" and gives a few examples in various domains.  However, I was more intrigued by Don Box' post because it taught me that C# supports continuations, which I didn't know.

It's quite refreshing to see a language take a few risks and implement innovative features.  Time will tell if these features will find their place in developers' toolboxes, but right now, I will take this opportunity to express a few doubts about the concept.

Don gives two examples, one written with continuations and one using anonymous classes (delegates, actually, since C# supports those).  And the first thing I notice is that both examples are about the same size and equally readable to me.  This is not good for continuations, since I'm a firm believer that if you introduce a new feature in a language, it needs to improve at least one aspect of that language (readability, concision, performance, etc...) radically.  If the new feature fails to achieve this goal, you now have two slightly different ways to achieve the same thing, and Perl has already taught us that this leads to the path of madness.

Something else that distresses me about continuations is that they are often illustrated with Fibonacci or Web flow control.  These examples are quickly turning into what logging is to AOP:  the quintessential example that everybody understands but nobody can apply to their day job.

But here is the main problem I have with continuations:  how do you debug them?

Out of curiosity, here is how you could implement Fibonacci in Java.  First as an iterator:

import java.util.Iterator;

// Infinite iterator over the Fibonacci sequence: 1, 2, 3, 5, 8, ...
class FibonacciIterator implements Iterator<Integer> {
  private int m_previous0 = 0;
  private int m_previous1 = 1;

  public void remove() {
    throw new UnsupportedOperationException("remove");
  }

  public Integer next() {
    int result = m_previous0 + m_previous1;
    m_previous0 = m_previous1;
    m_previous1 = result;
    return result;
  }

  public boolean hasNext() {
    return true; // the sequence never ends: callers must break out themselves
  }
}

Exposing it through an Iterable makes it possible to use it in the new for loop:

public class FibonacciContinuation implements Iterable<Integer> {

  public Iterator<Integer> iterator() {
    return new FibonacciIterator();
  }
}
such as:
  public static void main(String[] argv) {
    int remaining = 7;
    FibonacciContinuation fib = new FibonacciContinuation();

    for (int value : fib) {
      System.out.println(value);
      if (--remaining == 0) break;
    }
  }

which outputs:

1
2
3
5
8
13
21

This example is admittedly a little bit more verbose than Don's, but it's because I wanted to make it fancy, and I contend that it has a big advantage over a continuation-based implementation:  it can be debugged.

Imagine that your Fibonacci code has a bug and starts producing bogus values after iteration 1057.  How do you trace the program there and how do you inspect it, since all the state is implicitly maintained by the JVM (or whatever runtime you are using)?

With the field-based approach to continuations, I get to decide and to define what the state is, so that I can inspect it (and future readers of my code will as well).
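To make that point concrete, here is a hypothetical variant of the iterator above (the class name InspectableFibonacciIterator and its step counter are assumptions, not part of the original code).  It switches to BigInteger, since an int would overflow long before iteration 1057, and exposes every piece of state through named fields and a readable toString():

```java
import java.math.BigInteger;
import java.util.Iterator;

// Hypothetical debug-friendly variant: all of the state lives in named fields
public class InspectableFibonacciIterator implements Iterator<BigInteger> {
    private long step = 0;                          // how many values have been produced
    private BigInteger previous = BigInteger.ZERO;
    private BigInteger current = BigInteger.ONE;

    public boolean hasNext() {
        return true;
    }

    public BigInteger next() {
        BigInteger result = previous.add(current);
        previous = current;
        current = result;
        step++;
        return result;
    }

    public void remove() {
        throw new UnsupportedOperationException("remove");
    }

    public long getStep() {
        return step;
    }

    // Readable snapshot for log statements or a debugger's variable view
    @Override
    public String toString() {
        return "step=" + step + ", previous=" + previous + ", current=" + current;
    }
}
```

With this version, a conditional breakpoint on step == 1057 inside next(), or a log statement printing the iterator, stops at or reports exactly the suspicious iteration.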

Does this mean that continuations are useless?  I wouldn't go that far, but it's clear to me that Fibonacci is not the right way to advocate this feature.  There have to be better examples where maintaining the state explicitly, as I did above, would be more complex than the alternative (letting the runtime do it for you), while still allowing you to debug through it easily.

I really want to like continuations, can somebody convince me with a good example?

Posted by cedric at 09:55 AM | Comments (50)

April 15, 2005

The Perils of Duck Typing

The idea behind "Duck Typing", which has recently been made popular again by Ruby and other script languages, is to make the concept of types less restrictive.

Consider the following:

public interface ILifeCycle {
  public void onStart();
  public void onStop();
  public void onPause();
}
// ...
public void runObject(ILifeCycle object) {
  object.onStart();
  // ...
  object.onStop();
}

Faced with this kind of construct, some languages decide that the existence and even the name of the interface ILifeCycle is unimportant.  The only thing that really matters is the fact that runObject() needs the methods onStart() and onStop() to exist on the parameter, and that's all.

In short, it boils down to:

public void runObject("any object that responds to the methods onStart and onStop" object) {
 // ...  
}

Late-binding languages are actually even less restrictive than that, since the check that the object does respond to these methods is not made when the object is passed as a parameter to the method, but when the methods are invoked, which explains why method parameters are usually not typed at all.

As is typical for dynamically typed languages, the error will therefore only appear at runtime, and only if that code actually runs.
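Java can imitate this late binding with reflection, which makes the runtime-only failure easy to observe.  In this sketch (DuckTypingDemo and HalfDuck are hypothetical names), runObject() accepts any Object and looks each method up by name only at the moment it is called, so an object that has onStart() but no onStop() gets halfway through before failing:

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class DuckTypingDemo {
    // Duck-typed call: the method is looked up by name only when it is invoked
    static void invoke(Object target, String methodName) throws Exception {
        Method method = target.getClass().getMethod(methodName);
        method.invoke(target);
    }

    // Mirrors runObject() above, but with no declared parameter type beyond Object
    static void runObject(Object object) throws Exception {
        invoke(object, "onStart");
        // ...
        invoke(object, "onStop");   // throws NoSuchMethodException if the object lacks onStop()
    }

    // Hypothetical class that only quacks halfway: onStart() exists, onStop() does not
    public static class HalfDuck {
        public final List<String> log = new ArrayList<String>();

        public void onStart() {
            log.add("started");
        }
    }
}
```

Passing a HalfDuck compiles without complaint; the missing method is only reported after onStart() has already run, which is the kind of partially applied work that a declared ILifeCycle parameter would have rejected at compile time.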

First of all, let's get a frequently asked question out of the way:  if two interfaces have the same methods, are they semantically equivalent?  Isn't there a risk of passing an object that is totally wrong for this method, yet works because it responds to the right methods?

I don't have a clear answer to that, but my experience is that such a thing is very unlikely.  This kind of argument is a bit similar to the fear we all felt in the early days of Java when we realized that containers are not typed:  ClassCastExceptions turned out to be much rarer than we all thought.

Duck Typing is a big time saver when you write code, but is it worth it?  Don't you pay for this ease of development much later in the development cycle?  Isn't there a risk that you might ship broken code?

The answer is obviously yes.

The proponents of Duck Typing are usually quick to point out that it should never happen if you write your tests correctly.  This is a fair point, but we all know how hard it is to guarantee that your tests cover 100% of the functional aspects of your application.

Another danger in the Duck Typing approach is that it makes it really hard to see what the contract is between callers and callees.

As you can see in the code above, you need to actually understand the entirety of the method to realize that the parameter passed to the method needs to respond to onStart() and onStop().  But the worst part is:  the code is lying to you!

The method is also relying on onPause(), except that this method is not used in this particular runObject().  But it is used in execute() in a different class.  How would you realize that runObject() and execute() work on objects of the same type?  With Duck Typing, it's extremely hard to tell and it requires a detailed read of the code of these methods.

If you wanted to use runObject() from your own code, you would make the flawed assumption that all your object needs to do is respond to onStart() and onStop(), and chaos will ensue if/when the implementation is upgraded to invoke onPause() as well.  At least, with the typed approach, the contract is obvious and you are guaranteed that it can't be changed from under you (the provider of this interface can't add a method to ILifeCycle without breaking everything, so they will probably provide an ILifeCycle2 interface or something similar to guarantee backward compatibility).
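Here is a minimal sketch of that evolution path, assuming the provider chooses to extend the old contract rather than edit it.  ILifeCycle is copied from the snippet above; ILifeCycle2, its onResume() method and the runObject2() entry point are hypothetical.  Existing implementations keep compiling, and the new signature itself announces the extra requirement:

```java
public class LifeCycleContracts {
    public interface ILifeCycle {
        void onStart();
        void onStop();
        void onPause();
    }

    // Hypothetical successor: extends the old contract instead of changing it
    public interface ILifeCycle2 extends ILifeCycle {
        void onResume();
    }

    // Old entry point: still accepts every existing ILifeCycle implementation
    public static void runObject(ILifeCycle object) {
        object.onStart();
        object.onStop();
    }

    // New entry point: the compiler rejects callers whose objects lack onResume()
    public static void runObject2(ILifeCycle2 object) {
        object.onStart();
        object.onPause();
        object.onResume();
        object.onStop();
    }
}
```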

I am all in favor of anything that makes the development process more agile, but if I can ship code that contains errors when these errors could have been caught by the compiler before my code even gets a chance to run, I will seriously consider leveraging this support as much as I can.

Duck Typing is dangerous and should only be used for quick prototyping.  Once you switch to production coding, I strongly encourage everyone to make their code as statically typed as possible.

This is one of the great things in Ruby:   it is late-bound but still strongly typed.  Not only is the interface approach shown in the first code snippet above fully supported in Ruby, it is actually quite encouraged and it doesn't make your code any less Ruby-ic.

Use Duck Typing for prototyping, but program to interfaces for anything else.

 

Posted by cedric at 10:03 AM | Comments (44)

April 12, 2005

Announcing TestNG 2.3

The TestNG team is happy to announce the availability of TestNG 2.3.

The new version is available at http://beust.com/testng, along with the new documentation, which has been considerably improved (highlighted code snippets, detailed DTD, ant task and description of all the new features).

What's new:

  • beforeSuite, afterSuite, beforeTest, afterTest
  • Revamped ant task with haltonfailure and other helpful flags
  • Better stack traces and improved level control for verbosity
  • Better syntax for including and excluding methods in testng.xml
  • Test classes can be invoked on the command line
  • ... and many bug fixes.

For Eclipse users, a new version (1.1.1) of the Eclipse plug-in that includes this new TestNG version is available on the remote update site or for direct download.

Also, TestNG has joined OpenSymphony (big thanks to Patrick and Hani for setting this up).  As a consequence of this move, there is now a TestNG users forum as well as a Wiki and JIRA for issue tracking.

The users mailing-list has been moved to Google Groups and is connected to the forum, so you only need to subscribe to one.

Try it and let us know what you think!

Posted by cedric at 09:55 AM | Comments (2)

April 10, 2005

Class-level injection

My post on dependency injection in tests has generated a lot of very interesting comments and email.

Eugene noted that:

However the huge disadvantage of such approach is that you have to repeat these declarations for each and every test method. From this point it is better to have dependencies as fields, because you declare them only once (less coding and easier to change/refactor).

Very true.  Obviously, both method parameters and fields serve a purpose, but it occurred to me that TestNG didn't really help you with fields.  Since I am still somewhat reluctant to let a container alter my private fields, I tried hard to come up with a solution that would solve both problems, and I realized that a natural extension to TestNG would be to allow parameters at the class level.  These parameters would then be passed to the constructor when TestNG creates an instance of your test class:

@Test(parameters = { "xml-file" })
public class Test1 {

  public Test1(String xmlFile) {
    // ...
  }
}

This mechanism is already in place for test methods, so TestNG users are already familiar with it.  With this construct, developers are free to do whatever they want with the parameters passed to the constructor, the most likely approach being to store them in fields for later use in the test methods.
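For readers curious about the mechanics, here is a sketch of how a framework could resolve class-level parameters and hand them to the constructor.  It is not TestNG's implementation: the @ClassParameters annotation, the instantiate() helper and the Map standing in for testng.xml are all hypothetical, and the sketch only supports String parameters:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.util.Map;

public class ClassLevelInjection {
    // Hypothetical stand-in for a class-level @Test(parameters = {...})
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface ClassParameters {
        String[] value();
    }

    // Looks up each declared parameter name and passes the values to the matching constructor
    public static <T> T instantiate(Class<T> testClass, Map<String, String> suiteParameters)
            throws Exception {
        ClassParameters declared = testClass.getAnnotation(ClassParameters.class);
        String[] names = declared == null ? new String[0] : declared.value();
        Object[] values = new Object[names.length];
        Class<?>[] types = new Class<?>[names.length];
        for (int i = 0; i < names.length; i++) {
            if (!suiteParameters.containsKey(names[i])) {
                throw new IllegalArgumentException("No value for parameter " + names[i]);
            }
            values[i] = suiteParameters.get(names[i]);
            types[i] = String.class;   // this sketch only handles String parameters
        }
        Constructor<T> constructor = testClass.getConstructor(types);
        return constructor.newInstance(values);
    }

    // Example test class: the constructor stores the injected value in a field
    @ClassParameters({ "xml-file" })
    public static class Test1 {
        private final String xmlFile;

        public Test1(String xmlFile) {
            this.xmlFile = xmlFile;
        }

        public String getXmlFile() {
            return xmlFile;
        }
    }
}
```

Failing fast when a declared parameter has no value keeps configuration mistakes out of the test methods themselves.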

This approach addresses Eugene's remark by enabling both "class-level injection" and "test method parameter injection", but a few hours later, Eugene offered further refinement on the mailing-list:

Better approach will be to have these "parameters" (actually dependencies) declared in the constructor. Something like:

@Test
public class Test1 {

  public Test1( @Dependency( "xml-file") String xmlFile) {
    // ...
  }
}

Indeed, this is even better, but the problem here is that I want to keep supporting JDK 1.4 for a little while and QDox doesn't support parameter annotations.  But at some point in the future, this is most likely how this feature will be implemented.

Posted by cedric at 11:08 AM | Comments (1)

April 07, 2005

Dependency injection in tests

I came across this old entry from Ara about dependency injection in tests.

The idea is to define your beans in XML with a framework like Spring and then use his decorator to inject the beans inside your tests.  The problem with this approach can be summed up in three words:  "too much magic".

Ara's solution uses reflection to enumerate the fields in your test class and match them against the name of the bean as declared in your XML file.  Another problem with this approach is that you need to declare this field inside your class whereas only a few methods might need it, but I agree that JUnit doesn't leave you much choice there.

I believe a better solution is simply to pass the resolved bean as a parameter to the test method.

Ara's test case can then simply be rewritten like this:

public void testSomething(UserDAO userDao) throws Exception {
  userDao.createAdmin();
}

The advantages of this approach are:

  • No more reflection magic and mysterious naming.
  • userDao is scoped to the method that uses it, which makes for better isolation.
  • Uses the standard Java way to pass parameters.
  • No need to declare it as a field.

Now, how do we get the testing framework to pass this parameter to the method?

It's pretty easy to do with TestNG, but as of today, passing parameters is limited to primitive types (no XML bean support such as in Spring), so TestNG only solves half of this problem.

In the future, I am definitely considering adding support for Spring's bean factory so that the limitation to primitive types can be entirely lifted.  Then we could have:

@Test(parameters = { "user-dao" })
public void testSomething(UserDAO userDao) throws Exception {
  userDao.createAdmin();
}

and in testng.xml:

<parameter name="user-dao" spring-bean-name="user-dao-bean" />

The good thing about this approach is that it leverages a well-known and robust framework, but we now have two indirections (one Java file and two XML files), so another possibility would be to offer bean support in testng.xml itself:

<bean name="user-dao-bean">
    <property name="userName" value="Cedric" />
</bean>
<parameter name="user-dao" bean-name="user-dao-bean" />

Whatever solution we eventually support, I think that passing parameters to test methods is a very important feature that has been overlooked for too long.

 

Posted by cedric at 10:25 AM | Comments (8)

April 06, 2005

Troubleshooting JDK5 applications

Josh recently pointed me to this very nice guide, which contains a very in-depth explanation of all the mechanisms available to JDK5 developers to troubleshoot and diagnose failures.  The document is 124 pages long, so I'm not sure if we should be happy or scared, but it's a very interesting read nevertheless.

Posted by cedric at 08:46 AM | Comments (5)

April 04, 2005

More TestNG articles

A couple of TestNG links:

My Spanish has never been that good.

 

Posted by cedric at 10:09 AM | Comments (1)