Otaku, Cedric's weblog: January 2005 Archives

January 29, 2005

Open letter to James about Groovy

Mike just posted what appears to be a death sentence for the Groovy project, and it was a very sad read for me because while I like Groovy so much, I can't do anything else but agree with his assessment.

First of all, please note that Mike put his money where his mouth is:  since his previous rant on Groovy about a month ago, he has been very active on the Groovy mailing-list and he has tried hard to piece together a decent set of documentation for Groovy Classic.  I suppose that his post today is the observation that this effort failed and the confirmation of his worst fears about the future of Groovy.

But you know what, James?  There is still hope.  There is one very simple way you can prove Mike and countless other disillusioned Groovy fans wrong about their fears:

Announce a date by which you will ship Groovy 1.0

It's that simple.  Really.  Everything else will fall into place.

Once you have a ship date, you will start looking at your work on Groovy very differently.  Everything will become a matter of compromises between the importance of the feature and the necessity to hit the deadline.  The roadmap will also appear to you much more clearly, starting backwards:  plan a beta one month before the deadline, an alpha two months before, a few milestones here and there (not indispensable but I've found that milestones keep you honest and give you a good idea of your velocity).  You will also be able to make the best use of the various volunteers who offer their help.

Alright James, it's your turn now.  Pick a date.  Any date.

Groovy deserves it.

 

Posted by cedric at 09:00 AM | Comments (9)

January 27, 2005

Why I prefer SAX to parse XML

There are numerous ways to parse XML in Java but they are all based on one of the two technologies:

  • DOM
  • SAX

I'm not going to explain what these two APIs do exactly, since there are plenty of articles on the subject.  In a nutshell, DOM gives you a tree view of your XML document, which you can then navigate by moving from one node to another, while SAX is event-driven and calls your code whenever it encounters a tag.

Over the years, I have come to develop a strong liking for SAX despite its apparent limitations, to the point where I haven't needed to resort to DOM for a long time.  Here is why.

The thing I like most about SAX is that it allows you to ignore all the portions of your XML document that you don't care about.  This makes it trivial to pick out only the information you are interested in, and it also makes it easier to migrate your schema over time, should you decide to do so.

Consider the following XML document:

<person>
  <first-name value="Cedric"></first-name>
  <last-name value="Beust"></last-name>
</person>

Extracting the first and last names is straightforward:

  public void startElement(String uri, String localName, String qName, Attributes attributes)
    throws SAXException
  {
    String name = attributes.getValue("value");
    if ("first-name".equals(qName)) {
      System.out.println("First name:" + name);
    }
    else if ("last-name".equals(qName)) {
      System.out.println("Last name:" + name);
    }
  }

Note that the code above completely ignores the <person> tag and focuses exclusively on the content we are interested in.  If we have reached this point in the code (which is defined in a ContentHandler), the parser has already checked the well-formedness of everything it has read so far (and its validity, if validation is turned on).
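For readers who want to try this, here is a minimal, self-contained sketch of the handler above wired to the JDK's built-in SAX parser.  The class name PersonNameExtractor and the extract helper are made up for this illustration, and the handler collects its lines in a list instead of printing them directly so that the result is easy to inspect.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class name, used only for this illustration.
class PersonNameExtractor extends DefaultHandler {
  private final List<String> m_lines = new ArrayList<>();

  @Override
  public void startElement(String uri, String localName, String qName, Attributes attributes) {
    String name = attributes.getValue("value");
    if ("first-name".equals(qName)) {
      m_lines.add("First name:" + name);
    }
    else if ("last-name".equals(qName)) {
      m_lines.add("Last name:" + name);
    }
    // Any other tag, such as <person>, is simply ignored.
  }

  /** Parses the XML string and returns the lines collected by the handler. */
  static List<String> extract(String xml) {
    PersonNameExtractor handler = new PersonNameExtractor();
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    }
    catch (Exception e) {
      throw new IllegalStateException("Could not parse the document", e);
    }
    return handler.m_lines;
  }

  public static void main(String[] args) {
    String xml = "<person>"
        + "<first-name value=\"Cedric\"></first-name>"
        + "<last-name value=\"Beust\"></last-name>"
        + "</person>";
    extract(xml).forEach(System.out::println);
  }
}
```

The default SAXParserFactory is not namespace-aware, which is why comparing against qName works here; a namespace-aware setup would more likely compare localName.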

Of course, this code won't work if the same tags appear several times in the document:

<project name="TestNG">
  <members>
    <person>
      <first-name value="Cedric"></first-name>
      <last-name value="Beust"></last-name>
    </person>
    <person>
      <first-name value="Alexandru"></first-name>
      <last-name value="Popescu"></last-name>
    </person>
  </members>
</project>

or, even more tricky, if these tags have different parents:

<project name="TestNG">
  <members>
    <vampire-slayer>
      <first-name value="Buffy"></first-name>
      <last-name value="Sommers"></last-name>
    </vampire-slayer>
    <vampire>
      <first-name value="Angel"></first-name>
      <last-name value="Angelus"></last-name>
    </vampire>
  </members>
</project>

A typical way to solve this is to keep track of the parent tag:

private VampireSlayer m_vampireSlayer = null;
private Vampire m_vampire = null;

  public void startElement(String uri, String localName, String qName, Attributes attributes)
    throws SAXException
  {
    String name = attributes.getValue("value");
    if ("vampire-slayer".equals(qName)) {
      m_vampireSlayer = new VampireSlayer();
    }
    else if ("first-name".equals(qName)) {
      if (null != m_vampireSlayer) {
        m_vampireSlayer.setFirstName(name);
      }
      else if (null != m_vampire) {
        m_vampire.setFirstName(name);
      }
    }
// ...

Don't forget to "pop out the context" when you exit the tag:

  public void endElement(String uri, String localName, String qName)
    throws SAXException
  {
    if ("vampire".equals(qName)) {
      // store the vampire somewhere, then
      m_vampire = null;
    }
    else if ("vampire-slayer".equals(qName)) {
      // store the vampire slayer somewhere, then
      m_vampireSlayer = null;
    }
  }

However, the problem with this approach is that the business logic attached to a certain tag is now scattered in two different places, which makes the code hard to maintain.  So I have adopted the following rule:  whenever I need to run code both at the start and at the end of a tag, I move the business logic into a method that takes a boolean indicating whether we are opening or closing the tag:

  public void startElement(String uri, String localName, String qName, Attributes attributes)
    throws SAXException
  {
    String name = attributes.getValue("value");
    if ("vampire-slayer".equals(qName)) {
      xmlVampireSlayer(true /* start */);
    }
// ...

  public void endElement(String uri, String localName, String qName)
    throws SAXException
  {
    if("vampire-slayer".equals(qName)) {
      xmlVampireSlayer(false /* start */);
    }
// ...

  /**
   * @param start If true, we are looking at an opening tag (e.g. <foo>),
   * otherwise, we are looking at a closing tag (</foo>)
   */
  private void xmlVampireSlayer(boolean start) {
    if (start) {
      m_vampireSlayer = new VampireSlayer();
    }
    else {
      // store the vampire slayer somewhere, then
      m_vampireSlayer = null;
    }
  }

And now we have the best of both worlds: code that is not only easier to read but also quite robust in the face of schema changes.
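To see the fragments above working together, here is one possible way to assemble them into a complete handler.  This is only a sketch under a few assumptions: the Named, VampireSlayer and Vampire classes are tiny stand-ins invented for the example, "storing somewhere" is reduced to appending a string to a list, and the xmlVampire and xmlName methods are added by analogy with xmlVampireSlayer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler assembled from the fragments above.
class SlayerHandler extends DefaultHandler {
  // Stand-ins for the VampireSlayer and Vampire classes, invented for this sketch.
  static class Named {
    String firstName;
    String lastName;
  }
  static class VampireSlayer extends Named {}
  static class Vampire extends Named {}

  private final List<String> m_stored = new ArrayList<>();
  private VampireSlayer m_vampireSlayer = null;
  private Vampire m_vampire = null;

  @Override
  public void startElement(String uri, String localName, String qName, Attributes attributes) {
    if ("vampire-slayer".equals(qName)) {
      xmlVampireSlayer(true /* start */);
    }
    else if ("vampire".equals(qName)) {
      xmlVampire(true /* start */);
    }
    else if ("first-name".equals(qName) || "last-name".equals(qName)) {
      xmlName(qName, attributes.getValue("value"));
    }
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    if ("vampire-slayer".equals(qName)) {
      xmlVampireSlayer(false /* start */);
    }
    else if ("vampire".equals(qName)) {
      xmlVampire(false /* start */);
    }
  }

  private void xmlVampireSlayer(boolean start) {
    if (start) {
      m_vampireSlayer = new VampireSlayer();
    }
    else {
      m_stored.add("slayer:" + m_vampireSlayer.firstName + " " + m_vampireSlayer.lastName);
      m_vampireSlayer = null;
    }
  }

  private void xmlVampire(boolean start) {
    if (start) {
      m_vampire = new Vampire();
    }
    else {
      m_stored.add("vampire:" + m_vampire.firstName + " " + m_vampire.lastName);
      m_vampire = null;
    }
  }

  // Routes a name to whichever parent tag is currently open.
  private void xmlName(String tag, String value) {
    Named target = (m_vampireSlayer != null) ? m_vampireSlayer : m_vampire;
    if (target == null) {
      return;  // a name outside any parent we know about: ignore it
    }
    if ("first-name".equals(tag)) {
      target.firstName = value;
    }
    else {
      target.lastName = value;
    }
  }

  /** Parses the XML string and returns one entry per stored member. */
  static List<String> parse(String xml) {
    SlayerHandler handler = new SlayerHandler();
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    }
    catch (Exception e) {
      throw new IllegalStateException("Could not parse the document", e);
    }
    return handler.m_stored;
  }
}
```

Everything that happens to a given tag, at its start and at its end, sits in a single xml method, so startElement and endElement stay pure dispatchers.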

Now, imagine a more complex situation where your XML file can have tags nested six or seven levels deep.  One day, you need to add a new tag.  With DOM, you would have to locate the code that is walking this particular area of the tree, and even with typed tree-based solutions such as XMLBeans, locating and modifying code is never easy.
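To make the comparison concrete, here is a rough sketch of what walking the tree with the standard DOM API can look like for the vampire document shown earlier.  The class DomSlayerWalk and its helper methods are invented for this example; the point is only that the path project, members, vampire-slayer, first-name is spelled out in the code, so a change in nesting means finding and editing this walk.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this illustration.
class DomSlayerWalk {
  /** Returns the first name of every vampire-slayer found under project/members. */
  static List<String> slayerFirstNames(String xml) {
    List<String> result = new ArrayList<>();
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      Element project = doc.getDocumentElement();
      // Each nesting level of the schema shows up as one loop here.
      for (Element members : children(project, "members")) {
        for (Element slayer : children(members, "vampire-slayer")) {
          for (Element firstName : children(slayer, "first-name")) {
            result.add(firstName.getAttribute("value"));
          }
        }
      }
    }
    catch (Exception e) {
      throw new IllegalStateException("Could not parse the document", e);
    }
    return result;
  }

  // Returns the direct child elements of parent that carry the given tag name.
  private static List<Element> children(Element parent, String tagName) {
    List<Element> result = new ArrayList<>();
    NodeList nodes = parent.getChildNodes();
    for (int i = 0; i < nodes.getLength(); i++) {
      Node node = nodes.item(i);
      if (node instanceof Element && tagName.equals(node.getNodeName())) {
        result.add((Element) node);
      }
    }
    return result;
  }
}
```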

With SAX, all you need to do is two things:

  • See if the name of this tag is unique within your file (if not, you will need to disambiguate it with the context approach shown above).
  • Implement the method xmlTagName(boolean start) and put all the handling for that tag inside it.
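When a tag name is not unique, one possible generalisation of the context check is to keep a stack of the currently open tag names instead of one field per parent.  This is not the approach used in the listings above, just a sketch of an alternative; the class PathTrackingHandler and its output format are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: tracks open tags on a stack to disambiguate repeated names.
class PathTrackingHandler extends DefaultHandler {
  private final Deque<String> m_openTags = new ArrayDeque<>();
  private final List<String> m_found = new ArrayList<>();

  @Override
  public void startElement(String uri, String localName, String qName, Attributes attributes) {
    String parent = m_openTags.peek();  // null when we are at the root tag
    m_openTags.push(qName);
    if ("first-name".equals(qName)) {
      // The parent tells us whose first name this is.
      m_found.add(parent + "/first-name=" + attributes.getValue("value"));
    }
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    m_openTags.pop();  // leave the context of the tag that just closed
  }

  /** Parses the XML string and returns every first name with its parent tag. */
  static List<String> parse(String xml) {
    PathTrackingHandler handler = new PathTrackingHandler();
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    }
    catch (Exception e) {
      throw new IllegalStateException("Could not parse the document", e);
    }
    return handler.m_found;
  }
}
```

The trade-off is that the stack knows only tag names, so building objects such as a VampireSlayer still needs fields like the ones shown earlier.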

How about you?  Do you prefer DOM over SAX?  Have you encountered situations where DOM was a much better fit than SAX?

Posted by cedric at 06:40 AM | Comments (28)

January 25, 2005

iPods may be hazardous to your health

I cracked a rib this past Sunday.  I was happily snowboarding on a decent surface of snow which suddenly turned into hard-packed bumpy ice.  The mix of shade and sun at this very location didn't leave me any chance.  My snowboard disappeared from under me and I fell flat on the chest.  Pretty hard.

I made the mistake of playing a squash league match the next evening, which aggravated the injury.  By the end of the match, the simplest twisting motion of my upper body left me with a gripping pain that took a few minutes to recede.

Looking back, it occurred to me that the real reason for my injury is probably...  my iPod.

I usually listen to music when I snowboard but only recently did I realize that when people tell you that snowboarding with headsets on might be hazardous, they are simply missing the point.  What's really dangerous is anything that protrudes sharply from your body.

Am I going to stop snowboarding while listening to music?  Hell no, it's just too good.  And the iPod clearly proved that it is up to the task.

But for now, please don't make me laugh.

 

Posted by cedric at 09:50 AM | Comments (4)

January 24, 2005

XML is not human-editable

This post brings up a few interesting comments about XML:

XML seems to me overhyped. it is *just* a container for data structuring it in most cases.

Saying that XML is overhyped is a bit like saying that text files are overhyped.  The thing is, before XML became a standard, we had a flurry of text formats used to contain various external information that programs need (take a look at sendmail.cf or the hundreds of different configuration files used by any UNIX installation for example).  You never knew what to expect if you tried to read or edit one.

At least XML gave us a suite of tools that make editing and reading such files easier.  It also gave us a wealth of APIs in all languages so that you don't have to reinvent your own parser, another Good Thing.

In that same comment, Klaus adds:

Besides that it is in fact *not really* human-editable.

... and this I fully agree with.  This is a message I have been pushing for years but I've had a very hard time convincing people.  My early dislike for XML as an editable format was what prompted me to create EJBGen in the first place, which was one of the very early tools that used annotations to replace XML (EJBGen started in early 2001, and its immediate success showed that I was not the only one having a problem with XML as an editable format).

It is pretty obvious to me now that as soon as you are creating more than just a toy program and your code needs to store data outside the code, XML is the only sane way to go.

So the question is not really "Why is TestNG using XML?" but "Why is TestNG using an external file to configure its tests?", as opposed to JUnit where this is done in code.

The answer is that I make a clear distinction between the static (the business logic) and the dynamic (what tests are being run) parts of your tests.  I believe JUnit makes the mistake of conflating the two, which forces you to recompile your code when you decide to run a different set of tests.

If you picture a team of ten programmers, each of them will want to run a different set of tests as part of their day, and all the time spent recompiling their test suites is wasted.  Not to mention that they are modifying code that they need to remember not to submit to the source control system, since it only runs a subset of all the tests.

Posted by cedric at 08:19 AM | Comments (7)

January 22, 2005

Bunny suicides

Even bunnies get tired of life...

Posted by cedric at 09:03 PM | Comments (5)

January 18, 2005

Outlook finally virus free

I can't begin to describe how happy this exquisite dialog made me feel this morning.  I was synchronizing my Nokia 6620 with Outlook (more on this phone very soon) and I was wondering why the process was taking so long, until I realized that Outlook was waiting for me to grant access to my address book.

It's taken a while, but I have to chalk this up to Microsoft.  Now, if only Windows required a password before installing any new application, I could finally ditch all my anti-junkware programs (and if you are an Apple fan and you want to point out that MacOS already does this, please don't).

Posted by cedric at 06:28 AM | Comments (9)

January 14, 2005

Disposable email addresses

Given the amount of spam I receive every day, I am extremely reluctant to give away my email address to untrusted parties, especially when I'm pretty sure these people should only ever use that email address once (to send me an activation code, for example).  Therefore, I was absolutely delighted when the first disposable email address service appeared (SpamGourmet) and especially when it was followed by two more (Mailinator and DodgeIt).  Here is a quick review of these three services.

  • SpamGourmet allows you to define an email address @spamgourmet.com with the syntax aWord.aNumber.yourUserName.  The word will typically be used to identify the service that will be trying to email you, the number is the number of times emails sent to that address will be delivered until they start bouncing, and your user name is... well, your own identifier.

    This solution works well but is a bit heavy.  First of all, I don't really care how many times this email address works:  most of the time, once is enough and the rest is up to the server.  Second, I need to log in with a user name and a password, and again, I don't really see the need.
     
  • Enter Mailinator.  Mailinator takes the concept one step further by offering you passwordless email addresses.  You type in the user name, and you are automatically taken to the mailbox.  Obviously, you should never use this inbox for anything confidential, and if you can live with that, it's certainly a better solution than SpamGourmet.  It suffers from two shortcomings, though:  1) you can't easily bookmark the inbox, you need to go through the front page and then submit your name and 2) there is no easy way to be notified when the expected email has arrived.
     
  • DodgeIt to the rescue!  With its Google-ish minimalistic interface, DodgeIt appealed to me right away, and the fact that it doesn't use any form allows you to bookmark any inbox (example).  But DodgeIt goes further by giving you an RSS feed to the mailbox.  This is the ultimate luxury in disposable addresses, since you obviously don't want to be notified by email.  Pick a carefully selected login name (you don't want your reader to bother you if others happen to use the same inbox), point your RSS reader to it and voila!, you will never have to reload a browser waiting for an activation code.

Can anybody top DodgeIt?

Update from the comments:  There is also ipoo as a good alternative to DodgeIt.

 

Posted by cedric at 10:54 PM | Comments (30)

January 13, 2005

iProduct

Jamie Zawinski's take on Apple's latest announcements...

 

Posted by cedric at 07:21 AM | Comments (2)

January 11, 2005

TestNG article on DeveloperWorks

Filippo Diotalevi has just posted an article on TestNG on DeveloperWorks.  It's fairly short but gives a very good overview of the features of the framework.

 

Posted by cedric at 06:28 AM | Comments (1)

January 10, 2005

Massive spam attack

I have just been hit by the nastiest spam attack yet.  When I got up this morning, I found more than nine thousand (9000!) emails in my Inbox.  They all follow the same pattern:

  • They come from a different email address (different domain even).
  • The wording is slightly modified from one email to the other (they are selling medicines).
  • They are sent to a randomly-generated email address at my domain.
  • They only contain three lines, so my spam filter was unfortunately unable to flag them as spam.
  • The web site they point to seems to be randomly generated but it does indeed work and point to a Canadian drug firm:  basdf kjlke.com (space inserted on purpose) which I hope will be shut down by the time you read this.

I am having a hard time believing this kind of flooding is even effective at all, but fortunately, it didn't take me more than ten minutes to clean up my Inbox.  What a waste of time.

Posted by cedric at 05:52 AM | Comments (4)

January 08, 2005

Faith-based snowboarding

View from the cabin this morning

First snowboarding days of the year in Tahoe and... wow, what a weekend.  I don't think I've ever seen that much snow in Tahoe.  It's been dumping non-stop since Friday morning and if you ignore the stormy conditions, it's snowboarding bliss up here.  The conditions being what they are, it's the third white-out in a row, which is particularly propitious for what I affectionately call "faith-based snowboarding".

Basically, the sky and the snow merge into one giant white blur and no matter how good your goggles are, you just can't rely on your eyes to identify the terrain as it unfolds in front of you.  All you can do is trust that there will be as much powder ahead of you as there was behind you. 

It works...  most of the time.  When it doesn't, it provides a spectacular wipe-out for the viewers sitting on the chair lift above you (I've even heard applause on a few occasions) but even so, all you do is crash into two feet of powder.  Could be worse...

Posted by cedric at 04:53 PM | Comments (2)

January 04, 2005

Travelocity stupidity

I was trying to pull the record for my trip last week from Travelocity, which is the site I used to make the reservation.  Since I was receiving a "trip not found" on the Web site when I typed the trip ID, I decided to call them.  After thirty minutes on hold (thank goodness for headsets), I finally reached a Real Human Being and explained my problem.

"I am trying to pull the record for a trip I made last week but I can't find it."

"That's normal, it's an old trip."

"You don't keep records of past trips?!?"

"We do, but we don't give access to this functionality."

"Why not?!?"

"It's not needed."

"Well, I certainly need it now, and your competitors and other Web sites keep track of past orders for the convenience of their users..."

"... <silence>...  Would you like me to email you a copy of your receipt, sir?"

"Yes, that would be great."  <sigh>

Some companies really don't get it.

 

Posted by cedric at 06:57 AM | Comments (7)