September 16, 2004

Why I love Ruby

zippedFiles =
  Dir.new(dir).entries.sort.reverse.delete_if { |x| ! (x =~ /gz$/) }
'nuff said.

Okay, there is more to say.

First of all, what does this line of code do? It reads every entry in the given directory, sorts the names in reverse order and drops any name that doesn't end in ".gz".
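To make the pipeline concrete, here is a minimal sketch that runs the same chain one named step at a time against a throwaway directory. The file names are made up for illustration. The regex is tightened to `/\.gz\z/` (my assumption, not the original) so that only names really ending in ".gz" survive. Note that `entries` also returns "." and "..", and the filter discards both.

```ruby
require 'tmpdir'
require 'fileutils'

# Same chain as the one-liner, split into named steps.
def zipped_files(dir)
  names    = Dir.new(dir).entries            # every entry, including "." and ".."
  sorted   = names.sort                      # ascending by name
  reversed = sorted.reverse                  # descending by name
  reversed.delete_if { |x| x !~ /\.gz\z/ }   # keep only names ending in ".gz"
end

Dir.mktmpdir do |dir|
  # Hypothetical log file names, created empty just for the demo.
  %w[access.log.1.gz access.log.2.gz access.log error.log.1.gz].each do |name|
    FileUtils.touch(File.join(dir, name))
  end
  p zipped_files(dir)
  # prints ["error.log.1.gz", "access.log.2.gz", "access.log.1.gz"]
end
```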

Ported to Java, this code is quite intimidating:

import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

List<String> result = new ArrayList<String>();

File f = new File(directory);
for (String fileName : f.list()) {
  if (fileName.endsWith(".gz")) {
    result.add(fileName);
  }
}

// Sort in reverse (descending) order
Collections.sort(result, new Comparator<String>() {
  public int compare(String o1, String o2) {
    return o2.compareTo(o1);
  }
  public boolean equals(Object o) {
    return super.equals(o);
  }
});

The Ruby line above comes from a log analyzer utility that I wrote some time ago. It goes through the Apache log of my Web server and lets me easily plug in listeners that collect various kinds of statistics. Over the past few months, it has given me a flexible log analysis framework into which I have plugged various additional loggers.
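The post does not show the analyzer itself, so the following is only a guess at the shape of such a plug-in design. A hypothetical `LogAnalyzer` reads lines in Apache's common log format and hands each parsed hit to every registered listener. The class names, the `on_hit` method and the regex are all assumptions made for illustration.

```ruby
# Hypothetical sketch of a listener-based log analyzer; not the author's code.
# Apache common log format: host ident user [time] "request" status bytes
LINE_RE = /\A(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)/

Hit = Struct.new(:host, :time, :verb, :path, :status, :bytes)

class LogAnalyzer
  def initialize
    @listeners = []
  end

  # Any object responding to on_hit(hit) can be plugged in.
  def add_listener(listener)
    @listeners << listener
    self
  end

  def run(lines)
    lines.each do |line|
      m = LINE_RE.match(line) or next          # skip malformed lines
      bytes = m[6] == '-' ? 0 : m[6].to_i      # "-" means no body was sent
      hit = Hit.new(m[1], m[2], m[3], m[4], m[5].to_i, bytes)
      @listeners.each { |l| l.on_hit(hit) }
    end
  end
end

# Example listener: counts hits per requested path.
class PathCounter
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  def on_hit(hit)
    @counts[hit.path] += 1
  end
end

log = [
  '1.2.3.4 - - [16/Sep/2004:13:10:00 -0700] "GET /index.html HTTP/1.1" 200 512',
  '5.6.7.8 - - [16/Sep/2004:13:11:00 -0700] "GET /index.html HTTP/1.1" 304 -',
  '1.2.3.4 - - [16/Sep/2004:13:12:00 -0700] "GET /rss.xml HTTP/1.1" 200 2048'
]
counter = PathCounter.new
LogAnalyzer.new.add_listener(counter).run(log)
p counter.counts
```

Adding a new statistic then means writing one more small class with an `on_hit` method, without touching the parsing loop.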

Since I hadn't looked at this code in a few months, I was quite happy to realize that it passes the "six-month readability test". A language that has never passed this test for me is Perl. Perl might be a powerful language, but if you stop using it for six months, you will need a book to reread your own code and a personal trainer just to modify it.

So I was quite happy to understand my old Ruby code right away, even in the most idiomatic sections such as the one I pasted above. The code conveys its intent clearly thanks to aptly named methods, and closures are, as usual, as pleasant to use as they are powerful.

There is a problem with my log analyzer, though, which is the reason why I am revisiting it today: it's pretty slow. It takes about five minutes to run through a month of logs, which I find unacceptable. Therefore, I want to port it to a different language.

While I love Ruby, I have to say I like Groovy even more, because it gives me the same flexibility as Ruby with the familiar Java syntax on top of it. However, I have had some bad experiences with the current versions of Groovy and, as far as I can tell from the mailing list, the stability of the compiler still leaves a lot to be desired.

Exit Groovy (for now). So it will probably be Java or C#. I am hoping the poor performance comes from the Ruby interpreter and not from my code, but I will find out soon enough.
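One cheap way to find out before rewriting anything is to time the suspect stage in isolation. This sketch assumes the per-line regex parse is the hot spot, which is an unverified guess. It uses Ruby's standard `Benchmark.realtime` on synthetic lines. Scaling the lines-per-second figure up to a month of real logs gives a rough idea of how much of the five minutes parsing alone accounts for.

```ruby
require 'benchmark'

# Hypothetical hot spot: the regex used to split an Apache log line.
LOG_RE = /\A(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)/

# Returns how many lines matched the log format.
def parse_all(lines)
  lines.count { |line| LOG_RE.match?(line) }
end

# Synthetic input; real logs would be streamed with File.foreach instead.
sample = '1.2.3.4 - - [16/Sep/2004:13:10:00 -0700] "GET /index.html HTTP/1.1" 200 512'
lines  = Array.new(100_000, sample)

parsed = 0
seconds = Benchmark.realtime { parsed = parse_all(lines) }
puts format('%d lines in %.3f s (%.0f lines/s)', parsed, seconds, parsed / seconds)
```

If parsing turns out to be a small fraction of the total, the time is going somewhere else (the listeners, or I/O), and a port would not necessarily help.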

Posted by cedric at September 16, 2004 01:10 PM

Comments

String already implements Comparable, so shouldn't Collections.sort(result) be sufficient?

Posted by: Eoin at September 16, 2004 02:49 PM

The problem is I need to sort them in reverse order...

The STL addressed this kind of problem pretty well, but I can't think of a way to do this in Java without supplying my own Comparator.

Anyone?

Posted by: Cedric at September 16, 2004 02:55 PM

FWIW, zsh: ls -rd *.gz

Not that this helps you very much ...

Posted by: Doug L. at September 16, 2004 03:06 PM

Collections.sort(result, Collections.reverseOrder());

Posted by: Scott at September 16, 2004 03:07 PM
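In Ruby, the equivalent of `Collections.reverseOrder()` is a sort block that swaps its operands. The quick check below uses made-up names and confirms that it gives the same order as `sort.reverse`. The block form avoids building the extra reversed array. However, it calls back into Ruby for every comparison, so it is not necessarily faster.

```ruby
names = %w[access.log.1.gz error.log.1.gz access.log.2.gz]

descending_a = names.sort.reverse             # sort ascending, then build a reversed copy
descending_b = names.sort { |a, b| b <=> a }  # swapped operands, like reverseOrder()

p descending_a                  # prints ["error.log.1.gz", "access.log.2.gz", "access.log.1.gz"]
p descending_a == descending_b  # prints true
```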

Here is a groovier version:

gzippedFiles = new java.io.File(args[0]).listFiles().toList().sort().reverse().findAll { it =~ "gz$" }

Posted by: Sam Pullara at September 16, 2004 03:23 PM

What is it about stuffing as much code into one line as possible that seems so attractive? I've only dabbled in Ruby, but agree that it's damn nice (I really should have another look at what IDE support is around; I don't like typing).

You're also forgetting Java's FilenameFilter:

List files = Arrays.asList(new File(directory).listFiles(new FilenameFilter() {
  public boolean accept(File dir, String name) {
    return name.endsWith(".gz");
  }
}));
Collections.sort(files, Collections.reverseOrder());

Posted by: Dmitri Colebatch at September 16, 2004 05:49 PM

I meant to add that the only nice thing you're demonstrating there is closures, which, yes, are damn nice (o:

Posted by: Dmitri Colebatch at September 16, 2004 06:05 PM

If you only run this stuff once a month, and it takes 5 minutes, then that's an hour for a whole year. How much time are you going to waste optimizing this? More than an hour, I guess :-)

Posted by: Jonathan O'Connor at September 17, 2004 02:43 AM

Pretty neat Ruby and Groovy examples. I would like to include them in the Ant script task manual page.

Posted by: Peter Reilly at September 17, 2004 04:04 AM

Ah, good point, Jonathan.

Actually, I run this script every day and it gives me statistics on the past month.

Still not that much of a big deal, but as you know, developers never need a sound reason to rewrite something from scratch :-)

Posted by: Cedric at September 17, 2004 05:25 AM

Peter, you are very welcome to include these examples in the manual. Let me know where to find it, by the way...

Posted by: Cedric at September 17, 2004 05:25 AM

Ta,
the manual page is in CVS:
http://cvs.apache.org/viewcvs.cgi/ant/docs/manual/OptionalTasks/script.html

Posted by: Peter Reilly at September 17, 2004 05:56 AM

FWIW, it would be slightly more efficient to filter before sorting, like this (Groovy syntax):

gzippedFiles = new java.io.File(args[0]).listFiles().toList().findAll { it =~ "pdf" }.sort().reverse()

Scott

Posted by: Scott Ganyo at September 17, 2004 06:07 AM

Bah. Ignore that "pdf". It should've been:

gzippedFiles = new java.io.File(args[0]).listFiles().toList().findAll { it =~ "gz$" }.sort().reverse()

Sigh

Posted by: Scott Ganyo at September 17, 2004 06:09 AM
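The filter-first point holds in Ruby too: when the filter runs first, the sort only sees the matching names. This sketch uses a hypothetical listing of 1,000 names, 10 of them compressed. It counts comparisons with a sort block to show the difference and checks that both orders produce the same list.

```ruby
# Hypothetical directory listing: 990 plain logs and 10 compressed ones.
names = (1..990).map { |i| "access.log.#{i}" } + (1..10).map { |i| "access.log.#{i}.gz" }
names.shuffle!(random: Random.new(42))

# Sorts ascending and reports how many comparisons were made.
def counted_sort(list)
  count = 0
  sorted = list.sort { |a, b| count += 1; a <=> b }
  [sorted, count]
end

sorted_all, cmp_sort_first = counted_sort(names)
late = sorted_all.reverse.grep(/\.gz\z/)            # sort, then filter

sorted_few, cmp_filter_first = counted_sort(names.grep(/\.gz\z/))
early = sorted_few.reverse                          # filter, then sort

puts "same result: #{late == early}"
puts "comparisons: sort first #{cmp_sort_first}, filter first #{cmp_filter_first}"
```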

You might want to check out the Kataba libraries for Java (www.kataba.com). They provide closures, collections for all types, and a simplified I/O model, and their main focus is reducing verbosity. They were actually inspired partly by Ruby and Python. Anyway, here's the code:

List_o zippedFiles =
    Colls.list(Files.nameMatches(Files.listDir(dir), "gz$")).sort().reverse();

You'd need a couple of imports:

import com.kataba.io.*;
import com.kataba.coll.*;

I actually used it to write a log analyzer, so I know it works well for that. :)

-Chris (author of Kataba Dynamics)

Posted by: Chris Thiessen at September 17, 2004 07:26 AM

Even simpler (and more performant) than "Collections.sort(files, Collections.reverseOrder());":

Collections.reverse(result);

Posted by: Luke Hutteman at September 17, 2004 09:06 AM

The Python version is pretty much a one-liner too:

files = filter(lambda s: re.match("gz$", s), sorted(os.listdir(dir))[::-1])

Posted by: Jonas Galvez at September 17, 2004 10:36 AM

Err, that should be re.search.

Posted by: Jonas Galvez at September 17, 2004 10:40 AM

A shorter way in ruby:

zipped = Dir.entries(dir).sort.reverse.grep(/gz$/)

Posted by: Joel VanderWerf at September 17, 2004 12:16 PM

Shorter Ruby way:

Dir.new(dir).entries.grep(/gz$/).sort.reverse

Posted by: Martin DeMello at September 17, 2004 12:21 PM
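`grep` works here because `Enumerable#grep` keeps the elements for which `pattern === element` is true, and for a `Regexp` that means "matches". One detail is worth showing: `/gz$/` also accepts names such as "archive.tgz", and in Ruby `$` matches at the end of any line, not only at the end of the string. The anchored `/\.gz\z/` below is a suggested tightening, not something from the thread, and the names are made up.

```ruby
names = ["a.log.gz", "archive.tgz", "notgz", "b.log", "c.log.gz"]

loose = names.grep(/gz$/)       # anything whose last characters are "gz"
tight = names.grep(/\.gz\z/)    # only names ending in the ".gz" extension

p loose  # prints ["a.log.gz", "archive.tgz", "notgz", "c.log.gz"]
p tight  # prints ["a.log.gz", "c.log.gz"]
```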

Is the sort really necessary? Aren't they already going to be returned in sorted order by default, as with 'ls'?

Also, as Joel indicates, Dir.new(dir).entries can be reduced to Dir.entries(dir), unless you really want a Dir object. You don't in your example.

Posted by: Daniel Berger at September 17, 2004 12:32 PM
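On the ordering question: Ruby's documentation does not promise any particular order for `Dir.entries`. It returns whatever the operating system's directory reader yields, and on many filesystems that is not alphabetical (`ls` sorts its output itself). So the explicit `sort` is what makes the result deterministic. The sketch below uses a temporary directory and made-up names. It also checks that `Dir.entries(dir)` lists the same names as the instance-based form, written here with `Dir.open` and a block so the handle gets closed.

```ruby
require 'tmpdir'
require 'fileutils'

same = gz = nil
Dir.mktmpdir do |dir|
  # Created in non-alphabetical order on purpose.
  %w[c.gz a.gz b.gz].each { |n| FileUtils.touch(File.join(dir, n)) }

  via_class    = Dir.entries(dir)                # class method, no Dir object kept
  via_instance = Dir.open(dir) { |d| d.entries } # like Dir.new(dir).entries, but closes the handle

  # Compare sorted copies: the raw order is up to the OS and filesystem.
  same = via_class.sort == via_instance.sort
  gz   = via_class.sort.reverse.grep(/\.gz\z/)
end
p same  # prints true
p gz    # prints ["c.gz", "b.gz", "a.gz"]
```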

Why not just:
Dir[dir + '*.gz'].sort.reverse

Posted by: Bill Guindon at September 17, 2004 02:07 PM
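The glob variants and the `entries`-based ones return different things. `Dir[...]` and `Dir.glob` return paths that include the directory prefix, while `entries` returns bare names. Also, `Dir[dir + '*.gz']` only finds the files if `dir` ends with a separator, and `File.join` avoids that pitfall. Here is a sketch with a temporary directory and made-up names:

```ruby
require 'tmpdir'
require 'fileutils'

paths = names = nil
Dir.mktmpdir do |dir|
  %w[a.gz b.gz notes.txt].each { |n| FileUtils.touch(File.join(dir, n)) }

  # Full paths, each one prefixed with the temporary directory.
  paths = Dir[File.join(dir, '*.gz')].sort.reverse
  # Strip the directory to get the bare names that entries would give.
  names = paths.map { |path| File.basename(path) }
end
p names  # prints ["b.gz", "a.gz"]
```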

Here's another Python version:

files = glob.glob("*.gz")
files.sort()
files.reverse()

In Python 2.4, it could be:

files = sorted(glob.glob("*.gz"), reverse=True)

Pretty damn simple, I say :)

Posted by: Jonas Galvez at September 17, 2004 03:03 PM

Dir.glob("*.gz").sort.reverse

Posted by: botp at September 17, 2004 07:11 PM

Dir.glob("*.rb").sort.reverse

#if it looks and acts like ruby, then it is ruby

Posted by: botp isbotp at September 17, 2004 07:17 PM

Dir['*.gz'].sort.reverse

Posted by: Thien at September 17, 2004 09:08 PM

Another solution that saves one intermediate array:
dir = ... # the dir you want to explore
Dir[File.join( dir, "*.gz" )].sort {|a,b| b <=> a}

Posted by: Robert at September 19, 2004 04:48 AM

Here's my Java version:

Collections.reverse(Arrays.asList(new File(dir).listFiles(new FileFilter(){public boolean accept(File f){return f.getName().matches(".*\\.gz$");}})));

;-)

Posted by: Moritz Petersen at September 20, 2004 04:52 AM

Here is a slightly more efficient java version:

Arrays.sort(new File(dir).list(new FilenameFilter() {public boolean accept(File f, String n) {return n.endsWith(".gz");}}), Collections.reverseOrder());

Posted by: Sam Pullara at September 22, 2004 11:00 AM

Just for fun, here are some benchmarks on a 659-file directory:

Sam's Java Version: 2.7 ms per list
Sam's Groovy Version: 15.8 ms per list
Moritz Petersen's Java Version: 9.6 ms per list
Cedric's Java Version: 3.1 ms per list

I'd love to test the other ones on my machine (PowerMac G4, 1.4/1.4) if you send me an easy-to-run benchmark that measures them.

Posted by: Sam Pullara at September 22, 2004 11:31 AM
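Here is a minimal, easy-to-run benchmark for several of the Ruby variants from this thread. It builds a throwaway directory of 659 made-up file names, mirroring the directory size above, and compresses a hypothetical one in ten of them. It then checks that the variants agree and prints milliseconds per listing. `Dir.new(dir).entries` is written as `Dir.entries(dir)` so that no directory handles are left open. The glob variant also maps paths back to bare names so its output can be compared. No numbers are quoted here because they depend entirely on the machine and filesystem.

```ruby
require 'benchmark'
require 'tmpdir'
require 'fileutils'

# Variants from the comments, adapted to take a directory argument.
VARIANTS = {
  'sort, reverse, delete_if' => ->(dir) { Dir.entries(dir).sort.reverse.delete_if { |x| !(x =~ /gz$/) } },
  'sort, reverse, grep'      => ->(dir) { Dir.entries(dir).sort.reverse.grep(/gz$/) },
  'grep, sort, reverse'      => ->(dir) { Dir.entries(dir).grep(/gz$/).sort.reverse },
  'glob, sort, reverse'      => ->(dir) { Dir[File.join(dir, '*.gz')].sort.reverse.map { |path| File.basename(path) } }
}

ITERATIONS = 200

results = {}
Dir.mktmpdir do |dir|
  # 659 made-up files, every tenth one compressed.
  659.times do |i|
    name = i % 10 == 0 ? "access.log.#{i}.gz" : "access.log.#{i}"
    FileUtils.touch(File.join(dir, name))
  end

  expected = VARIANTS.values.first.call(dir)
  VARIANTS.each do |label, fn|
    raise "#{label} disagrees" unless fn.call(dir) == expected
    seconds = Benchmark.realtime { ITERATIONS.times { fn.call(dir) } }
    results[label] = seconds * 1000 / ITERATIONS
    puts format('%-26s %.2f ms per list', label, results[label])
  end
end
```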

Damn. I lost ;-)

Posted by: Moritz Petersen at October 18, 2004 11:57 PM