Otaku, Cedric's weblog: August 2003 Archives

August 28, 2003

Stylistic issues

Comments on a few stylistic issues I noticed these past days.

Ted talks about "cuddled else's" (what a cute name):

} else {

AKA, the "cuddled else". I never heard of a name for this before. I use this style extensively when I'm programming in Java, but it and all the close brackets use up a lot of whitespace.

This convention is recommended by Sun, but I usually prefer to have the closing brace and the next instruction on separate lines.  Nothing religious or artistic about this preference, the reason is pragmatic:  easier copy and paste.

For example, suppose you need to move the catch block to a different place in your source.  Compare the copy/paste operation between the following two formattings:

try {
  // ...
} catch (Exception e) {
  // ...
}

and

try {
  // ...
}
catch (Exception e) {
  // ...
}

Cameron posted the following code snippet, that I would also write slightly differently:

public Object[] toArray()
  {
  int cItems = size();
  if (cItems == 0)
  {
    return EMPTY_ARRAY;
  }

  Object[] aoItem = new Object[cItems];
  // ...
  return aoItem;
  }

I have two remarks about this code:

  1. I tend to avoid multiple returns in the same method.  They make it harder to follow the logic behind the flow and it's also inconvenient to break at the end of the method in order to introspect the returned value.
     
  2. I always store my returned values in a variable called "result" (one of the few things I liked about Eiffel).  This way, I can highlight this variable and pinpoint immediately where it is assigned.

Therefore, I would rewrite Cameron's code this way:

public Object[] toArray()
  {
  Object[] result = EMPTY_ARRAY;
  int cItems = size();
  if (cItems > 0)
    {
    result = new Object[cItems];
    // ...
    }

  return result;
  }

(Cameron, note that I respectfully respected your insane indentation scheme).

Posted by cedric at 09:41 AM | Comments (16)

August 26, 2003

Components in Java

I guess I need to clarify my previous post.  I am not complaining about the impossibility of accessing COM from Java or from Python, but about the fact that Java doesn't enable component-based computing the way COM does.  More precisely, Java needs the following:

  • A tool to look up all the Java components available on my desktop, or on a distant server.
  • A central place (either physical, file system, or virtual, such as a registry) where all the Java components can be found and browsed.
  • An API to declare what classes I make available and where the metadata can be found (contracts, documentation, etc...).
  • An API to look up the said components (QueryInterface()).
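To make the last two points concrete, here is a minimal sketch of what such an API could look like. Everything in it is hypothetical: `ComponentRegistrySketch`, `register()` and `queryInterface()` are names invented for illustration, and nothing like this exists in the JDK. It is also purely in-process, whereas a real component model would have to work across applications and languages.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry, invented for illustration only
class ComponentRegistrySketch {
  // Maps a contract (an interface) to the component that implements it
  private final Map<Class<?>, Object> components = new HashMap<Class<?>, Object>();

  /** Declares that 'implementation' is available under the given contract. */
  <T> void register(Class<T> contract, T implementation) {
    components.put(contract, implementation);
  }

  /** Looks up a component by contract; returns null when none is registered. */
  <T> T queryInterface(Class<T> contract) {
    return contract.cast(components.get(contract));
  }
}
```

A caller would register a component under the interface it implements and later ask the registry for that interface, without knowing which class provides it.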

And if someone wants to point me to JavaBeans, I kindly invite them to reread the points above and realize that JavaBeans doesn't even get close to enabling component-based computing.

It's actually interesting to see that in some way, Web Services offer these services on a WAN.  But Java has no mechanism built in the JDK to enable true component reuse.  And it's very sad and causes endless rounds of wheel reinvention in the Java community.

Posted by cedric at 01:16 PM | Comments (4)

Python replacing Visual Basic? Nah.

In a recent entry, Nelson argues that "Python is good enough on Windows to replace Visual Basic":

Between wxPython, pyGame, and the win32all extensions you have all the doodads you need to build Windows apps.

Then he illustrates his point by explaining a little application he wrote in Python for his own use.

Unfortunately, he's missing the point.  The strength of Visual Basic has not much to do with the language itself, nor with the tools.  The reason why Visual Basic is so popular is COM.

Visual Basic is rarely used to write standalone applications, but the fact that you can reuse all the Office applications (and a lot of others) from the "outside" is a Windows feature that Linux and all other Unixes have never been able to match, despite commendable efforts. You have to hand it to Microsoft, who truly enabled "component-based computing" as early as 1995, when the first usable versions of COM appeared.

Here we are in 2003, with some stellar Java applications that we use every day and love.  How many of them can be scripted from the outside?  Where do I turn to if I want to reuse, say, IDEA's code editor?  How about Rational Rose's UML designer?  Or even a simple HTML renderer?

Component reusability cannot be retrofitted.  It has to be built in, from the ground up.  When I am writing a Java application and I have identified a component that might be of interest to other users/developers, there should be a simple way for me to expose a simple API that can be looked up and invoked from any language.

Before everyone points me to Jython and similar initiatives, yes, they are a good start, but nowhere near a true reusable component model similar to COM.  Until we see such a model in the JDK, we will never have true component reuse in Java.

Posted by cedric at 12:05 PM | Comments (6)

August 21, 2003

Stallman, Gosling and a bit of emacs history

I have recently read some incorrect interpretations of the story between James Gosling and Richard Stallman.  It's easy to get lost in the various ego-centered wars flying around so I thought I would take this opportunity to set the record straight by narrating the events as I remember them.  Of course, my recollection might be a bit biased but the links to the various posts should allow you to make your own decision.

Now, let's take a little trip back with father Tiresias...

Chronologically, the first "real" emacs was written by Stallman and then branched and improved by Gosling under the name "gosmacs".  There was obviously a little bit of friction involved with this first fork but it was nothing compared to what was going to follow.  But first of all, some background on Stallman.

We all know the individual and his extreme views on open source, but what most people probably don't know is that back then, Stallman was extremely hostile to graphics, bitmap screens and all this fancy new technology that was going to bring the computers to the masses.  He was even a very vocal enemy of...  the mouse.  Yup.

Anyway.

These beliefs made him very hostile to the simple idea of making gnuemacs usable in a graphic environment, which back then was X Window.  Tired of his position and also upset by the constant delays that emacs 19 was incurring, a group of people decided to fork off gnuemacs and start a new project intended to gather all the latest technologies that were picking up steam fast.

Most of these people were working for a company called Lucid, and therefore, they named their emacs "Lucid emacs" (which became XEmacs in 1994 after Lucid went out of business).

Implemented by a talented group of developers, one of them being an individual called Jamie Zawinski (and I'll get back to him shortly), Lucid Emacs soon reached a very decent shape within just a few months while gnuemacs 19 had been stagnating on the FSF hard drives for several years.  It was getting increasingly clear that Stallman was more than upset at the fork and at the very fast progress of Lucid Emacs, and he manifested his anger many times throughout this period, like for example in this exchange:

From: Richard Stallman <rms@traveller.cz>
To: jwz
Subject: lemacs 19.10

We decided not to post your announcements because they seem to say
unfair negative things about Emacs 19 and because they advertise
non-free Lucid products.

But it was too late.  Lucid Emacs was a high-quality implementation of emacs and its very innovative support for the mouse and other graphic features made it an instant hit in the emacs community.  Soon, gnuemacs users started asking for the same features in Emacs 19 and Stallman reluctantly conceded to at least look on the other side of the fence.

For someone who has made his goal in life to promote free software and code sharing, Stallman is showing a very puzzling tendency to practice the mantra "do what I say, not what I do".

First of all, he has been repeatedly nailed in public for reinventing things from scratch instead of using existing libraries, but once again, he showed an extreme reluctance to merge the code lines, even refusing to reuse the pieces that Lucid Emacs had already implemented.  This decision was partly due to Stallman's resolute refusal to trust anyone but himself and partly to personal problems he had with some of the Lucid developers.

Jamie Zawinski tried several times to correct several misconceptions that Stallman had about the technical aspect of the work involved, but his advice fell on deaf ears.  The height of the debate was reached when it was pointed out that despite all his critiques of Lucid Emacs, Stallman had apparently not even tried to run it.  I will let you read the rest of this fascinating thread, which sheds a lot of light on what it's like to work with Stallman (notice also the post from an individual named Marc Andreessen...  that was in 1991).

The merge between Lucid Emacs and emacs 19 was attempted but failed.  We will never know the exact technical reasons but Stallman's track record in this area doesn't leave much doubt in my mind.  However, we can see a general pattern in Stallman's ways:  he has a hard time dealing with success coming from others.  He showed this clearly with his catastrophic handling of the Lucid Emacs situation, and more recently with the "Gnu/Linux" fiasco, where he tried once again to receive credit for something he had nothing to do with.

But before we conclude this little retrospective, I'd like to say a few words about Jamie Zawinski, for whom I have a particular fondness.

Aside from being a very talented developer who supplied a great deal of high quality tools for developers during these troubled times (old-timers will remember Gnus, BBDB, etc...), Jamie is a hilarious person whose postings and constant pranks brought more than a daily chuckle to developers' faces back then.  His Web site leaves little doubt about his extreme devotion to hackdom, but he also regularly regaled many people with his constant stream of wacky ideas, the best of which is probably the Tent of Doom.  Be sure to read some of his random rants, they are worth it.  After Lucid, Jamie became employee #20 at Netscape and the rest is history...

Posted by cedric at 05:23 PM | Comments (8)

August 19, 2003

Popularity by virus

So far, I have received over a hundred emails from people infected by the new virus, and I am receiving more every hour.  I took some time to see if I knew any of the senders, so that I could notify them that they have been infected.  Nope.  All perfect strangers so far.

In some perverted way, you can measure how popular you are by how many infected emails you receive.

Corollary:  if you haven't received any infected email yet, nobody loves you.

Posted by cedric at 01:52 PM | Comments (8)

August 17, 2003

Google's puzzling algorithms

I was investigating what looked like an Apache bug (I'll have to blog separately about that one, quite an interesting story) when I noticed in my logs a hit on my private Web site.  As the "private" part indicates, I don't expect to get many hits on that site except from friends, but this IP was definitely not familiar to me.  A quick "dig -x" revealed that the IP was a googlebot.  Somehow, google must have found a link to my private Web site and I am now being harvested.

Okay, fine, it's not like there is anything secret there anyway.

My curiosity was piqued, though:  how did Google find out about the site and this particular URL, which is a report on my heliskiing trip in Whistler?  I ran a google search on "Whistler heliski".

Yup, that's me in first place.  Before even the main heliskiing web site.  How is that possible?

So I went to my own heliski page and looked for backward links.  As you can see, it's pretty bare there.  Just my private web site, again, and...  my weblog.  I had totally forgotten about that.  So mystery #1 is explained:  Google harvested my weblog and got the URL from there.

Something is still mysterious, though:  how can these simple four links explain that I am mentioned even before the main heliskiing site?

Could it have something to do with the fact that...  I own my first name in Google?  (roar)

Yup, even the Entertainer is behind me.

This amusing incident makes me really wonder about Google's algorithms.  Not to mention the fact that I thought Google did not show weblogs in its results.

Another interesting thing is that Google has already registered my new weblog location (which is barely two weeks old), and that they are now respectively #1 and #2.

Fascinating.

Posted by cedric at 12:05 PM | Comments (4)

August 13, 2003

My first virus

Like probably thousands of other people, I have been hit by the MSBlaster virus.  I hadn't really noticed anything until an advisory suggested that I take a closer look.  And lo and behold, I had an msblast.exe process running and I also had that executable in \WINNT\SYSTEM32.

This is my first virus ever.  I am so excited.

Cleaning it was relatively easy.  For future references, you want to

Although I recognize viruses as a very real threat, I have never really been proactive at stopping them.  My work machine has an antivirus because it came with one, but none of my other machines do.  I use Outlook (well, used to) and other reputedly dangerous software, but I have always relied on my common sense to keep me out of trouble.

I am not saying this is a good idea.

One day, I expect to click on an unsafe attachment and infect myself.  We all have lapses in our attention and relying on our human senses to keep us safe from viruses is not just stupid, it's suicidal.  But well, habits die hard.

One word about Outlook:  there is this myth that it is the main enabler for virus propagation out there and that if you are using another client, such as Eudora or Mozilla Mail, you are safe.  This is incorrect.  Viruses typically travel through email attachments.  You can launch an attachment with any mail client and you will get infected just the same, so just be vigilant regardless of your mail client.  It is true that Outlook used to have unreasonable security defaults, but this is no longer the case.  Even Word and Excel now come with a high security default, not allowing you to run macros and other mechanisms that viruses use to propagate.

What's interesting is that I have always thought that I would be infected one day through email, but I ended up receiving a virus through other means (tftp and RPC).  Fortunately for me, this virus is relatively harmless for the user:  its main purpose seems to be to trigger a SYN attack on a Microsoft site on August 16th.  I am curious to see how this is going to unfold.  I am confident Microsoft has taken all the necessary precautions to foil the upcoming onslaught, but we will see.

I remember when I saw my first virus.  It was circa 1988 on the Amiga.  Viruses were totally unheard of back then.  This virus, called SCA, was probably not the first but definitely a very early one.  It propagated by copying itself onto the boot sector of floppies and all it did was wait for the third invocation and then display a message saying "Something wonderful has happened, your Amiga is alive, etc...".  I remember finding this cool the very first time I saw it, probably because I had no idea it was based on a concept that would cause billions of dollars in losses in the coming years.

I disassembled the SCA virus back then and published an article about it in the French Amiga magazine I was working for.  As the assembly code was unfolding in front of my eyes, I remember feeling much more fascination than anger at the author.  It was such a neat idea (and also a pretty cool Copper list).

Those days are gone. Protect yourself and if you don't like to use antivirus software because it slows down your I/O operations, at least make sure your machine is reasonably up-to-date with security patches.

Posted by cedric at 08:06 AM | Comments (12)

August 12, 2003

Announcing EJBGen 2.15

I just released EJBGen 2.15.  The main addition is the use of templates (which will be improved further in the next version).  Here is the change log since 2.14:

2.15
- Fixed: pk and value classes were always regenerated
- Fixed: disable-warning moved to @ejbgen:jar-settings
- Added: documentation for templates
- Added: templates, and -templateDir option
- Fixed: variables not defined in ejbgen.properties are preserved
- Added: package specification in @ejbgen:file-generation
- Fixed: generate-on was not working properly
- Fixed: value objects now test against null in hashCode() and equals()
- Fixed: -localHome.baseClass was not working correctly
- Fixed: -jndiPrefix/-jndiSuffix was not honored in resource-description
- Added: -xmlEncoding
- Added: -noImports
- Added: {create|remove|passivate}-as-principal-name attributes
- Fixed: documentation for options
- Fixed: StackOverFlowError in debugLog()

Posted by cedric at 03:19 PM | Comments (1)

August 11, 2003

Poll about my configuration

I recently changed the description configuration of my RSS feed.  It used to be a forty-word description of the whole entry, but it now displays the entire entry in rich content (HTML + CSS).  The problem is I am receiving complaints with either of these configurations:

  • Some people don't like the abbreviated description because it forces them to click to have access to the whole entry.
     
  • Some people tell me the new configuration doesn't work because not all RSS readers seem to automatically dereference the <link> tag of my RSS feed.

What should I do?  Revert to the forty-word description or keep it as it is?

Posted by cedric at 11:48 AM | Comments (7)

Fighting spam with spam

Paul Graham posted further thoughts about spam.  One of his recommendations is to have client filters basically launching a distributed denial of service attack on the spammer's Web site:

So I'd like to suggest an additional feature to those working on spam filters: a "punish" mode which, if turned on, would retrieve whatever's at the end of every url in a suspected spam n times, where n could be set by the user.

While attractive, this idea comes with several problems of its own, the main one being abuse.  Using this technique, it becomes relatively easy to invoke a DDOS on certain sites.

Or does it?

Let's suppose we live in a world where a reasonable number of email clients have an "angry filter" built-in:  whenever this filter detects a spam that has a URL in it, it will retrieve the said page a certain number of times (say, ten).  By "reasonable number", I mean that there are enough of these filters to trigger a massive denial of service attack if a spam is sent out.  Considering the number of emails a typical spam involves (say, ten million) even if only a small percentage of the receiving clients (say 1% = 100,000 machines) has the software installed, this will result in about one million hits on the spammer's page.
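The back-of-the-envelope arithmetic above can be spelled out in a few lines of Java. The class and method names are made up for illustration, and the figures are simply the ones assumed in the paragraph.

```java
// Illustrative only: the names are invented, the numbers come from the text above
class PunishModeEstimate {
  /**
   * Estimated number of hits on the spammer's page.
   *
   * @param emailsSent       number of messages in one spam run
   * @param filterPercent    percentage of receiving clients running the angry filter
   * @param fetchesPerClient how many times each filter retrieves the page
   */
  static long estimateHits(long emailsSent, int filterPercent, int fetchesPerClient) {
    long clientsWithFilter = emailsSent * filterPercent / 100;
    return clientsWithFilter * fetchesPerClient;
  }

  public static void main(String[] args) {
    // Ten million emails, 1% of clients filtering, ten fetches each
    System.out.println(estimateHits(10000000L, 1, 10)); // prints 1000000
  }
}
```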

If I wanted to abuse this system, I would basically have to turn myself into a spammer.  The cost of the infrastructure is not too high (spamming software, a CD with millions of email addresses, a lazy ISP ).  I also need to compose an email message that will be flagged as spam by mail filters (I can simply copy/paste an existing spam) and then include the URL of the victim's Web site in the message.

With that in mind, abuse does indeed seem easy to achieve.  Now, could we work around this problem?

We can imagine making the "angry filter" smarter:  it would try to relate the URL contained in the email with its content.  One way to do that would be to pick the ten words in the email message that were computed as being the highest probability by the Bayesian filter (things like "mortgage", "debt", etc...) and then see if the URL has any connection to these words.  Either by

  • Running some heuristic rules on the name of the URL itself.
     
  • Consulting a central database to see if the said URL has been flagged as a spam originator (the said database is of course going to receive a lot of hits, and the question of "who is this greater authority?" remains).
     
  • Connecting to the URL and parsing its content before deciding further action (which probably defeats the purpose, although in this case, the angry filter might decide to limit its connection to the Web site to one instead of ten if it decides the spam is probably an abuse).
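As a rough illustration of the first option, here is a minimal sketch of a heuristic that checks whether the URL itself mentions any of the top-scoring spam words. The class and method names are hypothetical, the word list is assumed to come from the Bayesian filter, and the "ten fetches versus one" policy is borrowed from the third option purely to show how a decision might be used.

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of any existing spam filter
class AngryFilterHeuristic {
  /** Returns true if the URL's host or path mentions one of the top-scoring spam words. */
  static boolean urlMatchesSpamWords(String url, List<String> topSpamWords) {
    URI uri = URI.create(url);
    String host = uri.getHost() == null ? "" : uri.getHost();
    String path = uri.getPath() == null ? "" : uri.getPath();
    String haystack = (host + path).toLowerCase(Locale.ROOT);
    for (String word : topSpamWords) {
      if (haystack.contains(word.toLowerCase(Locale.ROOT))) {
        return true;
      }
    }
    return false;
  }

  /** Ten retrievals when the URL looks related to the spam, one when it may be an abuse. */
  static int fetchCount(String url, List<String> topSpamWords) {
    return urlMatchesSpamWords(url, topSpamWords) ? 10 : 1;
  }
}
```

A substring match like this is easy to fool in both directions, which is one more reason to doubt how effective such a rule would be.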

None of these options seem very effective to me and I have to say that overall, the idea of fighting unwanted traffic with even more traffic doesn't strike me as the right thing to do, even if giving the spammers a taste of their own medicine offers some strange sadistic appeal.

Maybe we could consider something more clever:  crawling the spammer's Web site in search of an order form and filling this form with bogus information that the spammer will have to process and validate.  Once the form is found, its location could be uploaded to a central database so that other angry filters can skip the crawling step and proceed directly to the form.

If nine out of ten orders turn out to be bogus, the spammers' operating costs will make the act of spamming less interesting to them.

Any other thoughts?

Posted by cedric at 09:15 AM | Comments (4)

August 08, 2003

Staying sharp

I recently received an email from a reader asking me for advice about improving his developer skills.

I am not sure there is any straight answer to this question but let me throw out a few random ideas.

  • I have found that reading (a lot of reading) is a great way to improve your skills and to force your brain to expand its horizons.

    Reading Java books is one way of doing it but the most important thing to me is to try and vary the books I read as much as possible.
     
  • Studying other languages is also a fantastic and fascinating way of learning new concepts that change the way you think.  Somebody once said "learn a new language every year" and I can't agree more.  And just like natural languages, the more languages you know, the easier it becomes to learn new ones.

    Things can get really interesting when you start "cross-pollinating":  mixing concepts read in several books.  For example, trying to apply concepts that are particular to a language (e.g. closures) to a language that doesn't support them (e.g. Java).

    Also, don't be afraid of marginal or complex languages.  I see this practice similar to mathematics:  it's something that you study not because it will be directly applicable to you, but for all the wonderful benefits it will indirectly bring you.  For example, the complexity of C++ is daunting and you might not want to be involved in such a language as long as you don't really need it, but that would be a mistake.  Pick a book by one of the gurus and look at the amazing things they can do just by tweaking the concept of templates (partial specialization, traits, etc...).

    Similarly, more abstract languages such as Dylan or CAML are quite confusing at first but they will reshape your way of thinking in interesting ways.

  • It is very rare to find "curious" colleagues.  What I mean by curious is simply what I just described:  people who are not only good at their job but who also like to explore other areas and discuss them.  If you happen to have somebody like that in your work environment, take every opportunity you can to have lunch or coffee with them.  There is nothing more exciting than two curious spirits bouncing ideas off each other.  Separately, each of you will spark interesting ideas, but if you put both together, the total knowledge will greatly exceed the sum of its individual parts.
     
  • Of course, all this would be useless if all you did was read and never practice.  At all my jobs, I have always saved some time every day to do something "on the side", something that doesn't pertain to my work directly.  Working on side projects is definitely a way to sharpen your skills, especially if you can be involved in a project for a long time, which will allow you to go through all the cycles involved in the development process.  If you can do this, it's worth it.  Don't forget to have a life and enjoy your hobbies, though, this is a lot more important than it may seem.  If you are not balanced, you will quickly plateau at work, no matter how dedicated you are.

  • Finally, there are a lot of books I could recommend.  I try to maintain a list of books I read, feel free to take a look.
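As an example of the cross-pollination mentioned above, here is a minimal sketch of how a closure can be approximated in Java with an anonymous inner class that captures a final variable. The `Predicate` interface and the `filter()` and `greaterThan()` helpers are hypothetical names written for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ClosureSketch {
  /** Hypothetical one-method interface standing in for a function object. */
  interface Predicate {
    boolean accept(int value);
  }

  /** Returns the values accepted by the predicate. */
  static List<Integer> filter(List<Integer> values, Predicate p) {
    List<Integer> result = new ArrayList<Integer>();
    for (Integer v : values) {
      if (p.accept(v)) {
        result.add(v);
      }
    }
    return result;
  }

  /** Builds a predicate that "closes over" the threshold argument. */
  static Predicate greaterThan(final int threshold) {
    return new Predicate() {
      public boolean accept(int value) {
        // 'threshold' is captured from the enclosing method call
        return value > threshold;
      }
    };
  }
}
```

Calling `ClosureSketch.filter(Arrays.asList(1, 5, 10), ClosureSketch.greaterThan(4))` keeps only 5 and 10. The captured variable has to be final, which is the main difference from closures in languages that support them natively.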

How do you guys "stay sharp"?

Posted by cedric at 07:03 AM | Comments (16)

August 07, 2003

EJB 3.0

There is an interesting thread going on at TheServerSide about EJB 3.0.  Some people have offered suggestions for the upcoming new specification, and one of them was interceptors:

You need to be able to restrict access based on what values instance properties hold. That is how real world systems work. Interceptors - or advices - would be a good way of implementing this.

While I initially responded that it would be interesting to have the specification formally define interceptors for EJB's, I am slowly changing my mind.  The more I think about it, the more I believe it would be a bad idea to add this to the EJB specification.  There are several reasons for this, which I'll examine in turn:

  1. Lack of use cases
  2. AOP

Lack of use cases

Interceptors have an undeniable "cool factor".  They make for great demos and allow you to take a peek inside what used to be seen as a magic black box.  Over the years, I have talked to several customers using EJB intensively and only a few of them asked for interceptors.  What was interesting is that when I drilled down and asked them why exactly they wanted interceptors, it always turned out that their needs could be met through different means than EJB interceptors (some of them being flags that we already provided and they didn't know about, others being EJB lifecycle events they didn't know about).

However, there were a few cases not covered by any of the above, which brings me to my second point.

AOP

Since we released our AOP Framework for WebLogic, it's become pretty clear to us that AOP can be easily applied to J2EE in general and EJB in particular.  If you haven't looked at it yet, it is basically a set of pointcuts defined in AspectJ that transparently ties into a server.  The interesting thing is that there is no AOP support in any of the WebLogic Server versions we support.  The package is totally external, and yet it is able to weave itself into existing code.

And therein lies its power:  suddenly, all your J2EE applications become not only potential targets for interception, but these interceptions can be extremely sophisticated.  As sophisticated as the AOP implementation you are using, actually (right now, we are using AspectJ).

With that in mind, the importance of specifying interception in the EJB specification becomes very questionable.  Not only would the specification be very narrow (it would only apply to EJB's), it would also be very restricted since it would deal only with the container part of the application.  Maybe I should clarify that.

EJB code is typically made of three parts:

  • User code
  • Generated code
  • Container code

A specification of interceptors would only cover container code.  Generated and container code will obviously vary depending on the server you are using.

With that in mind, it's pretty clear that an official interceptor specification would fall very short of addressing a broad category of problems.

AOP allows you to address these three aspects, assuming that the framework you are using supports both callee and caller pointcuts.
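For readers who have never seen an interceptor, here is a minimal sketch of the idea using `java.lang.reflect.Proxy` from the JDK. It is neither an EJB container feature nor AspectJ, and the `Account` interface, `SimpleAccount` class and `guard()` method are hypothetical names invented for the example. It shows a callee-side interception that refuses a call based on the value of an instance property, which is the kind of use case quoted at the top of this entry.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

class InterceptorSketch {
  /** Hypothetical business interface. */
  interface Account {
    int getBalance();
    void withdraw(int amount);
  }

  /** Plain implementation with no access checks of its own. */
  static class SimpleAccount implements Account {
    private int balance;
    SimpleAccount(int balance) { this.balance = balance; }
    public int getBalance() { return balance; }
    public void withdraw(int amount) { balance -= amount; }
  }

  /** Wraps the target so that withdraw() is refused when it would overdraw the account. */
  static Account guard(final Account target) {
    InvocationHandler handler = new InvocationHandler() {
      public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        // Access decision based on the value of an instance property
        if (method.getName().equals("withdraw")
            && ((Integer) args[0]).intValue() > target.getBalance()) {
          throw new IllegalStateException("insufficient balance");
        }
        try {
          return method.invoke(target, args);
        } catch (InvocationTargetException e) {
          // Rethrow the exception raised by the target itself
          throw e.getCause();
        }
      }
    };
    return (Account) Proxy.newProxyInstance(
        Account.class.getClassLoader(), new Class<?>[] { Account.class }, handler);
  }
}
```

A proxy like this only sees calls that go through the interface, which hints at why a container-level interceptor covers less ground than an AOP framework that can also weave into the caller side.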

Conclusion

I think it would be a mistake to make interceptors part of the EJB specification.  Something more general is needed, both in scope (it should apply to J2EE and not just EJB) and conceptually (use aspect-oriented programming on top of the existing technologies).

Posted by cedric at 11:45 AM | Comments (3)

JSR 175 available

JSR 175 is now available for community review.

We (well, mostly Josh) have been working hard these past weeks to iron out the last details, and there are still a few unresolved issues, but this draft is very comprehensive and covers pretty much all the aspects involved in this JSR.

There is still time to provide feedback, so don't hesitate.

Posted by cedric at 10:46 AM | Comments (0)

August 04, 2003

"Inheritance considered evil" considered evil

You should always be very leery of any article that contains the words "considered evil" in its title.  This article, where Allen Holub is unconvincingly trying to make a case that inheritance is evil, is no exception.

There is a well-known phenomenon in the publishing industry pertaining to sensationalism. Whenever you want to write something that will get the attention of your readers, a simple technique is to pick a popular opinion and explain why everyone is wrong about it.

Holub has used this trick in the past to get readership (like when he said that getters and setters were... uh, evil) but while it would be easy to attack this article on form, let's take a look at what he is actually saying.

Allen mentions the fragile base class problem as one of the reasons why you should avoid inheritance altogether. This is a very well-known problem in the OO circles and C++ programmers have been aware of it for more than ten years. The problem is... Allen doesn't understand it.

To illustrate, Allen uses the example of a Stack class that uses a List as its underlying implementation and he notices that calling clear() on this object will clear only the base class and not the derived one. Fair enough, that's a good point. But this is *not* the fragile base class problem. What's interesting is that Allen does give the right description of this problem a few lines above:

Base classes are considered fragile because you can modify a base class in a seemingly safe way, but this new behavior, when inherited by the derived classes, might cause the derived classes to malfunction.

and then he proceeds to demonstrate something totally different.  Ahem.

What's disturbing overall about this article, and about the way Holub approaches problems in general, is the tendency to make sweeping statements.  Allen demolishes inheritance by showing a very poor example of inheritance.  Someone who would implement a Stack by

  • Using a List as the underlying implementation
  • Inheriting List instead of delegating to it

is clearly a beginner in object-oriented programming.  Nothing that a good book won't fix.  But no, instead of just observing this fact, Allen is trying to make a case that inheritance should be banned altogether.  As I read through his articles, it's pretty clear to me that whatever he is trying to do, he is using the wrong language.  How about Lisp instead?
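To make the difference concrete, here is a minimal sketch of both designs. It is a reconstruction of the kind of example discussed above, not the article's actual code, and the class names are made up. The inherited version exposes every list method, including a clear() that knows nothing about the stack pointer, while the delegating version exposes only what a stack needs.

```java
import java.util.ArrayList;
import java.util.List;

class StackSketch {
  /** Stack by inheritance: every ArrayList method leaks into the public API. */
  static class InheritedStack extends ArrayList<Object> {
    private int top = 0; // index of the next free slot

    public void push(Object item) {
      add(top++, item);
    }

    public Object pop() {
      return remove(--top);
    }
    // clear() is inherited: it empties the list but leaves 'top' untouched
  }

  /** Stack by delegation: the list is private, so only stack operations are exposed. */
  static class DelegatingStack {
    private final List<Object> items = new ArrayList<Object>();

    public void push(Object item) {
      items.add(item);
    }

    public Object pop() {
      return items.remove(items.size() - 1);
    }

    public void clear() {
      items.clear();
    }

    public int size() {
      return items.size();
    }
  }
}
```

With the inherited version, pushing two items, calling clear() and pushing again throws an IndexOutOfBoundsException because the stack pointer is stale. The delegating version behaves as expected.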

Some of his points are worth making, though, like the emphasis on interfaces.  Now, that is a valid fight, and it is important to raise every programmer's awareness of the importance of programming to interfaces.

We all know that inheritance can be abused, but isn't it so for every feature of every language on the planet?  Instead of giving in to simple sensationalism, how about studying the pros and cons of inheritance and trying to educate your readers objectively?

This kind of article would be understandable in a personal weblog, but I expect better from JavaWorld.

Posted by cedric at 12:02 PM | Comments (10)

First post

Welcome to my new weblog.

Posted by cedric at 12:00 PM | Comments (1)