Otaku, Cedric's weblog: September 2003 Archives

September 30, 2003

The problem with "Half Beans"

In an entry titled "Functional style in Java", Brian Slesinsky presents a technique he calls "half beans".

Never mind the fact that I can't really see the connection between this and functional programming, I find his recommendations quite questionable.

Brian is trying to solve the problems of objects that are not quite completely initialized.  In order to make sure an object is fully initialized, he forces his class (Album) to have just one constructor that accepts only one parameter, of type AlbumBuilder.  AlbumBuilder is the helper class in charge of making sure the object will be fully initialized.

There are a number of issues with his points.

  • First of all, the name.  I initially assumed that "half bean" meant that the pattern is supposed to avoid those "half beans", those Java Beans that are not completely initialized.  Well, no.  Half in this case means that the getters are in the main class and the setters and no-arg constructor are in the Builder class.  And to validate his point, Brian makes the following parallel:
    The getters go into one class and the no-arg constructor and setters go in the other. If it looks familar, you're right - it's basically the same pattern as String/StringBuffer.
    String and StringBuffer certainly have no such connection.  They are two very separate classes and one just happens to use the other for its internal implementation, but that's just a detail.  They could happily live separately and would serve a purpose on their own, as opposed to the Album/AlbumBuilder duo.
     
  • Second, all the AlbumBuilder class does is postpone the problem.  Instead of making sure that Album is fully initialized, you must now put this logic in AlbumBuilder.
     
  • And finally, this technique introduces a lethal weakness in the application by coupling the classes Album and AlbumBuilder in ways that are not only invisible to the programmer but invisible to the compiler as well.  For example, if one day, the design mandates the addition of a field "Producer" to the Album class, I have to remember to update AlbumBuilder or everything will break.

Overall, this technique doesn't help you address the main problem which, in my opinion, is not where you should signal the error but rather how you should handle it.  It is much more important to decide whether such an error should be an AssertionError, or an IllegalArgumentException or something else, and more importantly, whether it should be recoverable or not.

On a related note, the initial motivation that made Brian come up with this design was the absence of named parameters in Java:

Some languages solve this problem with keyword arguments, but Java doesn't have them, so we need another solution.

It is actually quite easy to emulate named parameters in Java.  For example:

new Album().title("The Wall").band("Pink Floyd");

This is very much un-Java, but it's there if you decide you need it one day.
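To make the one-liner above concrete, here is a minimal sketch of a class that would support it.  The fields and getter names are assumptions made up for illustration; the only real requirement is that each setter-like method returns this.

```java
// Hypothetical Album class: each setter-like method returns `this`,
// so calls can be chained and every value is labeled by its method name.
class Album {
    private String title;
    private String band;

    Album title(String title) {
        this.title = title;
        return this;
    }

    Album band(String band) {
        this.band = band;
        return this;
    }

    String getTitle() {
        return title;
    }

    String getBand() {
        return band;
    }
}
```

Note that nothing here forces every field to be set, so this only emulates named parameters; it does not address the initialization problem discussed above.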

Posted by cedric at 08:29 AM | Comments (17)

September 24, 2003

Forthcoming talk on Aspects

I will be making a presentation about "Aspect-Oriented Programming in Java and J2EE" at the next SF Java Users Group on Wednesday, October 15th.

Also, my friend and colleague Chris Fry will regale you with a talk on StAX, the streaming XML parser, which has recently been reviewed at xml.com.

Stop by and say hi!

Posted by cedric at 11:14 AM | Comments (4)

September 23, 2003

Schwartz and numbers

From Jonathan Schwartz' interview:

Schwartz: I think it can definitely change our dialogue with our customers. If you look at our top 65 accounts, there's 10 million people there. At $100 each that's a billion dollars. So I think it certainly gives us a broader market opportunity

Mmmh... That's about 150,000 employees per company on average.

Someone needs to brush up on their arithmetic.

but I'm not a good prognosticator about our revenue streams.

You don't say.

Posted by cedric at 10:35 AM | Comments (3)

September 22, 2003

Checked exceptions and virtuality

I was reading the ongoing interviews of Anders Hejlsberg and James Gosling over the weekend and I had several thoughts.

If you are a regular reader of this weblog, you know that I have great respect for Anders Hejlsberg and his work (current and past).  Overall, his stance on various issues is very pragmatic and fairly well articulated.  However, I find myself disagreeing on two of the issues discussed in the latest parts of these interviews, namely:

  • Why C# methods are not virtual by default.
     
  • Why C# doesn't support checked exceptions.

What strikes me in Anders' interviews is that while he gives numerous technical reasons for these choices, he omits to mention what I think is the principal motivation:  the necessity for cross-language compatibility.

C# runs on the CLR and therefore has to obey constraints that are sometimes non-negotiable.  The CLR was built to be cross-language, and as such, it also has to support C++ and Visual Basic, neither of which supports checked exceptions.  I don't really understand why Anders never even mentioned that these requirements weighed heavily on these two design decisions.

As for the age-old debate "checked exceptions versus runtime exceptions", I refer you to the current TheServerSide thread which contains a lot of interesting articles (especially Mike Spille's).

As for the question "should methods be virtual by default?", it's close to a religious issue but I'll share a few thoughts.

The way I see it, code can be extensible in two ways:  "by design" and "technically".

Being "technically extensible" means that the language and tools you are using give you the power to extend the code without any workaround.  Languages that are on the "virtual by default" side tend to produce code that is more technically extensible than others.

If the code is extensible "by design", the extension points and their contracts have been thoroughly tested and documented.

It's very rare to find code that is extensible both technically and by design, but in my experience, at least if it is technically extensible, I can find ways to work around the absence of design.  If you mark your method private or final, I am left with no options at all.
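Here is a minimal sketch of what I mean by working around the absence of design.  Both classes are hypothetical; the point is only that formatName() is neither private nor final, so a client can override it even though nobody documented it as an extension point.

```java
import java.util.Locale;

// Hypothetical library class: formatName() is overridable by default,
// which makes the class "technically extensible".
class Greeter {
    String greet(String name) {
        return "Hello, " + formatName(name) + "!";
    }

    protected String formatName(String name) {
        return name;
    }
}

// A client adds the behavior it is missing by overriding the method.
class ShoutingGreeter extends Greeter {
    @Override
    protected String formatName(String name) {
        return name.toUpperCase(Locale.ROOT);
    }
}
```

Had formatName() been declared private or final, ShoutingGreeter would not compile and the only remaining option would be to copy or patch the library.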

As for James Gosling's interview, I highly recommend it:  it's filled with very sensible advice about why checked exceptions are a good thing.  My favorite part is:

It really is a culture thing. When you go through college and you're doing assignments, they just ask you to code up the one true path. I certainly never experienced a college course where error handling was at all discussed.

James talks about the "creepy feeling" you should have when you write code that downplays the importance of errors and argues that with checked exceptions, you can't escape that feeling:  you have to face it.

And I think it's a good thing.

Posted by cedric at 02:53 PM | Comments (5)

September 17, 2003

The ultimate View technology

As you probably know, there are a lot of "template view" technologies out there.  The most popular, and also a J2EE standard, is JSP.  Another one is Velocity.  One could also name XML+XSL.

All these technologies have pros and cons, but they all have in common that they are mixing two different languages in a single source.  The amount of interleaving varies with the technology you use, and it's sometimes hard to draw the line.  Is a JSP page an HTML or a Java source?

These technologies also have different ways to solve the "escape in / escape out" problem.  In JSP, you use special markers to insulate the Java code from HTML.  In Velocity, you use # and $ to refer to Velocity code.  None of these methods are real showstoppers but they can end up producing some fairly ugly templates.

Is there no really clean way to solve this problem?

Well, there is.

Ruby.

In a previous entry, I used a mix of two features in Ruby (closures and dynamicity, and more particularly, method_missing) to solve the following three problems in one fell swoop:

  • Make sure each open tag gets closed automatically.
  • Automatically indent.
  • Integrate cleanly with the Ruby syntax.

Let me give a quick example from the LogAnalyzer utility I have been discussing these past days.  This piece of code generates the HTML to display a list of referrers:

def createMiddleReport
  xml = XML.new
  @referrers.referrers.each { |date, refs|
    xml.p {
      xml.b("Referrers for #{date}")
    }
    xml.table {
      refs.each { |ref, ref2|
        xml.tr {
          xml.td {
            xml.a({ "href" => ref }) {
              xml.append(ref)
            }
          }
        }
      }
    }
  }

  xml.to_xml
end

Notice how this code happily mixes templating and logic.  I iterate whenever I have to and I insert the content into the HTML string when the time is right.  Other technologies will subtly impose their programming model on your code, for example by making you compute the generated code and store it in a HashMap, or assign it to a variable which you can then use.

With this method, there is no need to escape in and out of HTML:  it is automatically covered by the method invocations on the XML instance.

Here is another example:

xml = XML.new
xml.html {
  xml.head {
    xml.title("Statistics for http://beust.com")
  }
  xml.body {
    xml.table{
      xml.tr {
        xml.td({ "valign" => "top" }) {
          xml.append(v.createLeftSideReport)
        }
        xml.td({ "valign" => "top" }) {
          xml.append(v.createMiddleReport)
        }
      }
    }
  }
}

File.open(OUTPUT_MAIN, "w") { |f|
  f << xml.to_xml
}

If you are curious, you can take a look at the final result, and also download xml.rb.

Alright, let's be a little bit more serious now.

Of course, saying that Ruby is the ultimate template view technology should be taken with a grain of salt.  Obviously, this is not an option if you are a Java programmer.  You should also note that Ruby happens to have two very handy features that make this trick possible, and that the method_missing hack is actually due more to Ruby being dynamically typed than to anything else.  Also, if your language of choice doesn't support closures, you will be reduced to something like XMLStringBuffer, which I described in a previous entry.  It is not as pretty as what you just saw, but it fits the bill pretty well.

 

Posted by cedric at 04:27 PM | Comments (13)

September 16, 2003

Log analyzer in Ruby

Here is the problem I am trying to solve:  all the statistics for my Web site are stored by my ISP in a directory, one per day.  Each file is compressed and called, for example, www.20030915.gz.

I want to write a Log analyzer that will make it easy for me to collect various statistics and still be extensible so that I can add more monitoring objects as time goes by.  Right now, here are some examples of the numbers I'd like to see:

  • Number of hits on my site.
  • For my weblog, number of HTML and RSS hits.
  • The list of referrers for, say, the past three days.
  • The number of EJBGen downloads each day.
  • The keywords typically used on search engines to reach my site.

Of course, it should be as easy to obtain totals per month or even per year if needed.

The idea is the following:  when the script is run, it should run through all the compressed files and build an object representation of each file and line.  Then it will invoke each listener with two pieces of information, Date and LogLine.  Each listener is then free to compute its statistics and store them for the next phase.

Once the data gathering is complete (back-end), it's time to present the information.  There are several possibilities to achieve that goal but for now, I'll just make sure that back-end and front-end are decoupled.  I envision one class, View, to be passed all the gathered information and generate the appropriate HTML.

So first of all, we have the class LogDir, which encapsulates the directory where my log files are stored.  Using the convenient "backtick" operator, it is fairly easy to invoke gzip on each file and store each file in a LogFile object, which in turn contains a list of LogLines.

When it's done, LogDir then calls all the listeners with the following method:

def processLogFiles
  @files.each { |fileName|
    sf = LogFile.new(fileName)
    sf.logLines.each { |l|
      @lineListeners.each { |listener|
        listener.processLine(fileNameToDate(fileName), l)
      }
    }
  }
end # processLogFiles

The main loop is fairly simple:

ld = LogDir.new(LOG_DIR)
ld.addLineListener(ejbgenListener = EJBGenListener.new)
ld.addLineListener(weblogListener = WeblogListener.new)
ld.addLineListener(referrerListener = ReferrerListener.new)
ld.addLineListener(searchEngineListener = SearchEngineListener.new)
ld.processLogFiles

The last line is what causes LogDir to start and invoke all the listeners.

For example, here is the EJBGenListener.  All it needs to do is see if the HTTP request includes "ejbgen-dist.zip" and increment a counter if it does.  The overall result is a Hash of counts indexed by date (converted to a string):

class EJBGenListener
  def initialize
    @ejbgenCounts = Hash.new(0)
  end

  def processLine(date, line)
    if line.command =~ /ejbgen-dist\.zip/  # escape the dot to match it literally
      key = date.to_s
      @ejbgenCounts[key] += 1  # missing keys default to 0
    end
  end

  def stats
    @ejbgenCounts
  end
end # EJBGenListener

The only thing worth noticing is that the Hash constructor can take a parameter which represents the default value of each bucket (0 in this case).
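For comparison, here is a rough Java counterpart of the same counting logic.  It is only a sketch:  the class and method names are made up, and it relies on Map.merge() and Map.getOrDefault(), which were added to Java well after this entry was written (Java 8).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical Java counterpart of EJBGenListener, for comparison only.
class DownloadCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    void processLine(String date, String command) {
        if (command.contains("ejbgen-dist.zip")) {
            // merge() stores 1 for a new key, otherwise adds 1 to the old value
            counts.merge(date, 1, Integer::sum);
        }
    }

    int countFor(String date) {
        // getOrDefault() plays the role of Ruby's Hash.new(0) on reads
        return counts.getOrDefault(date, 0);
    }
}
```

Without a default value, the read side would need an explicit null check before incrementing, which is exactly the noise that Hash.new(0) removes.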

Ruby's terseness is a real pleasure to work with.  For example, I need to run some listeners on the three most recent files of the directory (which obviously change every day).  Here is the relevant Ruby code:

Dir.new(dir).entries.sort.reverse.delete_if { |x| ! (x =~ /gz$/) }[0..2].each { |f|
  # do something with f
}

Compare this with the number of lines needed in Java...
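To make the comparison less rhetorical, here is one way the same selection could be written in Java.  It is a sketch that uses the streams API, which only arrived in Java 8, and the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class RecentLogs {
    // Keeps the names ending in "gz", sorts them in descending order
    // and returns at most the first three.
    static List<String> threeMostRecent(String[] names) {
        return Arrays.stream(names)
                .filter(name -> name.endsWith("gz"))
                .sorted(Comparator.reverseOrder())
                .limit(3)
                .collect(Collectors.toList());
    }
}
```

The array would typically come from new File(dir).list(), which returns null when the directory cannot be read, so a real caller needs to check for that first.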

So far, the code is mundane and very straightforward, not very different from how you would program it in Java.  In the next entry, I will tackle the front-end (HTML generation) because this is really the point I am trying to make with this series of articles.

Posted by cedric at 12:22 PM | Comments (10)

September 15, 2003

Open Source and Documentation

Ted and a few other people (see the comments) are complaining about the quality of Open Source documentation.  They are not alone.

Here is a typical example.

On a regular basis, I see an announcement for a new utility show up on JavaBlogs or some other news source.  I immediately click on it and very often, the link is merely taking me to the home page of the project on sourceforge.  That's already a bit frustrating, but okay, fine.  My reflex then is not to click on the "files" link, nor "lists", nor to check out the CVS repository.

I click on "Documentation".

And 99% of the time, that page is empty.

At this point, I just close the tab and move on, and you have just lost a potential user.

If you are going to post an announcement for your project, you need to take some time off coding and write up a document.  It doesn't need to be extensive, it doesn't need to be perfect, but just like Jason, Ted and others, I don't have the time to read your source code.  I will be very happy to have it handy if I need to debug something in your code one day, but until that day happens, your documentation is all I need.

Explain what problem you are trying to solve, how you solve it and how to use your software.

But there is more to writing documentation.  To me, a developer who spends some time trying to communicate her work other than through code shows that she has some perspective.  She is not just "all code".  She understands users are a different breed and that you need to interact with them if you are really trying to solve their problem, as opposed to just "scratching a technical itch" because it's fun and then pretending you have a product.

Admittedly, documentation written by developers is rarely good, and after a certain point, you do need technical writers.  But for the SourceForge kind of project, it's more than enough and it shows the world that you are not just a hacker:  you are a developer, and you remember who you are working for.

Posted by cedric at 08:47 AM | Comments (3)

September 12, 2003

Generating XML in Ruby

I have been running my weblog on Movable Type for about a month now and I have to say I am really impressed.  For a collection of scripts put together, Movable Type is an impressive piece of software, both powerful and intuitive.  I expected it to be a challenging installation, especially since I am not running my blog on my home machine but at an ISP, but it turned out to be remarkably painless.

Having said that, I have one big complaint:  no support for referrer logs.  I couldn't find any way to quickly access my referrer log anywhere in the Movable Type distribution.  A quick Google query turned up several packages implemented in various languages.  I tried a lot of them but I could never quite reach the result I was looking for, so I decided to write my own.

My ISP conveniently stores the logs for my Web site every night in a well-defined directory, following a standard naming notation for each day.  I decided it would be easier to calculate my referrer log from these logs instead of embedding scripting information in my main index file, since the updates don't really need to be more frequent than once a day.

Finally, I had to choose a language.  Since I opted for the static approach, I am not limited to the languages that my ISP supports for CGI programming (PHP and Perl).  The obvious choice was Ruby, which excels at this kind of processing with its native support for regular expressions and invocation of external commands.  It is also object-oriented from the ground up, which gives me a lot of flexibility in my attempt to write a utility that will be easy to extend for my future log parsing needs.

Since I was going to have to generate HTML, I thought I would port a small Java class that I have been using to generate XML in EJBGen called XMLStringBuffer.  The idea is simply to not have to worry about indentation and closing the tags.  With this class, generating XML is as simple as:

XMLStringBuffer xsb = new XMLStringBuffer();
xsb.push("person");
xsb.addRequired("last-name", m_lastName);
xsb.addOptional("first-name", m_firstName);
xsb.pop("person");

Note that I don't really need to specify the closing tag in the pop() call, but it makes debugging easier since the XMLStringBuffer maintains an internal stack of the tags and can therefore tell me right away if my push/pop get out of synch.
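The stack idea is easy to sketch.  The class below is not the actual XMLStringBuffer from EJBGen, just a hypothetical stand-in with the same four calls; it assumes that addOptional() skips null values and it does not escape special characters.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for XMLStringBuffer, illustrating the tag stack.
class SimpleXmlBuffer {
    private final StringBuilder out = new StringBuilder();
    private final Deque<String> openTags = new ArrayDeque<>();

    // Indentation is derived from the number of currently open tags.
    private void indent() {
        for (int i = 0; i < openTags.size(); i++) {
            out.append("  ");
        }
    }

    void push(String tag) {
        indent();
        out.append('<').append(tag).append(">\n");
        openTags.push(tag);
    }

    void addRequired(String tag, String value) {
        indent();
        out.append('<').append(tag).append('>').append(value)
           .append("</").append(tag).append(">\n");
    }

    // Assumption: an optional element is simply omitted when its value is null.
    void addOptional(String tag, String value) {
        if (value != null) {
            addRequired(tag, value);
        }
    }

    // Fails right away if the tag being closed is not the innermost open one.
    void pop(String expected) {
        if (openTags.isEmpty()) {
            throw new IllegalStateException("No open tag to close: " + expected);
        }
        String actual = openTags.pop();
        if (!actual.equals(expected)) {
            throw new IllegalStateException(
                "Expected </" + expected + "> but innermost open tag is <" + actual + ">");
        }
        indent();
        out.append("</").append(actual).append(">\n");
    }

    String toXml() {
        return out.toString();
    }
}
```

With this version, a mismatched pop("persn") throws as soon as it is called instead of silently producing malformed XML.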

It quickly occurred to me that I could make this class even fancier in Ruby thanks to two features that are sadly absent from Java:  closures and method_missing (really dynamic typing).

The idea is to use closures to simulate indentation, and method_missing to make the XML class allow invocations on any method.  If the said method is unknown, it is simply turned into an XML tag.

Here is a piece of code that will make it all clearer:

xml = XML.new

xml.html {
  xml.head {
  }
  xml.body {
    xml.table {
      xml.tr {
        xml.td({ "valign" => "top"}, "Content1"){
        }
        xml.td {
          xml.append("Content2")
        }
      }
    }
  }
}

As you can see, each new closure (pairs of { }) starts a new tag and will cause an indentation and the proper tag to be closed when the block is exited.  Note also that every tag can be passed a Hash that will be turned into attributes if found.  You can also specify the content of the tag either inline or later in the closure with the append() method.  The generated XML is as follows:

<html>
  <head>
  </head>
  <body>
    <table>
      <tr>
        <td valign="top">Content1</td>
        <td>
          Content2
        </td>
      </tr>
    </table>
  </body>
</html>

The XML class is about forty lines, including comments.
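Java has gained lambdas since this entry was written (Java 8), so the closure half of the trick can now be approximated there as well.  The sketch below is hypothetical and much cruder than the Ruby version:  there is no equivalent of method_missing, so the tag name has to be passed as a string, and attributes are left out.

```java
// Hypothetical closure-based writer; Runnable lambdas stand in for Ruby blocks.
class TagWriter {
    private final StringBuilder out = new StringBuilder();
    private int depth = 0;

    // Opens the tag, runs the body one level deeper, then closes the tag.
    void tag(String name, Runnable body) {
        line("<" + name + ">");
        depth++;
        body.run();
        depth--;
        line("</" + name + ">");
    }

    void append(String text) {
        line(text);
    }

    // Writes one line at the current indentation level.
    private void line(String text) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(text).append('\n');
    }

    String toXml() {
        return out.toString();
    }
}
```

A call such as xml.tag("tr", () -> xml.tag("td", () -> xml.append("Content2"))) then closes and indents both tags automatically, just like the nested blocks above.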

In the next entry, I will give more details about the logging utility itself.

Posted by cedric at 04:19 PM | Comments (2)

September 11, 2003

More on components and classes

Anthony offered some interesting thoughts on classes and components, but I believe he is not pushing the idea far enough, which leads to some very impractical considerations, such as:

Thus methods should not return values because it is the events which are used to determine the results

This is clearly at odds with the way we program today, and probably will be for many years to come.

The way out of this dilemma is to think of classes and components as orthogonal features.  You don't need to compromise one to get the other.  They complement each other very nicely.  I believe Anthony's misdirected view comes from the fact that he assumes a one-to-one relationship between a class and a component (and even a one-to-one relationship between a method and an event).  Things do not need to be coupled that finely, although it can occasionally happen.

Just like a component can span several classes, an event doesn't necessarily map to one single method.

The way I see it, which is very reminiscent of the way COM developers have been programming these past years, you can start by writing your application the normal way, and then identify components and events as you go.  Of course, you can also identify these components and events in the design phase, as long as you don't make the mistake of tying them tightly to the way you are going to design your classes.

Once you have your application, you can start sprinkling event firings wherever you see fit.  You can also notice that a set of classes taken together can form a component, hence mimicking the metaphor of the integrated circuit.  Then you can choose to formalize this component using your favorite component model (whether it already exists or will be invented one of these days).

When you look at your application from "above", it doesn't matter if a component uses one or several classes or if methods return values.  These are implementation details that are inside your integrated circuit.  You need to understand them if you want to modify the behavior of your component, but they are of no use to you if all you need is the component itself, with its attributes and its events.

This vision is relatively simple and it's too bad that the only component model that might come close to allowing us to achieve it is JavaBeans, because this specification is very primitive and encourages bad component programming practices, such as implying an isomorphism between classes and components.

I have very high hopes that a better component model leveraging JSR 175 will emerge and will allow us to achieve a similar degree of component reuse as the one we see in COM today.

Posted by cedric at 11:08 AM | Comments (1)

September 10, 2003

From classes to components

After Holub's nonsensical article about getters and setters, it's quite a relief to read Anders Hejlsberg's interview, especially since the C# architect is basically saying the exact opposite of what Holub tried to say.

Anders emphasizes the fact that nowadays, developers need to think less in terms of classes and more in terms of components (another debate that has recently flared up in the Java community).

The way I see it, Components are a superset of Classes.  What exactly differentiates components from classes is open for interpretation, but I like Anders' simplification:  while classes are about properties and methods (PM), components are defined by PME:  properties, methods and events.

This observation makes both Properties and Events first-class citizens of the Component programming model, which explains why C# supports both of them natively, while Java achieves this through interfaces (another language that has native support for accessors but not events is Ruby).
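To illustrate what "through interfaces" means in practice, here is a small sketch of a Java class exposing all three of P, M and E.  The Thermostat class is invented for the example; PropertyChangeSupport and PropertyChangeListener are the standard java.beans helpers.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;

// Hypothetical component exposing a property, a method and an event.
class Thermostat {
    private final PropertyChangeSupport events = new PropertyChangeSupport(this);
    private int target = 20;

    // Property, read side
    int getTarget() {
        return target;
    }

    // Property, write side; notifies listeners when the value changes
    void setTarget(int newTarget) {
        int old = target;
        target = newTarget;
        events.firePropertyChange("target", old, newTarget);
    }

    // Method
    void reset() {
        setTarget(20);
    }

    // Event subscription, expressed through a listener interface
    void addTargetListener(PropertyChangeListener listener) {
        events.addPropertyChangeListener(listener);
    }
}
```

Everything related to the event is plumbing that the class has to write by hand, which is the part a language with native events takes care of.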

I, for one, really wish that Java had native support at least for accessors, so that we can finally drop the confusing "A read-write property is defined if the Java class has two methods, getFoo() and setFoo().  In this case, the name of the property is that of the method where you remove "get" and lowercase the first letter of the remaining name".  Yikes.
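The naming rule can be seen at work with java.beans.Introspector.decapitalize(), which is the standard helper for the "lowercase the first letter" step.  The wrapper class and method below are made up for the example.

```java
import java.beans.Introspector;

class PropertyNames {
    // Derives the property name from a getter name such as "getFoo".
    static String fromGetter(String methodName) {
        if (!methodName.startsWith("get") || methodName.length() <= 3) {
            throw new IllegalArgumentException("Not a getter name: " + methodName);
        }
        return Introspector.decapitalize(methodName.substring(3));
    }
}
```

Here fromGetter("getFoo") gives "foo", but fromGetter("getURL") gives "URL" because decapitalize() leaves a name alone when its first two letters are both uppercase.  One more special case to remember.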

Another topic that Anders discusses in this interview is delegates.

Ever since Microsoft's initial attempts to add delegates to its own Java Virtual Machine, Java developers have had a very strong bias against this concept.  While creating an incompatible JVM is indeed something that should be fiercely condemned, it's a shame that the concept of delegates was thrown away with it, because it makes a lot of sense in a language such as Java.

Anders gives several reasons why delegates are a good idea, but to me, the one that's most important is that delegates allow you to keep the number of classes and interfaces to a reasonable level.

Every Java developer who has written Swing applications (or any GUI, for that matter) knows how Action objects quickly proliferate, making the whole architecture hard to follow, not to mention the number of objects that get created just so that a callback method can be invoked.

Delegates allow you to tie an Action to one single method.  No new interface or new class is needed.

Delegates go one step further:  they don't require type conformance but only signature conformance.  This design choice reopens the age-old debate about static versus dynamic binding, and more particularly, raises the following question:  if two methods have the exact same signature but belong to two different classes, are they semantically equivalent?

My experience is that in practice, it's something that doesn't really cause problems (I usually make a similar observation about untyped Collections and the fact that in practice, the downcast is rarely a source of ClassCastExceptions).
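Java later gained method references and functional interfaces (Java 8).  They are not C# delegates, but they give a feel for what signature conformance looks like.  In this sketch, all three classes are hypothetical, and two unrelated methods are accepted as callbacks only because their signatures match.

```java
import java.util.function.IntBinaryOperator;

// Two unrelated, hypothetical classes whose methods happen to share
// the signature (int, int) -> int.
class Calculator {
    int add(int a, int b) {
        return a + b;
    }
}

class Geometry {
    static int area(int width, int height) {
        return width * height;
    }
}

class Callbacks {
    // Accepts any method whose signature matches; no common supertype needed.
    static int invoke(IntBinaryOperator callback, int x, int y) {
        return callback.applyAsInt(x, y);
    }
}
```

Both Callbacks.invoke(new Calculator()::add, 2, 3) and Callbacks.invoke(Geometry::area, 2, 3) compile, although adding two numbers and computing an area clearly don't mean the same thing.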

Anders makes some other excellent points, such as the performance gain of a delegate versus a direct method invocation, or his interesting take on what he calls "simplexity".  Read the interview for more details.

Posted by cedric at 09:12 AM | Comments (6)

September 08, 2003

Holub is at it again

Allen Holub is at it again.  I already commented on his previous column about inheritance and I pointed out his usual habit of coming up with a provocative title to get readership and then trying to turn his ideas into an article, usually a very unconvincing one.

This one is no exception.  Worse, it is actually a rehash of an article he wrote a few years ago. 

This (in)famous article is an unconvincing attempt at proving that getters and setters are evil.

Can you make massive changes to a class definition—even throw out the whole thing and replace it with a completely different implementation—without impacting any of the code that uses that class's objects?

And what does this have to do with accessors?  Besides, getters and setters are part of the contract of the class, they are no different than business methods.  If you modify anything that's part of the public interface of the class, you are going to break existing code.

Holub's thought is basically "A program is made of code only", which is a terrible simplification.  Programs are made of code and data and refusing to acknowledge this is the sure way to catastrophic designs.

Don't ask for the information you need to do the work; ask the object that has the information to do the work for you

What if this work absolutely doesn't belong in the object in the first place?  What if you need to gather data from different objects and then process it in a way that clearly doesn't belong anywhere else than in your object?  Of course, you can create an intermediary object to do that, but wait...  you already did this on your current object!  Following Holub's advice in this particular case basically means turning everything into an object, even when this is not really needed.  This screams "over design" to me.

Though getIdentity starts with "get," it's not an accessor because it doesn't just return a field. It returns a complex object that has reasonable behavior.

Oh but wait...  then it's okay to use accessors as long as you return objects instead of primitive types?  Now that's a different story, but it's just as dumb to me.  Sometimes you need an object, sometimes you need a primitive type.

Also, I notice that Allen has radically softened his position since his previous column on the same topic, where the mantra "Never use accessors" didn't suffer one single exception.  Maybe he realized after a few years that accessors do serve a purpose after all...

Bear in mind that I haven't actually put any UI code into the business logic. I've written the UI layer in terms of AWT (Abstract Window Toolkit) or Swing, which are both abstraction layers.

Good one.  What if you are writing your application on SWT?  How "abstract" is AWT really in that case?  Just face it:  this advice simply leads you to write UI code in your business logic.  What a great principle.  After all, it's only been like at least ten years since we've identified this practice as one of the worst design decisions you can make in a project.

In 1989, Kent Beck and Ward Cunningham taught classes on OO design, and they had problems getting people to abandon the get/set mentality. They characterized the problem as follows:

The most difficult problem in teaching object-oriented programming is getting the learner to give up the global knowledge of control that is possible with procedural programs, and rely on the local knowledge of objects to accomplish their tasks. Novice designs are littered with regressions to global thinking: gratuitous global variables, unnecessary pointers, and inappropriate reliance on the implementation of other objects.

Ah, what a subtle way to twist a message to make it match your own.  Back then, the concern was not about accessors.  OO advocates were simply trying to change the minds of procedural developers who were used to considering everything in terms of method calls.  The idea was to introduce the concept of an object, which encapsulates both behavior and data.

If there is one thing that survived the switch from procedural to OO techniques, it's the fact that data is central to programming.  The paradigm shift comes from the fact that we access this data differently, not that we stop accessing it altogether.

Allen keeps repeating that "Calling accessors to get your data makes your code less maintainable" but I don't see an ounce of proof in this article.  Tying maintainability to accessors is a very naive and simplifying assumption.  It's even misleading in the sense that developers might now be tempted to write code without accessors and assume that it will automatically be more maintainable.

Maintainability of code is connected to several factors, and one of them is coupling.  Coupling can happen in a variety of ways and accessors are just a very tiny fraction of them.  Ignore the bigger picture at your own risk.

Posted by cedric at 03:45 PM | Comments (24)

Thousands of bugs in the Character class

I received an interesting bug report on ejbc recently.  It's very simple:  one of our Japanese customers is using his native alphabet to name CMP fields but ejbc complains because the said CMP fields do not start with a lowercase letter, as mandated by the specification.

None of the three Japanese alphabets have the concept of uppercase/lowercase letters, so I immediately suspected a bug in the Unicode support of the JDK.  I wondered how the Character API implemented the toLowerCase() method for these alphabets that do not have lowercase letters, so I wrote the following test case:

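public static void main(String[] argv) {
  // Count the chars whose lowercased form is not reported as lowercase.
  // Note: the loop stops before 65535 (0xFFFF), so that value is not tested.
  int count = 0;
  for (char i = 0; i < 65535; i++) {
    if (! Character.isLowerCase(Character.toLowerCase(i)))
      count++;
  }
  System.out.println("# of incorrect values: " + count);
}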

The idea is simple:  regardless of whether a certain alphabet has lowercase letters or not, the call isLowerCase(Character.toLowerCase(...)) should always return true.

Well, the result is interesting:

# of incorrect values: 64077
Ouch.
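
One way to dig into that number is to break it down by Unicode general category, as reported by Character.getType().  The following is only a minimal sketch and not part of the original test:  the class name CaseBreakdown and the method failuresByType are made up for illustration, and it relies on Java 8 features that postdate this entry.  The exact counts depend on the Unicode version shipped with the JDK, so none are shown here.

```java
import java.util.Map;
import java.util.TreeMap;

public class CaseBreakdown {
  // Hypothetical helper: for each Unicode general category (the int returned
  // by Character.getType), count the chars that fail the
  // isLowerCase(toLowerCase(c)) check used in the test above.
  public static Map<Integer, Integer> failuresByType() {
    Map<Integer, Integer> counts = new TreeMap<>();
    // Same range as the original test: 0 to 65534 inclusive.
    for (int i = 0; i < 65535; i++) {
      char c = (char) i;
      if (!Character.isLowerCase(Character.toLowerCase(c))) {
        counts.merge(Character.getType(c), 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] argv) {
    // Print one line per category, keyed by the Character type constant.
    failuresByType().forEach((type, n) ->
        System.out.println("type " + type + ": " + n));
  }
}
```

The keys are the int constants defined in Character, such as Character.OTHER_LETTER or Character.DECIMAL_DIGIT_NUMBER, so the output shows which kinds of characters contribute to the total.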

This made me wonder how Character.isLowerCase() is implemented...

public static boolean isLowerCase(char ch) {
  return (A[Y[((X[ch>>5]&0xFF)<<4)|((ch>>1)&0xF)]|(ch&0x1)]
          & 0x1F) == LOWERCASE_LETTER;
}

And people say that obfuscated Java is impossible...  (in case you wonder:  this is the real source, not the decompiled version).

Okay, having said that and after poking some harmless fun at the Sun developers, I have to say I actually understand why this method would be so obfuscated.  The call needs to be very fast and it's not like hundreds of developers are going to refer to this source for guidance.
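
For the curious, the general idea behind this kind of code is a multi-stage table lookup:  the high bits of the character select a block, and the low bits select an entry inside that block.  Here is a simplified two-stage sketch of the technique.  It is not the JDK's actual layout (which goes through the three arrays X, Y and A seen above);  the class TwoStageTable and all its member names are hypothetical, and the tables are simply filled from Character.getType() at class load time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TwoStageTable {
  private static final int SHIFT = 5;           // 32 chars per block
  private static final int BLOCK = 1 << SHIFT;
  // Stage 1: one entry per block of 32 chars, pointing into BLOCKS.
  private static final int[] INDEX = new int[65536 >> SHIFT];
  // Stage 2: the distinct blocks, each holding 32 general-category values.
  private static final byte[][] BLOCKS;

  static {
    Map<String, Integer> seen = new HashMap<>();
    List<byte[]> blocks = new ArrayList<>();
    for (int b = 0; b < INDEX.length; b++) {
      byte[] block = new byte[BLOCK];
      for (int j = 0; j < BLOCK; j++) {
        block[j] = (byte) Character.getType((char) ((b << SHIFT) | j));
      }
      // Identical blocks are stored once, which is where the space saving
      // of this technique comes from.
      String key = Arrays.toString(block);
      Integer id = seen.get(key);
      if (id == null) {
        id = blocks.size();
        blocks.add(block);
        seen.put(key, id);
      }
      INDEX[b] = id;
    }
    BLOCKS = blocks.toArray(new byte[0][]);
  }

  // Two array reads and a few bit operations, no branching.
  public static int type(char ch) {
    return BLOCKS[INDEX[ch >> SHIFT]][ch & (BLOCK - 1)];
  }

  // Mirrors the comparison in the snippet above: general category only.
  public static boolean isLowerCaseLetter(char ch) {
    return type(ch) == Character.LOWERCASE_LETTER;
  }

  public static void main(String[] argv) {
    System.out.println("distinct blocks: " + BLOCKS.length
        + " out of " + INDEX.length);
  }
}
```

The lookup itself costs two array reads, which is the reason one would accept code this unreadable in a method that gets called so often.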

Still, the lowercase handling of Unicode characters is severely broken in the JDK, so beware.
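
One possible workaround for a check like the one in ejbc, assuming the intent of the rule is only to rule out names that start with an uppercase letter (that is my assumption, not the wording of the specification), is to test for "not uppercase" instead of "lowercase".  This is a rough sketch and not the actual ejbc code;  FieldNameCheck and hasAcceptableFirstChar are hypothetical names.

```java
public class FieldNameCheck {
  // Hypothetical helper: accepts a name when its first character can start
  // a Java identifier and is not an uppercase letter. Characters from
  // scripts without case, such as kana or kanji, pass this check.
  public static boolean hasAcceptableFirstChar(String name) {
    if (name == null || name.isEmpty()) {
      return false;
    }
    char first = name.charAt(0);
    return Character.isJavaIdentifierStart(first)
        && !Character.isUpperCase(first);
  }

  public static void main(String[] argv) {
    // Unicode escapes keep the source file ASCII-only.
    String[] samples = { "name", "Name", "\u540d\u524d", "1abc" };
    for (String s : samples) {
      System.out.println(s + " -> " + hasAcceptableFirstChar(s));
    }
  }
}
```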

Posted by cedric at 08:32 AM | Comments (6)

September 03, 2003

A new kind of component

In "Where are the components", Joseph Ottinger makes some interesting points about the absence of reusable components in Java land.  In many ways, I share his concerns and I said so in a previous entry.  However, there are a few inaccuracies in his editorial that I'd like to address.

Joseph says:

EJBs were the answer to distributed computing, with a promise of massive scalability.

That's not at all what EJBs were supposed to be, nor what they promised.  Initially, the EJB specification defined a component model, and nothing more.  Granted, it was ambitious, as it tried to address several "fringe" issues as well, such as security, remotability and persistence.  The specification made a few errors in its first revisions, which are now addressed for the most part.

I do agree with Joseph about Stateful Session beans, which qualify as a failed experiment in my book as they are bested by servlet sessions, but the following statement needs some clarification:

Sure, you can get some of their promise out of session beans, and message-driven beans are easily my favorite aspect of the EJB spec, but getting good performance out of stateful beans or entities can be a fine art in and of itself.

It's true that writing Entity beans that scale requires some expertise, but then again, scalability is a complex problem.  No technology will give you this feature for free.  Blaming the EJB specification for this is unfair.

Let me turn to a quick example to illustrate my point.

Those of you who are aware of FreeRoller already know about this.  FreeRoller still has massive scalability problems and the hardware and the bandwidth can no longer be the culprit.  This leaves two possibilities:

  • Hibernate and/or Roller need to be optimized.
  • Hibernate and/or Roller have a fundamental design flaw that prevents them from scaling.

A few very good developers (including Gavin) have been investigating this issue but so far, the results were inconclusive and FreeRoller is still showing dire scalability problems.

Now don't get me wrong:  I love Hibernate and I think it's a piece of software that will pave the way for future persistence frameworks, but right now, it is showing that scaling and relational persistence are two goals that are extremely hard to achieve together.

I contend that EJBs make this goal achievable, as many WebLogic customers can testify.

But don't let this distract you from Joseph's article, which makes good points about the absence of real Java components.  This problem is not new and I read about it for the first time in 1994 in Oliver Sims' seminal book "Business objects".  CORBA took a shot at it back then, but didn't go very far.

Now here is a thought:  what if the components were already here, but we're not seeing them because they adopted a shape we are not expecting?

Think about it.

When a corporation buys Microsoft Word, it does so primarily for the end-user application.  The fact that it suddenly has all these powerful COM components at its disposal is just a by-product of this choice.  It's not advertised, nor does it represent the main use of the software, but it gets used nevertheless.

Let's turn to J2EE application servers now.  Aren't they the epitome of component reuse?  When you write a J2EE application, you are reusing a staggering amount of components that have been written and packaged for you by experts:  web, database, remote access, transaction, security...

These are components right there.  Not the kind that Oliver Sims envisioned, but components nevertheless.

Components are happening.  Slowly, but surely and pervasively.  J2EE application servers provide us with the equivalent of what integrated circuits are to the electronics world:  well-defined blocks of functionality that developers can build upon to raise the level of abstraction that they provide.  We have the plumbing and we are now progressively building the house around it.

And J2EE is the main enabler for this amazing phenomenon.  I can't wait to see what the business world will look like ten years from now...

Posted by cedric at 10:05 AM | Comments (16)