I received an interesting bug report on ejbc recently.  It’s very
simple:  one of our Japanese customers is using his native script to name
CMP fields, but ejbc complains because those CMP fields do not start with a
lowercase letter, as mandated by the specification.

None of the three Japanese alphabets have the concept of uppercase/lowercase
letters, so I immediately suspected a bug in the Unicode support of the JDK. 
I wondered how the Character API implemented the toLowerCase() method for these
alphabets that do not have lowercase letters, so I wrote the following test
case:

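public class LowerCaseTest {
  public static void main(String[] argv)
  {
    int count = 0;
    // Walk the char range and count every value whose lowercased
    // form is still not reported as a lowercase letter.
    for (char i = 0; i < 65535; i++) {
      if (! Character.isLowerCase(Character.toLowerCase(i)))
        count++;
    }
    System.out.println("# of incorrect values: " + count);
  }
}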

The idea is simple:  regardless of whether a certain alphabet has lowercase
letters or not, I expected the call isLowerCase(Character.toLowerCase(…)) to
always return true.

Well, the result is interesting:

# of incorrect values: 64077

Ouch.
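
To see what happens for a single character, here is a minimal sketch using HIRAGANA LETTER A (U+3042) as an assumed example.  The class name and the isCaseless() helper are made up for illustration; only the Character methods come from the JDK.

```java
class CaselessCheck {
    // Hypothetical helper: true when the character has no case mapping in
    // either direction and is classified as neither upper- nor lowercase.
    public static boolean isCaseless(char c) {
        return Character.toLowerCase(c) == c
            && Character.toUpperCase(c) == c
            && !Character.isLowerCase(c)
            && !Character.isUpperCase(c);
    }

    public static void main(String[] args) {
        char hiraganaA = '\u3042'; // HIRAGANA LETTER A

        // It is a letter...
        System.out.println(Character.isLetter(hiraganaA));
        // ...toLowerCase() hands it back unchanged...
        System.out.println(Character.toLowerCase(hiraganaA) == hiraganaA);
        // ...and the unchanged character is not reported as lowercase.
        System.out.println(Character.isLowerCase(hiraganaA));
        System.out.println(isCaseless(hiraganaA));
    }
}
```

This prints true, true, false, true: toLowerCase() returns the character as is, and isLowerCase() then says no, which is the combination my test above counts as incorrect.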

This made me wonder how Character.isLowerCase() is implemented…

public static boolean isLowerCase(char ch) {
  return (A[Y[((X[ch>>5]&0xFF)<<4)|((ch>>1)&0xF)]|(ch&0x1)]
          & 0x1F) == LOWERCASE_LETTER;
}

And people say that obfuscated Java is impossible…  (in case you are
wondering:  this is the real source, not the decompiled version).
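
As far as I can tell, the table lookup boils down to fetching a packed attribute word for the character and masking out its general category.  Here is a rough readable equivalent built on the public Character.getType() call.  The class and method names are mine, and this is a sketch of the idea rather than the JDK's actual code; newer JDKs also treat characters with the Other_Lowercase property as lowercase, so the two can disagree for a few characters.

```java
class ReadableIsLowerCase {
    // Hypothetical readable version: compare the character's Unicode
    // general category against LOWERCASE_LETTER.
    public static boolean isLowerCaseByType(char ch) {
        return Character.getType(ch) == Character.LOWERCASE_LETTER;
    }

    public static void main(String[] args) {
        char[] samples = { 'a', 'A', '1', '\u3042' }; // last one: HIRAGANA LETTER A
        for (char c : samples) {
            // Print the code point, the JDK's answer, and the readable version's answer.
            System.out.println(Integer.toHexString(c) + " "
                + Character.isLowerCase(c) + " "
                + isLowerCaseByType(c));
        }
    }
}
```

For these four samples both columns agree, and only 'a' comes out as lowercase.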

Okay, having said that and after poking some harmless fun at the Sun developers, I
have to say I actually understand why this method would be so obfuscated. 
The call needs to be very fast and it’s not like hundreds of developers are
going to refer to this source for guidance.

Still, the lowercase handling of Unicode characters is severely broken in the
JDK, so beware.
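
One way to sidestep the problem in a validator is to reject names that start with an uppercase letter instead of requiring a lowercase one.  The sketch below is only an illustration of that idea: FieldNameCheck and hasValidStart() are hypothetical names, and this is not how ejbc actually performs its check.

```java
class FieldNameCheck {
    // Hypothetical check: the first character must be able to start a Java
    // identifier and must not be uppercase.  Caseless letters pass.
    public static boolean hasValidStart(String name) {
        if (name == null || name.isEmpty()) {
            return false;
        }
        char first = name.charAt(0);
        return Character.isJavaIdentifierStart(first)
            && !Character.isUpperCase(first);
    }

    public static void main(String[] args) {
        System.out.println(hasValidStart("name"));         // true
        System.out.println(hasValidStart("Name"));         // false
        System.out.println(hasValidStart("\u540D\u524D")); // true (two CJK ideographs)
        System.out.println(hasValidStart("1abc"));         // false
    }
}
```

The design choice here is to test for the absence of uppercase rather than the presence of lowercase, so that scripts without case are not penalized.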