In a previous entry, I discussed an annotation design pattern called "Annotation Inheritance". Here is another annotation design pattern that I have found quite useful.
Class-Scoped Annotations
This design pattern is very interesting because it doesn't have any equivalent in the Java World.
Imagine that you are creating a class that contains a lot of methods with a similar annotation. It could be @Test with TestNG, @Remote if you are using some kind of RMI tool, etc...
Adding these annotations to all your methods is not only tedious, it decreases the readability of your code and it's also quite error prone (it's very easy to create a new method and forget to add the annotation).
The idea is therefore to declare this annotation at the class level:
@Test
public class DataBaseTest {
public void verifyConnection() { ... }
public void insertOneRecord() { ... }
}
In this example, the tool will first check whether each individual method carries an @Test annotation and, if it doesn't, look up the same annotation on the declaring class. In the end, it will act as if @Test had been found on both verifyConnection() and insertOneRecord().
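The lookup described above can be sketched with plain Java reflection. This is only an illustration, not TestNG's actual implementation: the @Test annotation below is a hypothetical stand-in declared locally, the names ClassScopedLookup, PlainClass, isTest() and testMethodNames() are made up for the example, and the sketch applies the class-level annotation to every declared method without any filtering.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

public class ClassScopedLookup {

    // Hypothetical stand-in for TestNG's @Test, retained at runtime so that
    // reflection can see it on both classes and methods.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    public @interface Test {}

    @Test
    public static class DataBaseTest {
        public void verifyConnection() {}
        public void insertOneRecord() {}
    }

    // No annotation anywhere: nothing should be selected from this class.
    public static class PlainClass {
        public void helper() {}
    }

    // Look on the method first, then fall back to the declaring class.
    public static boolean isTest(Method method) {
        return method.isAnnotationPresent(Test.class)
            || method.getDeclaringClass().isAnnotationPresent(Test.class);
    }

    // Sorted names of the declared methods that the tool would treat as tests.
    public static Set<String> testMethodNames(Class<?> cls) {
        Set<String> names = new TreeSet<>();
        for (Method m : cls.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                continue; // skip compiler-generated methods
            }
            if (isTest(m)) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(testMethodNames(DataBaseTest.class));
        System.out.println(testMethodNames(PlainClass.class));
    }
}
```

Running main prints the two method names of DataBaseTest, followed by an empty set for PlainClass.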
The question now is: how will the tool determine which methods the class annotation should apply to?
There are three strategies we can consider:
Of course, we should also add another dimension to this matrix: should the methods under consideration be only on the current class or also inherited from superclasses? To keep things simple, I'll assume the former for now, but the latter brings some interesting possibilities as well, at the price of complexity.
Using visibility as a means to select the methods might be seen as a hack, a way to hijack a Java feature for a purpose different than what it was designed for. Fair enough. Then how could we tell the tool which methods the class-level annotation should apply to?
An antiquated way of doing it is using syntactical means: a regular expression in the class-level annotation that identifies the names of the methods it should apply to:
@Test(appliesToRegExp = "test.*")
public class DataBaseTest {
public void testConnection() { ... } // will receive the @Test annotation
public void testInsert() { ... } // ditto
public void delete() { ... } // but not this one
}
The reason why I call this method "antiquated" is because that's how we used to do it in Java pre-JDK5. This approach has a few significant flaws:
A cleaner, more modern way to do this is to use annotations:
@Test(appliesToMethodsTaggedWith = Tagged.class)
public class DataBaseTest {
@Tagged
public void verifyConnection() { ... }
@Tagged
public void insertOneRecord() { ... }
}
Of course, this solution is precisely what we wanted to avoid in the first place: having to annotate each method separately, so it's not buying us much (it's actually more convoluted than the very first approach we started with).
So it looks like we're back to square one: having class-level annotations apply to public methods seems to be the most useful and most intuitive way to apply this pattern, and as a matter of fact, TestNG users have taken quite a liking to it.
Can you think of a better way?
I can't pass this up... Jon was inspired by the import and macrodef features of ant and he wrote... this.
This is the funniest hack I have seen in a while.
Throughout my work with EJBGen, EJB3 and TestNG, I have identified a couple of annotation-related patterns that have proven to be quite powerful. They are called "Annotation inheritance" and "Class-scoped annotations".
Annotation Inheritance
The idea is simply to extend the familiar inheritance mechanisms to annotations. Consider the following annotation:
public @interface Author {
public String lastName();
public String date();
}
And an example use:
@Author(lastName = "Beust", date = "February 25th, 2005")
public class BaseTest {
// ...
}
public class Test extends BaseTest {
// ...
}
If you try to look up annotations on the Test class using the standard reflection API, you will find none, since inheritance is not supported by JSR-175 (I submitted the idea but it was decided to keep the specification simple and leave this kind of behavior to tools, which is exactly what we are doing right now).
A tool using this pattern would therefore see an @Author annotation on both BaseTest and Test.
Since we follow the same overriding rules as Java, inheritance would also work on methods that have identical names and signatures.
Where things get interesting is when you start considering "partial inheritance" (or partial overriding). Consider the following:
@Author(lastName = "Beust", date = "February 25th, 2005")
public class BaseTest {
// ...
}
@Author(date = "February 26th, 2005")
public class Test extends BaseTest {
// ...
}
This time, the class Test is overriding the @Author annotation but only partially. Obviously, the date attribute in the @Author annotation will return "February 26th, 2005", but what is the value of lastName? Should it be null or "Beust"?
My experience seems to indicate that while not necessarily the most intuitive, the latter form (partial overriding) is the one that is the most powerful. Partial overriding is a very effective way to implement "Programming by Defaults", which is a way of saying that you provide code with defaults that do the right thing for 80% of the cases.
Basically, all you need to do to provide these defaults is to store these annotations on a base class and require client code to extend these base classes. Clients are then free to either override already-defined attributes or add their own, and the tool will gather all the attributes by collecting them throughout the inheritance hierarchy, starting from the subclass and working its way up to the base classes.
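Here is a minimal sketch of how a tool might gather attributes up the inheritance hierarchy. It rests on assumptions that are not in the original: the @Author attributes are given empty-string defaults (Java annotation attributes cannot be null, so "" stands for "not specified here"), and AnnotationInheritance and resolveAuthor() are hypothetical names.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.LinkedHashMap;
import java.util.Map;

public class AnnotationInheritance {

    // Variant of @Author with empty-string defaults (an assumption of this
    // sketch) so that a subclass can specify only some of the attributes.
    @Retention(RetentionPolicy.RUNTIME)
    public @interface Author {
        String lastName() default "";
        String date() default "";
    }

    @Author(lastName = "Beust", date = "February 25th, 2005")
    public static class BaseTest {}

    @Author(date = "February 26th, 2005")
    public static class Test extends BaseTest {}

    // Walk from the subclass up to Object. The first non-empty value found
    // for an attribute wins, so subclasses override their base classes.
    public static Map<String, String> resolveAuthor(Class<?> cls) {
        Map<String, String> attributes = new LinkedHashMap<>();
        for (Class<?> c = cls; c != null; c = c.getSuperclass()) {
            Author a = c.getDeclaredAnnotation(Author.class);
            if (a == null) {
                continue;
            }
            if (!a.lastName().isEmpty()) {
                attributes.putIfAbsent("lastName", a.lastName());
            }
            if (!a.date().isEmpty()) {
                attributes.putIfAbsent("date", a.date());
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        System.out.println(resolveAuthor(Test.class));
    }
}
```

With partial overriding, Test ends up with its own date and the lastName inherited from BaseTest.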
In the next entry, I will describe the Class-Scoped Annotation pattern, and more importantly, how it can be combined with Annotation Inheritance to create some very elegant constructs.
When you are putting together a Web site, there are two things you need from a language:
As far as I can tell, PHP's support for the former is adequate but the Web is definitely its forte.
I can only talk about PHP's support for MySQL, but support for other databases is probably not very different. As a friend of mine told me not long ago, "there are not a hundred ways you can retrieve rows from a database".
The PHP-MySQL pair is actually so popular that if your ISP supports PHP, they very likely installed the MySQL extensions with it. A quick way of telling is to invoke phpinfo() and look for "MySQL" in the result page.
MySQL support is pretty much identical to JDBC: very low level, you name columns directly and you reference results by ordinal number. And just like JDBC, you need to remember to close the connection when you're done:
$result = mysql_query($query);
$rowCount = mysql_num_rows($result);
for ($i = 0; $i < $rowCount; $i++) {
$name = mysql_result($result, $i, "name");
$date = mysql_result($result, $i, "date");
}
mysql_close(); // close the connection when you're done
I am sure there are numerous packages built on top of this simple abstraction but I haven't done any research yet, and I am purposely trying to keep things very basic with my code (hence no class or other object-oriented features of PHP for now, although just using classes would already help separate neatly the various layers of my application).
The only principle I have found helpful so far is to centralize all the database-oriented code in one single file and to avoid using hardcoded strings to reference anything in my schemas. Having said that, I can already envision some future maintenance nightmare...
Let's turn to Web support now, which is where PHP really shines.
There are three areas of particular interest to Web developers:
And in the three areas, PHP is an example of simplicity.
Consider the following form:
<form action="post.php" method="post">
<input type="text" name="date" />
</form>
You collect the value entered in the text field in post.php like this:
$date = $_POST["date"];
Of course, you would use $_GET if that's the method your form uses instead.
Cookies follow a similar pattern:
setcookie("user", "cedric");
// ...
if (isset($_COOKIE["user"])) {
$user = $_COOKIE["user"];
}
Sessions are stored in an array called, unsurprisingly, $_SESSION. You can have one started automatically by PHP or do this explicitly with session_start(). Of course, the same warnings as in J2EE apply, such as making sure you keep the number of variables in your session to a minimum (you can unregister variables with session_unregister()).
If you can put aside the mildly annoying asymmetry in the API (sometimes you invoke a function, other times it's a global array), PHP puts a lot of power in your hands with these simple APIs: a change that involves altering a schema and updating the accompanying business logic and HTML can often be made in less than ten minutes.
The next task I'd like to tackle is to research a higher level of abstraction than what I have been looking at so far, such as template frameworks and database abstractions.
See how many levels you can finish before your eyes blow up...
Frank Bolander posted a thoughtful comment on my previous PHP entry:
Its allure as an alternative/proxy to ASP/JSP makes everyone blinded IMO just because of GPL. It's pretty sad when a server side scripting engine will allow Perl statements to be injected in GET parameters and cause major damage after all the years of use and hype.
I am well aware of the scalability issues of a 1-tier solution and of PHP's security risks, which, as Frank points out, have made the news recently. I'm not particularly worried about the Web site I've been working on, which receives very little traffic, but I started wondering.
What if I renamed all the pages ".asp" instead of ".php"?
Basically, the question I'm asking is: how do hackers target PHP sites? Is there any other means to guess that a page is generated by PHP except for its suffix? Are there any HTML formatting rules that give away the CGI language in which this page was generated?
Or do hackers just slam random pages with well-known GET and POST exploits and see what happens?
I spent the last few days revamping a Web site and I took this opportunity to learn PHP, which has been an interesting experience.
This Web site contains about a thousand different HTML pages which I wanted to store in a database in order to make it easier to browse. My first task was therefore to scrape this HTML in order to extract its meaningful content and then to store it in a database.
When I started this Web site six years ago, I had no idea I would ever need to do something like this but I still followed the convention of surrounding the information of importance with <span> tags. This turned out to be of critical importance. I wrote a short Ruby script that did the parsing and extracted the data into a canonical format that I later used as the central repository from which to populate the database.
The next step was to set up Apache and MySQL to my liking, which turned out to be a little more challenging than I had anticipated, because what I have access to on my development machine is different from what my ISP lets me modify. But I'll save that for a future entry if there's interest and I'll focus on PHP for now.
Picking PHP was a no-brainer. First because it is supported by my ISP but also because I had always wanted to learn it and find out what all the buzz was about. I expected the experience to be painless and... surprisingly, it was. Way beyond my expectations.
Here are a few thoughts from the perspective of a Java programmer who has been heavily exposed to J2EE for almost five years now. Since these reflections are based on a PHP experience that is barely a few days old, they will most likely contain inaccuracies that you should feel free to point out in the comments.
PHP is a very simple imperative language with an impressive amount of libraries. Even though it possesses a few object-oriented attributes, I chose to ignore this aspect of the language in order to see what the code would look like if I didn't try to be too fancy, a habit that's shockingly hard to shake off after so many years of J2EE work.
PHP's main strength is its very regular syntax and a few details that make it extremely well suited for the Web, among which:
Not surprisingly, developing with PHP is very similar to JSP: you end up concatenating pieces of static HTML with dynamic PHP and this speeds up prototyping quite a bit. The problem is that once it works, you tend to think twice before refactoring it because errors with missing or extra delimiters are quite common, so in order to make it easy to debug, make sure you set display_errors = true in your php.ini.
There are two PHP idiosyncrasies that Java programmers will most likely trip upon:
This first point was actually pretty easy to get used to, but globals still trip me up now and then. For example:
$URL = "http://a.com";
function foo() {
echo $URL;
}
will print an empty string. Yup, not even an error (maybe this is configurable in php.ini, I didn't check). The correct code is:
$URL = "http://a.com";
function foo() {
global $URL;
echo $URL;
}
This idiom will look familiar to those of you who used to program in TCL, which had even more nebulous scoping rules.
Another thing I found out the hard way is that PHP doesn't have any notion of name space, so it took me quite a while to figure out why the following code didn't work:
function log($msg) {
echo "[LOG] $msg";
}
The reason is that this function collides with the log function from the standard library and that not only does PHP decide to favor the other one, it also won't let you know of such a collision. This was a clear message to me that I should invent my own namespace, and I therefore decided to prefix all my methods with "cb" (I'm still unclear on which style is the best: cbConnectToDataBase() or cb_connectToDataBase()).
In the next installment, I will discuss the PHP MySQL API and how ten years of good software and OO practices are hard to shake off, even though they're not exactly easy to achieve with PHP.
I'm a fan of both vi and emacs (and Eclipse too), which I still use on a daily basis. I've been using emacs for more than fifteen years so I know it pretty much inside and out and I fall back to it to edit anything that is not Java and more than a few dozen lines. This posting about vi reminded me that in terms of macros, emacs still has a formidable edge over vi (and pretty much any editor I can think of, actually).
But instead of empty words, here is a concrete task I had to do recently.
I have a bunch of HTML files named 100.html, 101.html... 199.html. They all contain something that looks like this:
<span class="number">
100
</span>
...
<span class="author">
Buffy Sommers
</span>
...
<span class="text">
Arbitrary HTML
that can span several lines.
</span>
I want to extract the content of the span tags and put them into a file in a canonical format (eventually a .ddl file for insertion in a database, but it doesn't really matter).
Emacs allowed me to accomplish this task within a few minutes with macros. Can you think of another tool that will let you do that? (without writing a script, which takes more than a few minutes anyway).
I was going to post a comment on Dion's blog about his entry on Maven when I realized that Mike posted it for me...
In short, ant's <import> and <macrodef> are absolute life savers. They have brought a lot of sanity into my build files, which I thought were already pretty lean and mean:
These rules of thumb coupled with the following simple guidelines:
... give me a feeling of empowerment and control over my infrastructure.
Another important point is that I only need to know two languages to find my way in ant (Java and ant's XML) while Maven requires me to learn four different languages: Java, ant's XML, Maven's XML and... gulp... Jelly.
I still like the idea behind Maven but even today, when I see that the same criticisms we were hearing two years ago still crop up on a regular basis, it doesn't make me very confident in my ability to diagnose Maven meltdowns.
I am happy to announce the availability of TestNG 2.1. Some of the new features include:
A special thanks to Alexandru Popescu who has pulled all-nighters to make this release happen!
We have an exciting list of new features lined up for our next version, among which a plug-in API, but in the meantime, enjoy TestNG 2.1.
I am happy to announce that the EJB3 Early Draft 2 is now available from Sun's Web site. As usual, please send your comments to the EJB3 feedback alias.
Martin spotted a few inconsistencies in the new Generic Collections and wonders why...
But a couple of aspects of the new generics-enabled collections framework annoy me. For example, the Collections interface is declared as Collection<E> and the add method, for example, is correctly declared as:
boolean add(E object)
So I cannot help but wonder why, then, is remove declared as:
boolean remove(Object object)
and
Similarly, the toArray() method should be declared as follows:
E[] toArray()
Instead of:
Object[] toArray()
Rest assured, this is not an oversight, these methods were designed this way for two very good reasons.
Can you see why?
Here are hints if you get stumped (in white):
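As a rough illustration only, the Java sketch below shows two behaviors that these signatures make possible. It is an assumption about the kind of reasons alluded to, not a statement of the designers' intent, and the class and method names are made up for the example. It shows that remove() only needs an object that is equal to an element, and that a generic collection hands back an Object[] unless the caller supplies a typed array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;

public class CollectionSignatures {

    // remove(Object) relies on equals(), so the caller does not need a
    // reference statically typed as E to remove an equal element.
    public static boolean removeViaObject(Collection<Integer> numbers, Object candidate) {
        return numbers.remove(candidate);
    }

    // Generic type parameters are erased at runtime, so the collection
    // cannot create an E[] on its own: toArray() returns an Object[].
    public static Class<?> untypedArrayClass(Collection<String> names) {
        return names.toArray().getClass();
    }

    // Passing an array in supplies the runtime component type that the
    // collection itself does not know.
    public static Class<?> typedArrayClass(Collection<String> names) {
        return names.toArray(new String[0]).getClass();
    }

    public static void main(String[] args) {
        Collection<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3));
        Object two = Integer.valueOf(2);
        System.out.println(removeViaObject(numbers, two));
        System.out.println(numbers);

        Collection<String> names = new ArrayList<>(Arrays.asList("a", "b"));
        System.out.println(untypedArrayClass(names).getSimpleName());
        System.out.println(typedArrayClass(names).getSimpleName());
    }
}
```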

I just finished watching the first season of BattleStar Galactica 2003 and all I can say is... wow.
I wasn't very impressed with the four-hour pilot that aired on SciFi last year, but I decided I was intrigued enough to try and watch the show when it could come out. The problem is... it never did. For some reason, Sky started showing it in September last year and it took a few more months for it to appear on SciFi, where it is airing as we speak.
The screenwriters of BattleStar Galactica 2003 have taken a lot of liberties with the original characters and storyline, but if you can live with that, the show has some tremendous assets and it will keep you wanting more week after week.
Don't be fooled by the first episodes and hang on. The new Galactica is very stylishly slow and is casting a lot of parallel plotlines that seem to be unrelated or even hard to follow. It only gets better week after week, until the thirteenth episode where you will find some answers at the price of one of the best cliffhangers I have seen in a long time.
I can't wait for season two.
In a recent talk about .Net, James Gosling made the following comment:
Microsoft’s decision to support C and C++ in the common language runtime in .NET is one of the "biggest and most offensive mistakes that they could have made".
More specifically, James is referring to .Net's ability to support both "managed" and "unsafe" code, and he is clearly showing he doesn't really understand the issues at stake: failing to support unsafe code would make .Net unable to run legacy code, which is of course unacceptable for Microsoft and millions of Windows users.
James is also conveniently forgetting that his employer's business is based on an operating system, which is written in... C++.
The difference is that Solaris is written 100% in this "unsafe code" that James despises so much, whereas Microsoft is busy progressively replacing unsafe code with managed code in Windows.
It's sad that even somebody like James seems to forget that at the end of the day, good code comes from programmers, not languages.
As some of you may have noticed, I have now installed CAPTCHA protection for comments on my weblog. It was made necessary by the most recent onslaught of link spam that I just received. MT-BlackList has worked really well for me ever since I installed it and, until yesterday, the previous massive spam attack I received had happened a month ago.
For some reason, MT-BlackList didn't stop yesterday's attack, which resulted in one hundred (yes, exactly one hundred) link spam comments being posted across thirty blog entries.
What's comforting is that despite the sophistication of spammers, they still tend to always include some common pattern in their spam entries (IP, email address, top-level domain, etc...) which makes it easy for me to obliterate their entire deed of evil in less than a minute. Still, this creates a certain amount of stress on my servers and, as a matter of principle, I want to make their life as hard as possible.
So I installed a package called SCode, which is a CAPTCHA plug-in for Movable Type. It's quite simple and uses a Perl library to display numbers in a picture. There are much more complicated solutions (involving not only digits but letters of various fonts warped by random transforms while still identifiable by humans), but for now, I'd like to see if this simple system will be sufficient to foil comment link spam on my blog.
The installation was pretty easy but it still required me to edit Movable Type's source code directly, which is certainly a very unfriendly way to provide a plug-in. I don't know if it's due to its implementation or to Perl, but the concept of plug-in in Movable Type is laughably primitive. We are definitely spoiled in Java land.
A user recently submitted his problem on the TestNG mailing-list: he needed to send asynchronous messages (this part hardly ever failed) and then wanted to use TestNG to make sure that the response to these messages was well received.
As I was considering adding asynchronous support to TestNG, it occurred to me that it was actually very easy to achieve:
private volatile boolean m_success = false; // volatile: set from the callback thread
@Configuration(beforeTestClass = true)
public void sendMessage() {
// send the message, specify the callback
}
private void callback() {
// if we receive the correct result, m_success = true
}
@Test(timeOut = 10000)
public void waitForAnswer() throws InterruptedException {
while (! m_success) {
Thread.sleep(1000);
}
}
In this test, the message is sent as part of the initialization of the test with @Configuration, guaranteeing that this code will be executed before the test methods are invoked (and since we are specifying beforeTestClass = true, this code will only be executed once: when an instance of the test class is created).
After this, TestNG will invoke the waitForAnswer() test method, which performs a partial busy wait (this is just for clarity: messaging systems typically give you better ways to wait for the reception of a message). The loop will exit as soon as the callback has received the right message, but in order not to block TestNG forever, we specify a time-out in the @Test annotation.
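As one example of a "better way to wait", here is a minimal sketch that replaces the sleep loop with a java.util.concurrent.CountDownLatch. It is written as plain Java without TestNG so that it runs on its own. The LatchWait class, the delay parameter and the responder thread are hypothetical stand-ins for a real messaging system.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LatchWait {

    // Released by the callback once the expected answer has arrived.
    private final CountDownLatch m_answerReceived = new CountDownLatch(1);

    // Hypothetical stand-in for the messaging system: the "response"
    // arrives on another thread after the given delay.
    public void sendMessage(final long delayMillis) {
        Thread responder = new Thread(() -> {
            try {
                Thread.sleep(delayMillis);
                callback();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        responder.setDaemon(true);
        responder.start();
    }

    private void callback() {
        // if we receive the correct result, release the waiting thread
        m_answerReceived.countDown();
    }

    // Blocks until the callback fires or the time-out expires. Returns
    // true only if the answer arrived in time.
    public boolean waitForAnswer(long timeoutMillis) {
        try {
            return m_answerReceived.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        LatchWait test = new LatchWait();
        test.sendMessage(100);
        System.out.println(test.waitForAnswer(10000));
    }
}
```

The waiting thread wakes up as soon as the callback runs instead of polling once per second, and the time-out passed to await() plays the same role as timeOut in the @Test annotation.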
This code can be adapted to more sophisticated needs:
@Test(groups = { "send" })
public void sendMessage() {
// send the message, specify the callback
}
@Test(timeOut = 10000, dependsOnGroups = { "send" })
public void waitForAnswer() throws InterruptedException {
while (! m_success) {
Thread.sleep(1000);
}
}
The difference with the code above is that now that sendMessage() is a
@Test method, it will be included in the final report.
@Test(timeOut = 10000, invocationCount = 1000, successPercentage = 98)
public void waitForAnswer() throws InterruptedException {
while (! m_success) {
Thread.sleep(1000);
}
}
which instructs TestNG to invoke this method a thousand times, but to
consider the overall test passed even if only 98% of them succeed (of
course, in order for this test to work, you should invoke sendMessage() a
thousand times as well).
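The pass criterion described above can be sketched in a few lines of plain Java. This is only an illustration of the arithmetic, not TestNG's actual implementation: SuccessPercentage and passes() are hypothetical names, and the attempt is modeled as a BooleanSupplier that returns true when one invocation succeeds.

```java
import java.util.function.BooleanSupplier;

public class SuccessPercentage {

    // Runs the attempt invocationCount times and reports whether at least
    // successPercentage percent of the invocations succeeded.
    public static boolean passes(int invocationCount, int successPercentage,
                                 BooleanSupplier attempt) {
        int successes = 0;
        for (int i = 0; i < invocationCount; i++) {
            if (attempt.getAsBoolean()) {
                successes++;
            }
        }
        // Compare in integer arithmetic to avoid rounding issues.
        return successes * 100L >= (long) successPercentage * invocationCount;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails one invocation out of every hundred: 990 successes out of 1000.
        boolean ok = passes(1000, 98, () -> ++calls[0] % 100 != 0);
        System.out.println(ok);
    }
}
```

With 990 successes out of 1000 invocations, the 98% threshold is met and the overall test is reported as passed.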