Archive for August, 2003

Fighting spam with spam

Paul Graham posted further thoughts about spam.  One of his recommendations
is to have client filters effectively launch a distributed denial-of-service
attack on the spammer’s Web site:

So I’d like to suggest an additional
feature to those working on spam filters: a "punish" mode which, if turned on,
would retrieve whatever’s at the end of every url in a suspected spam n times,
where n could be set by the user.

While attractive, this idea comes with several problems of its own, the main
one being abuse:  using this technique, it becomes relatively easy to
mount a DDoS attack against arbitrary sites.

Or does it?

Let’s suppose we live in a world where a reasonable number of email clients
have an "angry filter" built in:  whenever this filter detects a spam that
contains a URL, it retrieves that page a certain number of times (say,
ten).  By "reasonable number", I mean that there are enough of these
filters to trigger a massive denial-of-service attack whenever a spam is sent
out.  Considering the number of emails a typical spam run involves (say, ten
million), even if only a small percentage of the receiving clients (say 1%,
or 100,000 machines) have the software installed, the result is about one
million hits on the spammer’s page.

If I wanted to abuse this system, I would basically have to turn myself into
a spammer.  The cost of the infrastructure is not too high (spamming
software, a CD with millions of email addresses, a lazy ISP).  I would also
need to compose an email message that will be flagged as spam by mail filters
(I can simply copy and paste an existing spam) and include the URL of the
victim’s Web site in it.

With that in mind, abuse does indeed seem easy to achieve.  Now, could
we work around this problem?

We can imagine making the "angry filter" smarter:  it would try to
relate the URL contained in the email to the email’s content.  One way to do
that would be to pick the ten words in the message that the Bayesian filter
scored with the highest spam probability (things like "mortgage", "debt",
etc.) and then see if the URL has any connection to these words, either by:

  • Running some heuristic rules on the name of the URL itself.
     
  • Consulting a central database to see if the URL has been flagged as
    a spam originator (that database is of course going to receive a lot of
    hits, and the question of "who is this greater authority?" remains).
     
  • Connecting to the URL and parsing its content before deciding on further
    action (which probably defeats the purpose, although in this case the angry
    filter might decide to limit itself to one connection instead of ten if it
    concludes the spam is probably an abuse).
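The first heuristic could be sketched as follows.  This is only an
illustration, not part of any real filter:  the class name, the
urlMatchesTokens method, and the choice of matching the tokens against the
URL’s host and path are all my own assumptions.

```java
import java.net.URI;
import java.util.List;

public class AngryFilterHeuristic {

    // Returns true if the URL looks related to the spam's content, i.e. any
    // of the filter's high-probability tokens appears in its host or path.
    static boolean urlMatchesTokens(String url, List<String> topTokens) {
        URI uri = URI.create(url);
        String haystack = (uri.getHost() + " " + uri.getPath()).toLowerCase();
        for (String token : topTokens) {
            if (haystack.contains(token.toLowerCase())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("mortgage", "debt", "refinance");
        // A spammy URL matches; an innocent victim's URL does not.
        System.out.println(urlMatchesTokens(
            "http://cheap-mortgage.example.com/offer", tokens));  // true
        System.out.println(urlMatchesTokens(
            "http://victim.example.org/blog", tokens));           // false
    }
}
```

A filter could then send its ten retrievals only when the URL matches, and a
single cautious probe otherwise.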

None of these options seems very effective to me, and I have to say that
overall, the idea of fighting unwanted traffic with even more traffic doesn’t
strike me as the right thing to do, even if giving the spammers a taste of
their own medicine offers some strange sadistic appeal.

Maybe we could consider something more clever:  crawling the spammer’s
Web site in search of an order form and filling this form with bogus
information that the spammer will have to process and validate.  Once the
form is found, its location could be uploaded to a central database so that
other angry filters can skip the crawling step and proceed directly to the
form.

If nine out of ten orders turn out to be bogus, the spammers’ operating
costs will make the act of spamming much less attractive to them.
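As a rough sketch of the form-finding step, here is how a filter might scan
a page’s HTML for input fields and propose bogus values for each.  The class
name and the naive regex-based parsing are mine; a real crawler would need a
proper HTML parser:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BogusOrderSketch {

    // Naive pattern extracting the name attribute of each <input> element.
    private static final Pattern INPUT_NAME =
        Pattern.compile("<input[^>]*name=\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Scan a page's HTML for form fields and propose a bogus value for each.
    static Map<String, String> bogusValuesFor(String html) {
        Map<String, String> values = new LinkedHashMap<>();
        Matcher m = INPUT_NAME.matcher(html);
        while (m.find()) {
            values.put(m.group(1), "bogus-" + m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String page = "<form><input name=\"card\"><input name=\"email\"></form>";
        System.out.println(bogusValuesFor(page));
        // {card=bogus-card, email=bogus-email}
    }
}
```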

Any other thoughts?

Staying sharp

I recently received an email from a reader asking me for advice about
improving his developer skills.

I am not sure there is any straight answer to this question, but let me
throw out a few random ideas.

  • I have certainly found that reading (a lot of reading) is a great way to
    accelerate your skills and to force your brain to expand its horizons.

    Reading Java books is one way of doing it, but the most important thing to
    me is to try and vary the books I read as much as possible.
     

  • Studying other languages is also a fantastic and fascinating way of
    learning new concepts that change the way you think.  Somebody once said
    "learn a new language every year" and I couldn’t agree more.  And just like
    natural languages, the more languages you know, the easier it becomes to
    learn new ones.

    Things can get really interesting when you start "cross-pollinating": 
    mixing concepts read in several books.  For example, trying to apply
    concepts that are particular to a language (e.g. closures) to a language that
    doesn’t support them (e.g. Java).

    Also, don’t be afraid of marginal or complex languages.  I see this
    practice as similar to studying mathematics:  something you study not
    because it will be directly applicable to you, but for all the wonderful
    benefits it will indirectly bring you.  For example, the complexity of
    C++ is daunting and you might not want to get involved with such a
    language unless you really need it, but avoiding it entirely would be a
    mistake.  Pick a book by one of the gurus and look at the amazing things
    they can do just by tweaking the concept of templates (partial
    specialization, traits, etc.).

    Similarly, more abstract languages such as Dylan or Caml are quite
    confusing at first, but they will reshape your way of thinking in
    interesting ways.

  • It is very rare to find "curious" colleagues.  What I mean by
    curious is simply what I just described:  people who are not only good at
    their job but who also like to explore other areas and discuss them.  If
    you happen to have somebody like that in your work environment, take every
    opportunity you can to have lunch or coffee with them.  There is nothing
    more exciting than two curious spirits bouncing ideas off each other. 
    Separately, you will each spark interesting ideas, but put together, your
    total knowledge will greatly exceed the sum of its parts.
     
  • Of course, all this would be useless if all you did was read and never
    practice.  At all my jobs, I have always saved some time every day to do
    something "on the side", something that doesn’t pertain directly to my
    work.  Working on side projects is definitely a way to sharpen your
    skills, especially if you can be involved in a project for a long time,
    which will allow you to go through all the cycles of the development
    process.  If you can do this, it’s worth it.  Don’t forget to have a life
    and enjoy your hobbies, though; this is a lot more important than it may
    seem.  If you are not balanced, you will quickly plateau at work, no
    matter how dedicated you are.

  • Finally, there are a lot of books I could recommend.  I try to maintain
    a list of the books I read; feel free to take a look.
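To make the "cross-pollination" idea concrete, here is one way a closure-like
construct can be approximated in Java with an anonymous inner class.  The
Function interface and the map helper are invented for this sketch:

```java
import java.util.ArrayList;
import java.util.List;

public class ClosureSketch {

    // A one-argument function object: about as close as Java gets to a closure.
    interface Function<A, R> {
        R apply(A arg);
    }

    // "map" as found in functional languages, expressed over our Function type.
    static <A, R> List<R> map(List<A> input, Function<A, R> f) {
        List<R> result = new ArrayList<>();
        for (A a : input) {
            result.add(f.apply(a));
        }
        return result;
    }

    public static void main(String[] args) {
        final int factor = 3;  // captured by the "closure" below
        List<Integer> tripled = map(List.of(1, 2, 3),
            new Function<Integer, Integer>() {
                public Integer apply(Integer n) {
                    return n * factor;
                }
            });
        System.out.println(tripled);  // [3, 6, 9]
    }
}
```

Clumsy compared to a real closure, but the exercise of writing it teaches you
a lot about both languages.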

How do you guys "stay sharp"?

EJB 3.0

There is an interesting thread going on at TheServerSide about EJB 3.0. 
Some people have offered suggestions for the upcoming new specification, and
one of them was interceptors:

You need to be able to restrict access based on what values instance
properties hold. That is how real world systems work. Interceptors – or advices
- would be a good way of implementing this.

While I initially responded that it would be interesting to have the
specification formally define interceptors for EJB’s, I am slowly changing my
mind.  The more I think about it, the more I believe it would be a bad idea
to add this to the EJB specification.  There are several reasons for this,
which I’ll examine in turn:

  1. Lack of use cases
  2. AOP

Lack of use cases

Interceptors have an undeniable "cool factor".  They make for great
demos and allow you to take a peek inside what used to be seen as a magic
black box.  Over the years, I have talked to several customers using EJB
intensively, and only a few of them asked for interceptors.  What was
interesting is that when I drilled down and asked them why exactly they
wanted interceptors, it usually turned out that their needs could be met
through means other than EJB interceptors (sometimes flags we already
provided that they didn’t know about, sometimes EJB lifecycle events they
weren’t aware of).

However, there were a few cases not covered by any of the above, which brings
me to my second point.

AOP

Since we released our AOP Framework for WebLogic, it’s become pretty clear to
us that AOP can be easily applied to J2EE in general and EJB in particular. 
If you haven’t looked at it yet, it is basically a set of pointcuts defined in
AspectJ that transparently ties into a server.  The interesting thing is
that there is no AOP support in any of the WebLogic Server versions we
support.  The package is totally external, and yet it is able to weave
itself into existing code.

And therein lies its power:  suddenly, all your J2EE applications become
not only potential targets for interception, but these interceptions can be
extremely sophisticated.  As sophisticated as the AOP implementation you
are using, actually (right now, we are using AspectJ).

With that in mind, the importance of specifying interception in the EJB
specification becomes very questionable.  Not only would the specification
be very narrow (it applies only to EJB’s), it would also be very restricted,
since it would deal only with the container part of the application.  Maybe
I should clarify that.

EJB code is typically made of three parts:

  • User code
  • Generated code
  • Container code

A specification of interceptors would only cover container code. 
Generated and container code will obviously vary depending on the server you are
using.

With that in mind, it’s pretty clear that an official interceptor
specification would fall very short of addressing a broad category of problems.

AOP allows you to address all three parts, assuming that the framework
you are using supports both callee and caller pointcuts.
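To illustrate what interception of plain user code can look like without any
container support at all, here is a sketch based on dynamic proxies.  The
Account interface and all the names in it are made up for the example:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

public class InterceptorSketch {

    interface Account {
        int debit(int amount);
    }

    // Records the name of every intercepted call, for demonstration.
    static final List<String> log = new ArrayList<>();

    // Wraps any Account so every call goes through the "interceptor" first.
    static Account intercepted(final Account target) {
        return (Account) Proxy.newProxyInstance(
            Account.class.getClassLoader(),
            new Class<?>[] { Account.class },
            new InvocationHandler() {
                public Object invoke(Object proxy, Method method, Object[] args)
                        throws Throwable {
                    log.add(method.getName());           // "before" advice
                    return method.invoke(target, args);  // proceed to user code
                }
            });
    }

    public static void main(String[] args) {
        Account raw = new Account() {
            private int balance = 500;
            public int debit(int amount) { return balance -= amount; }
        };
        System.out.println(intercepted(raw).debit(100));  // 400
        System.out.println(log);                          // [debit]
    }
}
```

A full AOP framework generalizes this idea far beyond interfaces and applies
it to generated and container code as well, which is exactly the point.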

Conclusion

I think it would be a mistake to make interceptors part of the EJB
specification.  Something more general is needed, both in scope (it should
apply to J2EE and not just EJB) and conceptually (use aspect-oriented
programming on top of the existing technologies).

JSR 175 available

JSR 175 is now available for community review.

We (well, mostly Josh) have been working hard these past weeks to iron out
the last details, and there are still a few unresolved issues, but this draft is
very comprehensive and covers pretty much all the aspects involved in this JSR.

There is still time to provide feedback, so don’t hesitate.
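For the curious, here is a small sketch of what JSR 175-style metadata looks
like, using the @interface syntax from the draft and read back at runtime
through reflection.  The Reviewed annotation and the Payroll class are
invented for illustration:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class MetadataSketch {

    // A custom metadata type, declared with the @interface syntax.
    @Retention(RetentionPolicy.RUNTIME)
    @interface Reviewed {
        String by();
    }

    // Metadata attached declaratively to a program element.
    @Reviewed(by = "cedric")
    static class Payroll {}

    public static void main(String[] args) {
        // Tools (or the runtime itself) can query the metadata reflectively.
        Reviewed r = Payroll.class.getAnnotation(Reviewed.class);
        System.out.println(r.by());  // cedric
    }
}
```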

"Inheritance considered evil" considered evil

You should always be very leery of any article that contains the words "considered evil" in its title.  This article, in which Allen Holub unconvincingly tries to make the case that inheritance is evil, is no exception.

There is a well-known phenomenon in the publishing industry pertaining to sensationalism. Whenever you want to write something that will get the attention of your readers, a simple technique is to pick a popular opinion and explain why everyone is wrong about it.

Holub has used this trick in the past to get readership (like when he said that getters and setters were… uh, evil), but while it would be easy to attack this article on its form alone, let’s take a look at what he is actually saying.

Allen mentions the fragile base class problem as one of the reasons why you should avoid inheritance altogether.  This is a very well-known problem in OO circles, and C++ programmers have been aware of it for more than ten years.  The problem is… Allen doesn’t understand it.

To illustrate, Allen uses the example of a Stack class that uses a List as its underlying implementation, and he notices that calling clear() on this object clears only the base class’s state and not the derived class’s.  Fair enough, that’s a good point.  But this is *not* the fragile base class problem.  What’s interesting is that Allen does give the right description of this problem a few lines above:

Base classes are considered fragile because you can modify a base class in a seemingly safe way, but this new behavior, when inherited by the derived classes, might cause the derived classes to malfunction.

and then he proceeds to demonstrate something totally different.  Ahem.

What’s most disturbing about this article, and about the way Holub approaches problems in general, is the tendency to make sweeping statements.  Allen demolishes inheritance by showing a very poor example of inheritance.  Someone who would implement a Stack by

  • Using a List as the underlying implementation
  • Inheriting List instead of delegating to it

is clearly a beginner in object-oriented programming.  Nothing that a good book won’t fix.  But no, instead of simply observing this fact, Allen tries to make the case that inheritance should be banned altogether.  As I read through his articles, it’s pretty clear to me that whatever he is trying to do, he is using the wrong language.  How about Lisp instead?
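Holub’s broken Stack boils down to something like this sketch (class and
field names are mine):  the inherited clear() empties the base class’s
storage but leaves the subclass’s own state inconsistent.

```java
import java.util.ArrayList;

// Inherits List instead of delegating to it: the whole List API leaks out.
public class BrokenStack extends ArrayList<Integer> {

    // Derived-class state that ArrayList knows nothing about.
    private int top = 0;

    public void push(Integer value) {
        add(value);
        top++;
    }

    public Integer pop() {
        return remove(--top);
    }

    public int depth() {
        return top;
    }

    public static void main(String[] args) {
        BrokenStack s = new BrokenStack();
        s.push(1);
        s.push(2);
        s.clear();                      // inherited: empties the list...
        System.out.println(s.size());   // 0
        System.out.println(s.depth());  // ...but top is still 2
    }
}
```

With delegation, the List would be a private field, clear() would not be
exposed, and the inconsistency could never arise.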

Some of his points are worth making, though, like the emphasis on interfaces.  That is a valid fight, and it is important to raise every programmer’s awareness of the importance of programming to interfaces.

We all know that inheritance can be abused, but isn’t it so for every feature of every language on the planet?  Instead of giving in to simple sensationalism, how about studying the pros and cons of inheritance and trying to educate your readers objectively?

This kind of article would be understandable in a personal weblog, but I expect better from JavaWorld.

First post

Welcome to my new weblog.