Email tips

Email tips which I thought were common knowledge.

Avoiding embarrassing forwards

Problem: You receive an email (say from a customer), you pass it along internally with a few comments and… you accidentally copy the author of the email with details they were not meant to read.

Solution: Instead of hitting “Reply to all”, adding your team in Cc, and removing the customer (the part a lot of people forget), forward the whole email and then add your comments. It’s much easier to avoid mistakes when you have to populate an empty To: box than when you have to edit an existing list of To’s, Cc’s and Bcc’s.

Don’t send too soon

Problem: You’re writing an email that’s not ready to be sent yet.

You need to write a detailed email which is not supposed to be sent just yet (for example because you are describing something you’re still working on). How do you make sure that you don’t send the email until the time is right?

Solution: Write it and save it as a draft but don’t enter anything in the To: box. Do this last.

Say no to ugly emails

Problem: You want your emails to look nice.

My emails routinely contain code or shell output, and as much as I love Gmail, its formatting abilities are pathetic, both from a user interaction standpoint (why do I need three clicks to indent text to the right?) and from a theme standpoint. For example:

In order to get this ugly rendition of code, I had to indent the snippet manually and then change its font. And it still looks terrible.

Solution: Use Markdown Here, a Chrome extension which not only allows you to format your code in Markdown but also uses some pretty nice CSS.

You only need a few backticks and, optionally, a language specifier:

```java
public class SlidingWindowMap {
  public SlidingWindowMap(Set<String> keys, int maxCount, long periodMs) {
    // ...
  }
 
  /**
   * @return a key that has been used less than `maxCount` times during the
   * past `periodMs` milliseconds or null if no such key exists.
   */
  public String getNextKey() {
    // Your brilliant solution goes here
  }
 
}
```

Click the Markdown button and voilà:

Your turn now: what email tricks do you use that you think very few people know about?

Bringing down servers, one character at a time

Last week, I mentioned a bug that I caused, and a few people asked me for details, so here is what happened.

The code in question parses a calendar and processes meetings. Here is a shortened version of what it looked like:

while (maxDays < 7 && (eventCount < 20) || attendeeCount < 40) {
    // process meetings
    // increment maxDays whenever we cross a day boundary
    // possibly increment eventCount or attendeeCount based on the meeting
}

The bug went unnoticed in production for a few days until suddenly, the servers started pinning all their CPUs and, shortly thereafter, depleting the memory of each instance. A restart was the only remedy.

This code loops through someone's calendar and stops once we've gathered enough information. The maxDays variable is what I call a "safety valve", a condition that guarantees that the code doesn't run amok in case its normal processing flow never finishes. And yet, the crash seems to indicate that the code was doing exactly that.

With this introduction out of the way, here is now the line that caused the bug:

Actually... you've already seen it, it's the code I pasted above. Did you spot the problem?

That's right, a simple set of parentheses in the wrong place. Because they surround the wrong expression, they completely defeat the purpose of the safety valve, and the resulting runaway loop only triggers for people with empty calendars (which explains why it took some time for the bug to show up).

Two characters to bring down an entire server...
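For reference, here is a sketch of what the corrected condition presumably looks like, with the parentheses moved so that the safety valve always bounds the loop (this is my reconstruction from the description above, not the exact production code):

while (maxDays < 7 && (eventCount < 20 || attendeeCount < 40)) {
    // process meetings
    // increment maxDays whenever we cross a day boundary
    // possibly increment eventCount or attendeeCount based on the meeting
}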

Can we learn anything from this bug, besides the fact that testing needs to cover edge cases? A static analysis tool can't be of much help here since we're completely in the realm of runtime behavior, but is there some way an automated tool could have warned me about the flaw in this termination expression?

Branching logic

How does the following code make you feel:

  runSomeLogic(account, person, true, false);

?

Me? Annoyed.

I wish we could do away completely with this kind of code but I’ve come to the conclusion that such calls are unavoidable for a simple reason:

  private void complexStuff() {
    // ... gnarly code here...
    if (someCondition) {
      doA();
    } else {
      doB();
    }
    // ... more gnarly code...
  }

Accepting branching parameters in the signature is a good way to minimize the amount of duplication you have to write. However, this doesn’t mean that you need to expose your callers to this implementation detail, or at least, you can expose it in a safer way than naked booleans.

There is a whole spectrum of ways to handle such cases, and the example above sits at one end of it. Let’s see if we can enumerate the alternatives.

Let’s start with the lazy way, which doesn’t require any code change:

  runSomeLogic(account, person,
        true /* use memcache */,
        false /* no security check */);

This gives me a better sense of what the code is doing but it’s still error prone since comments are easily forgotten. Can we get the compiler to help us?

  /** Never call this, call the overloaded helper methods */
  private void runSomeLogic(Account account, Person person, boolean useMemcache,
      boolean securityCheck) { ... }

  public void runSomeLogicNoCacheSecurity(Account account, Person person) {
    runSomeLogic(account, person, false, true);
  }

  public void runSomeLogicCacheSecurity(Account account, Person person) {
    runSomeLogic(account, person, true, true);
  }

  // ... you get the idea

A bit better and the compiler will keep us honest, but you can see the combinatorial explosion here: each new boolean forces you to multiply the number of methods by two. Not really scalable. One way to make it scalable is to prevent more than one boolean parameter, but this is really just kicking the can down the road.

Maybe booleans are the problem. How about something more descriptive?

  enum CacheInfo {
    USE,
    DONT_USE
  }

  enum SecurityInfo {
    USE,
    DONT_USE
  }

  runSomeLogic(account, person, CacheInfo.USE, SecurityInfo.DONT_USE);

This looks somewhat better: it’s both enforced by the compiler and readable, although maybe we can improve on that by creating one big enum instead of plenty of tiny ones:

  enum Use {
    MEMCACHE,
    SECURITY
  }

  enum DontUse {
    MEMCACHE,
    SECURITY
  }

  runSomeLogic(account, person, Use.MEMCACHE, DontUse.SECURITY);

The cost is two uber enums that everybody needs to import but the code reads more fluently. The downside of this approach is that we’ve weakened static guarantees considerably: it’s now possible to pass a Use.MEMCACHE where a security configuration setting is expected.
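To make that weakness concrete: with the two uber enums, the method can no longer give each parameter a concern-specific type. The sketch below assumes the parameters are declared as Enum<?> (the original signature isn’t shown above), which is exactly what lets a semantically wrong call slip through:

  // Assumed signature for the uber-enum variant (not shown in the original example):
  public void runSomeLogic(Account account, Person person,
      Enum<?> cacheSetting, Enum<?> securitySetting) {
    // ...
  }

  // Compiles without complaint, even though a cache constant ended up in the security slot:
  runSomeLogic(account, person, DontUse.SECURITY, Use.MEMCACHE);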

Can you think of other ways to approach this problem? (all languages welcome)

Flexible configuration with Guice

There are quite a few configuration libraries available in Java, such as this one from Apache Commons, and they usually follow a very similar pattern: they parse a variety of configuration files and, in the end, give you a Properties or Map-like structure where you can query your values:

Double doubleValue = config.getDouble("number");
Integer integerValue = config.getInteger("number");

I have always been unsatisfied with this approach for a few reasons:

  • A lot of boilerplate to retrieve these parameters.
  • Having to share the whole configuration object even if I only need one parameter from it.
  • It’s very easy to misspell a property and receive incorrect values.

A while ago, I was reading the Guice documentation and I came across a paragraph that made me realize that maybe, we could do better. Here is the relevant excerpt:

Guice supports binding annotations that have attribute values. In the rare case that you need such an annotation:

  • Create the annotation @interface.
  • Create a class that implements the annotation interface. Follow the guidelines for equals() and hashCode() specified in the Annotation Javadoc. Pass an instance of this class to the annotatedWith() binding clause.

I thought this technique might be exactly what I needed to create a smarter configuration framework, even though I had plans for it other than using it with the annotatedWith() method, as this paragraph suggests. The relevance of this excerpt will become clear later, so let’s start with the goals.

Objectives

I want to:

  • Be able to inject individual configuration values anywhere in my code base, and I want this to be type safe. No @Named or other string-based lookup.
  • Have a canonical list of all the properties available to the application, with their full type, default value and documentation, while leaving the door open for improvements (e.g. is this option mandatory or optional, detecting when some properties are not used anywhere, deprecation, aliasing, etc…).

I don’t care much about the front end: how these properties get gathered is not relevant to this framework. They can come from XML, JSON, the network or a database, and they can have arbitrarily complex resolution and overriding rules; let’s save this for a future post. The input of this framework is a Map of properties, and I take it from there.

By the time we’re done, we will be able to do something like this:

    # Some property file
    host=foo.com
    port=1234

Using these configuration values in your code:

public class A {
    @Inject
    @Prop(Property.HOST)
    private String host;

    @Inject
    @Prop(Property.PORT)
    private Integer port;

    // ...
}

Implementation

The definition of the Prop annotation is trivial:

@Retention(RUNTIME)
@Target({ ElementType.FIELD, ElementType.PARAMETER })
@BindingAnnotation
public @interface Prop {
  Property value();
}

Property is an enum that captures all the information necessary for all your properties. In our case:

public enum Property {
    HOST("host",
        "The host name",
        new TypeLiteral<String>() {},
        "foo.com"),

    PORT("port",
        "The port",
        new TypeLiteral<Integer>() {},
        1234);
}

This enum contains the string name of the property, a description, its type and its default value. Note that the type is a TypeLiteral, so we can even offer properties with generic types that would otherwise be erased, a trick that comes in handy to inject caches or other generic collections. Obviously, you can have additional parameters as you see fit (e.g. “boolean deprecated”).
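For the snippets below to compile, the enum also needs a constructor and accessors backing those values. Here is a minimal sketch of what it presumably declares inside its body (the accessor names are my assumptions, not necessarily those of the proof of concept):

    // Fields and accessors backing the enum constants above (sketch).
    private final String name;
    private final String description;
    private final TypeLiteral<?> type;
    private final Object defaultValue;

    Property(String name, String description, TypeLiteral<?> type, Object defaultValue) {
        this.name = name;
        this.description = description;
        this.type = type;
        this.defaultValue = defaultValue;
    }

    public String getName() { return name; }
    public TypeLiteral<?> getType() { return type; }
    public Object getDefaultValue() { return defaultValue; }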

The next step is to tie all the properties that we parsed as input — we’ll use a Map called "allProps" — into our module so that Guice knows how to inject them.

In order to do this, we iterate over all these properties and bind each of them to its own provider. Note the use of Key.get() from the Guice API, which lets us target each property with a key made of both its type and a specific annotation:

    for (Property prop : Property.values()) {
        Object value = PropertyConverters.getValue(prop.getType(), prop, allProps.asMap());
        binder.bind(Key.get(prop.getType(), new PropImpl(prop)))
                .toProvider(new PropertyProvider(prop, value));
    }
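For context, here is a sketch of how this loop might sit inside a Guice module. The class name ConfigurationModule, the plain Map parameter (the original passes allProps.asMap(), which suggests a Guava multimap or similar) and the unchecked cast are my assumptions, not the original code:

public class ConfigurationModule extends AbstractModule {
    private final Map<String, String> allProps;

    public ConfigurationModule(Map<String, String> allProps) {
        this.allProps = allProps;
    }

    @SuppressWarnings("unchecked")
    @Override
    protected void configure() {
        for (Property prop : Property.values()) {
            Object value = PropertyConverters.getValue(prop.getType(), prop, allProps);
            // Bind the (type, @Prop(prop)) pair to a provider that returns the converted value.
            Key<Object> key = (Key<Object>) Key.get(prop.getType(), new PropImpl(prop));
            bind(key).toProvider(new PropertyProvider<Object>(prop, value));
        }
    }
}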

There are three classes in the binding loop above that I haven’t explained yet. The first one is PropertyConverters, which simply reads the string version of the property and converts it to a Java type. The second one is PropertyProvider, a trivial Guice provider:

public class PropertyProvider<T> implements Provider<T> {
    private final T value;
    private final Property property;

    public PropertyProvider(Property property, T value) {
        this.property = property;
        this.value = value;
    }

    @Override
    public T get() {
        return value;
    }
}
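Going back to PropertyConverters, a minimal sketch could look like the following, supporting only the two types used in this example (the method name getValue comes from the loop above; everything else is an assumption):

// Hypothetical sketch: turn the raw string found in the property map into the
// Java type declared by the Property enum, falling back to the declared default.
public class PropertyConverters {
    public static Object getValue(TypeLiteral<?> type, Property property, Map<String, String> allProps) {
        String raw = allProps.get(property.getName());
        if (raw == null) {
            return property.getDefaultValue();
        }
        Class<?> rawType = type.getRawType();
        if (rawType == String.class) return raw;
        if (rawType == Integer.class) return Integer.valueOf(raw);
        if (rawType == Double.class) return Double.valueOf(raw);
        throw new IllegalArgumentException("Unsupported property type: " + type);
    }
}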

PropImpl is more tricky and also the one thing that has always prevented me from implementing such a framework, until I came across this obscure tidbit of the Guice documentation quoted above. In order to understand the necessity of its existence, we need to understand how Guice’s Key.get() works. Guice uses this class to translate a type into a unique key that it can use to inject the correct value. The important part here is to notice that not only does this method work with both Class and TypeLiteral (which we are using), but it can also be given a specific annotation. This annotation can be @Named, which I’m not a big fan of because it’s a string, so susceptible to typos, or a real annotation, which is what we want. However, annotations are special beasts in Java and you can’t get an instance of them just like that.

This is where the trick mentioned at the top of this article comes into play: Java actually allows you to implement an annotation with a regular class. The implementation turns out to be fairly trivial, the difficulty was realizing that this was possible at all.
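For reference, here is a sketch of what such an implementation can look like. The equals() and hashCode() rules come straight from the java.lang.annotation.Annotation Javadoc; the exact code in the proof of concept may differ slightly:

// A regular class that implements the Prop annotation interface.
public class PropImpl implements Prop {
    private final Property property;

    public PropImpl(Property property) {
        this.property = property;
    }

    @Override
    public Property value() {
        return property;
    }

    @Override
    public Class<? extends Annotation> annotationType() {
        return Prop.class;
    }

    @Override
    public boolean equals(Object o) {
        // Per the Annotation contract: equal to any Prop whose value() matches.
        return o instanceof Prop && ((Prop) o).value() == property;
    }

    @Override
    public int hashCode() {
        // Per the Annotation contract: (127 * memberName.hashCode()) ^ memberValue.hashCode()
        return (127 * "value".hashCode()) ^ property.hashCode();
    }
}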

Now that we have all this in place, let’s backtrack and dissect how the magic happens:

    @Inject
    @Prop(Property.HOST)
    private String host;

When Guice encounters this injection point, it looks into its bindings and finds multiple candidates for String. However, because they have all been bound with a Key, each key is actually a pair: (String, a specific Prop annotation). In this case, Guice will look up the pair (String, Prop(Property.HOST)) and find a provider there. This provider was instantiated with the value found in the property file, so it knows what value to return.

Generalizing

Once I had the basic logic in place, I wondered if I could turn this mini framework into a library so that others could use it. The only missing piece would be to allow the specification of a more general Prop annotation. In the example above, this annotation has a value of type Property, which is specific to my application:

@Retention(RUNTIME)
@Target({ ElementType.FIELD, ElementType.PARAMETER })
@BindingAnnotation
public @interface Prop {
    Property value();
}

In order to make this more general, I need to make this attribute return an enum instead of my own:

@Retention(RUNTIME)
@Target({ ElementType.FIELD, ElementType.PARAMETER })
@BindingAnnotation
public @interface Prop {
    Enum value();
}

Unfortunately, this is not legal Java, because according to the JLS section 8.9, Enum and its generic variants are not enum types, something that Josh Bloch confirmed, to my consternation.

Therefore, this cannot be turned into a library, so if you are interested in using it in your project, you will have to copy the source and make a few modifications to adjust it to your needs, starting with changing the type of Prop#value to the enum that captures your own configuration.

You can find a small proof of concept here, which I hope you’ll find useful.

Note: this is a copy of the article I posted on our work blog.

The time that never was

I spent some time tracking down a funny bug yesterday, which suddenly started clogging our logs with the following error message:

org.joda.time.IllegalFieldValueException: Value 22 for hourOfDay is not supported: Illegal instant due to time zone offset transition

If there are two things I have learned about Joda-Time these past years, it’s that 1) it hardly ever fails and 2) when it does, it’s for a good reason. So I took a look at the code causing the exception and I found this:

max = eventLocalTime.withTime(22, 0, 0, 0);

I can see the “22” value, but this code has been working flawlessly for weeks; what is suddenly causing it to fail? I fired up the debugger, reproduced the problem and inspected the eventLocalTime variable. Something felt a bit unusual about its value:

2013-03-30T22:32:00.000 (America/Godthab)

I won’t claim that I know all the different time zones in existence in North America, but “America/Godthab” was certainly new to me. A quick Google search reveals that this time zone is in Greenland. Alright, so we have users in Greenland, but why would this cause a problem?

Reading this page again, something strikes me:

Next Scheduled DST Transition
DST will begin on Sat 30-Mar-2013 at 10:00:00 P.M. when local clocks are set forward 1 hour

and then everything falls into place.

The variable eventLocalTime is perfectly legitimate, but today is March 29th, and setting its time of day to 10pm puts it into a nonexistent time, which happens whenever a time zone transitions to or from Daylight Saving Time. We were trying to create the date “March 30th, 10pm”, which doesn’t exist because the clock in this time zone skips straight to 11pm. In the US, this nonexistent time period is usually between 2am and 3am, but for some reason, Greenland does this switch at 10pm.

The fix ended up looking as follows (happy to hear if there is a better solution):

try {
    max = eventLocalTime.withTime(22, 0, 0, 0);
} catch (IllegalFieldValueException e) {
    max = eventLocalTime.withTime(23, 0, 0, 0);
}
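One possible alternative that avoids using an exception for control flow, assuming eventLocalTime is a Joda DateTime and that falling back to 11pm is acceptable (as in the catch block above), is to ask the time zone whether 10pm falls into a DST gap:

// Check whether 22:00 on that day falls into the zone's DST gap before building the value.
DateTimeZone zone = eventLocalTime.getZone();
LocalDateTime tenPm = eventLocalTime.toLocalDateTime()
        .withHourOfDay(22).withMinuteOfHour(0).withSecondOfMinute(0).withMillisOfSecond(0);
int hour = zone.isLocalDateTimeGap(tenPm) ? 23 : 22;
max = eventLocalTime.withTime(hour, 0, 0, 0);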

I am so happy that a Joda-inspired date and time API will finally become standard with Java 8, and I have to tip my hat to Stephen, again, for his dedication to the terribly hard problem of date management.

If you have any funny bug to share, please go ahead and do so in the comments!


A small step for Jebediah, a giant step for Kerbinkind

This is Kerbal Space Program, a sandbox space simulator created by a small team of passionate space programmers. I never really thought that designing rocket ships and aligning orbits could be fun, but I have already spent more than thirty hours trying to launch my Kerbals into space (and killing most of them in the process). This turned out to be much more entertaining than I expected.

The game takes place on an imaginary planet called Kerbin. Kerbin looks like the Earth, and the planetary system it’s located in has a lot of similarities with our own. As of the current version (0.18, $23, Windows, Mac and Linux), the game allows you to design either rocket ships or airplanes (you can send these to other planets as well). Then you launch them and try to follow the plan you decided on.

The screenshot above shows Jebediah Kerman, proudly taking his first steps on the Mün. It’s the very first successful landing I ever pulled off, and it took me about fifty attempts (and countless fiery explosions) to reach this goal. I won’t deny that it was a lot of fun to get there, though, and I learned tons about space travel and ballistics.

The process of mastering the game follows a pretty clear path, and while it doesn’t look like there is a big gap between your first orbit and your first successful landing, you will soon realize that you need to perfect a lot of intermediary steps in between. Along the way, you will learn about ship design, the pros and cons of liquid and solid fuel, the compromises of taking a lot of fuel with you, how to launch a ship into a circular orbit, perfecting that orbit, learning how to use and interpret the navball, learning the maneuvers, when and how to perform gravity turns, fuel efficiency, choosing the optimal transfer path between planets, gravity slingshots, controlling your speed to land without exploding, etc…

The game is still in its very early stages, and it’s hard not to draw a parallel with the early Minecraft days: even with so few features, the game has such a strong appeal to the imagination that it’s already attracting a very strong and devoted following. Take a look at the Kerbal Space Program subreddit to get a feel for what people are doing with the game even at such an early stage. I can’t wait to see what it will be like once it reaches 1.0. The developers have managed to take what appears to be a very dry and difficult topic and turn it into a fun and challenging game which remains very approachable. And while you can take a very hard science approach to it, you can also decide to enjoy what it has to offer by simply learning what periapsis and apoapsis are, stopping there and focusing on the flying after that.

As for myself, I still have a lot to do. First, I need to work on fuel efficiency, because even though the Mün landing above was successful, my spaceship is now completely out of fuel (don’t tell Jebediah, he looks so happy). After that, the next step will be to try to land on the more distant planets.

Ad astra!

Answer to the “School” challenge

I’m happy to see mostly correct answers to the School coding challenge.

To make things more interesting, let’s start by looking at a naïve (and wrong) solution:

  @Override
  public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ((name == null) ? 0 : name.hashCode());
    result = prime * result + ((nickname == null) ? 0 : nickname.hashCode());
    return result;
  }

  @Override
  public boolean equals(Object obj) {
    // ... details omitted
    return other.nickname.equals(nickname) || other.name.equals(name);
  }

Let’s create two School objects, make sure they are equal, add them to a set and then display this set:

    School school1 = new School("University of Pennsylvania", "upenn");
    School school2 = new School("University of Pennsylvania", "Penn State");
    System.out.println("Equals: " + school1.equals(school2));
    Set<School> schools = Sets.newHashSet();
    schools.add(school1);
    schools.add(school2);
    System.out.println("Set: " + schools);

The output:

Equals: true
Set: [com.beust.School@d8e612c6, com.beust.School@7086780a]

This is obviously wrong, and it’s caused by the fact that this implementation breaks several laws, among which: “if two objects are equal, their hash codes should be equal as well”. This is clearly not the case here: the two objects instantiated above are equal, but their hash codes are different, since the hash code is calculated from both the name and the nickname. The consequences of such an implementation are disastrous in more ways than one, starting with the fact that your objects cannot reliably be stored in nor retrieved from hash-based collections (e.g. sets and maps).

The only way out of this would be to base the hash code on one of the two fields, but this doesn’t make sense since we wouldn’t know which field to pick.

Most commenters came up with the idea of always returning the same value for hashCode(), regardless of the name and nickname values. This approach is actually correct, but it comes at the cost of losing all the running time benefits offered by sets and maps, such as constant time retrievals. Instead, HashMap#get and Set#contains become linear operations. That’s a pretty bad prank to pull on your coworkers when their collections can contain millions of objects.

So, what’s the best way to solve this problem?

I’m not quite sure. There are several approaches, all with their pros and cons.

The first idea would be to not override equals and hashCode, using their default implementation defined on Object. This implementation (simple reference equality test) is guaranteed to be wrong in most cases, but since you can’t write a correct one, maybe the best approach is to do nothing.

Another approach would be to throw an UnsupportedOperationException from equals/hashCode. The implementation is still wrong but at least, your program will crash loudly if someone attempts to put these objects in a hash-based collection.

Ultimately, you really want to get rid of the fuzzy matching performed in the equals method, so the best approach is probably to perform a first pass on all your School objects and unify all those that are equal under the same id. After this, you can implement equals/hashCode based on this id field.
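Here is a sketch of that canonicalization pass (all names in it, such as getName(), getNickname(), setId() and assignIds(), are my assumptions, not a reference implementation). Schools whose name or nickname collide are grouped with a small union-find, and every School in a group gets the same id, which equals() and hashCode() can then safely be based on:

static void assignIds(List<School> schools) {
    // Each school starts in its own group.
    int[] parent = new int[schools.size()];
    for (int i = 0; i < parent.length; i++) parent[i] = i;

    Map<String, Integer> byName = new HashMap<>();
    Map<String, Integer> byNickname = new HashMap<>();
    for (int i = 0; i < schools.size(); i++) {
        School s = schools.get(i);
        if (s.getName() != null) {
            Integer j = byName.putIfAbsent(s.getName(), i);
            if (j != null) union(parent, i, j);   // same name: same school
        }
        if (s.getNickname() != null) {
            Integer j = byNickname.putIfAbsent(s.getNickname(), i);
            if (j != null) union(parent, i, j);   // same nickname: same school
        }
    }
    // Every school in a group receives the group's representative index as its id.
    for (int i = 0; i < schools.size(); i++) {
        schools.get(i).setId(find(parent, i));
    }
}

static int find(int[] parent, int i) {
    while (parent[i] != i) i = parent[i] = parent[parent[i]];
    return i;
}

static void union(int[] parent, int i, int j) {
    parent[find(parent, i)] = find(parent, j);
}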

Coding challenge (“light” edition)

This coding challenge is a bit different from the previous ones ([1] [2]) because it is shorter and also a bit more closely tied to Java.

A School has either a name (“Stanford”) or a nickname (“UCSD”) or both.

Your task is to write equals() and hashCode() methods for the School class that satisfy the following requirement: two Schools are identical if they either have the same name or the same nickname.

A humble suggestion to protect your important information

When you create a new account on a web site, it’s pretty much the norm to be asked to enter an email address. Why aren’t there more (any?) web sites that ask me for an additional email address that will be used exclusively to send me password resets?

With such a simple extra step, my account becomes impossibly hard to crack. Not just because my “safe email address” will, obviously, be protected by two-step authentication or some similar mechanism, but more importantly because hackers have no idea what that address is: it’s not used to send any email, it only receives email sporadically, and the only way to find out what it is is to hack into the bank’s servers (I’m pretty sure that hackers able to do that will find far more interesting accounts to empty than mine).

Another benefit is that I don’t need to burden my regular email address with two-step authentication, which would be a hassle since I log into that account so often and from so many different places.

So… What am I missing? Why isn’t this practice more widespread?

No silver bullet: reloaded

I had to read the recent article “Functional Programming Basics”, by Robert Martin, several times to make sure that I wasn’t missing any deeper point, but every time, I just became more and more concerned by the sloppiness of the arguments presented.

Here are a few thoughts.

First, it’s almost certainly true that functional programming is the next big thing.

We’ll see about that. This has been predicted for several decades and it still hasn’t happened. Some say that it will never happen, others say it’s already happened and more moderate people just observe that quite a few core concepts of functional programming have been leaking into mainstream languages for a while, so in some way, functional programming is slowly percolating through the industry.

Functional programming is programming without assignment statements.

This is a pretty silly statement since functional programming languages have assignments. Maybe the emphasis should be more on referential transparency or perhaps “variables can never be assigned more than once” would be a more accurate description of one of the important tenets of functional programming.

Let’s look at a functional program for the squares of integers. We’ll use the language Clojure for this, though the concepts we’re going to explore work the same in any functional language.

(take 25 (squares-of (integers)))

I find the use of Clojure (or more generally, Lisp) to illustrate his points a bit puzzling because, while the question of whether functional programming will play a big part in the software industry’s future is still up for debate, it seems pretty clear to me that 1) object orientation and 2) static typing are two concepts that have clearly established themselves as powerhouse ideas, with a solid track record and a bright future ahead of them. Scala or Kotlin would have been a better choice, especially since they are not that much more verbose on such a trivial example:

// Kotlin
ints.map { it * it }.take(25)

Read that sentence again; because I did something important there. I took the three separate definitions of the functions that I gave you in the preceding paragraph and combined them into a single sentence. That’s called: (are you ready for the buzzword?) Referential Transparency. [cue: Fanfare, glitter, ticker tape, cheering crowds]

This is where I start being a bit confused by Robert’s point, because if anything, the code he just showed is illustrating a lot more composition than referential transparency.

If you are trying to sell functional programming as a big thing in order to prop up your consulting business, you should probably skip the trivial stuff and start with the hard problems right away, such as preserving referential transparency and immutability, and demonstrating the power of composition, while accessing a database, returning JSON data from a servlet and having another servlet modify that same database. Do this with a functional programming language, with snippets of code that show clear advantages over today’s Java and C# based solutions, and you have yourself a market.

Honestly, we programmers can barely get two Java threads to cooperate.

Not really. Hundreds of thousands of Java programmers write multithreaded code every day and they are doing just fine, because the intricacies of parallelism are abstracted away from them by the frameworks they rely on to get their job done. I’m not talking just about complex Java EE or Spring-based containers but even the simple Tomcat or Jetty servers. More advanced programmers dabble with the amazing java.util.concurrent classes, but even those never really have to think much about the hard problems that heavy parallelism involves, because domain gurus such as Brian Goetz and Doug Lea have already done all the hard work for them.

I continue to be wholly unconvinced by the claim that “the new era of multi cores is going to wreak havoc on our code bases unless we adopt a functional programming language”. This claim has two sides, both equally bogus.

As I mentioned above, the fact that we are now running our programs on multiple cores instead of multiple CPUs makes very little practical difference for anyone writing multithreaded code. What processor/core/green thread/lightweight process your thread will be riding should have zero impact on how you structure your code. If your code was multithread safe on “n CPUs / 1 core”, it will be equally multithread safe on “1 CPU / n cores” or even “n CPUs / m cores”. The only major difference is that it now becomes possible to test for actual parallelism (not just concurrency) on inexpensive developer boxes.

The second part of the claim, that only functional programming languages can save us from the impending multicore apocalypse, is equally dubious, but I’ll save that for another post. I’ll simply point out that traditional OO and imperative programming have shown an amazing power of adaptation and versatility for several decades now, so whatever the next big paradigm turns out to be, it will have to bring significant improvements to the table.

More importantly, if you could peer into the computer’s memory and look at the memory locations used by my program, you’d find that those locations would be initialized as the program first used them, but then they would retain their values, unchanged, throughout the rest of the execution of the program. In other words, no new values would be assigned to those locations.

This is a bizarre claim that shows that Bob only has a passing familiarity with the internals of how a compiler or a VM works. Just because the observable effects of a program are consistent with immutable structures does not mean that the underlying memory is not being shifted around by the OS or by the abstraction layer standing between the source code and the CPU.

Moreover, the claim that immutable structures automatically lead to easy concurrency is not just naïve but plain incorrect: plenty of algorithms are dominated by inherently sequential steps and don’t parallelize well, no matter how immutable the data they operate on is.

Overall, this is a fairly unconvincing article that seems to miss most of the well recognized benefits of functional programming (composability remains the aspect that has the highest impact on my code on a daily basis) while touting advantages that range from poorly understood concepts to plain incorrect claims.