Coding challenge: fast width/height of an image

I am trying to find out the width and height of an image (a URL) transferring as few bytes as possible. This coding challenge has two separate parts:

  • Finding the cheapest way to retrieve width and height.
  • Verifying that the solution is indeed cheap.

The first part is not that interesting since it’s a fairly well documented problem. A bit too documented, actually, since I found several solutions:

This last solution looks pretty efficient from a network perspective but I’m worried it might leave out a few corner cases in image handling. The first two solutions are probably battle tested and very robust but I have no idea how effective they are from a network standpoint (their Javadoc doesn’t describe the specifics of how they operate).

So… Do you have a better solution? And whether you do or don’t, how would you assess how many bytes these implementations transfer?

Chrome and passwords

An innocent blog post demonstrating that it’s trivial for anyone to see passwords in clear in Chrome is slowly building up into a whole scandal. The Chrome team was quick to respond but unfortunately, that answer simply poured oil on a fire that was already burning quite brightly.

Here are a few predictions:

  • As we speak, the Chrome team is having an emergency meeting to discuss the situation.
  • Within a few days, they will announce that they are going to fix the problem.
  • In a couple of weeks, the next version of Chrome will no longer display passwords in clear without additional action from the user.
  • In the months that follow, the other browsers will fall in line and implement a similar fix (Update: Safari already prompts you for your machine password before showing you its passwords, thanks Fabrizio for pointing this out).

I’m guessing Justin Chuh is probably regretting answering so hastily (you know you’re in hot water when Sir Tim Berners Lee calls your answer “disappointing”) and this might be a good illustration of being immersed into a domain for so many years that you start missing out on obvious things. We’ve all been guilty of this at some point, where our own expertise is reinforcing a flawed premise and that very expertise is preventing us from being critical of that very same premise.

The general idea is that once someone has physical access to a resource you are trying to secure, that resource is pretty much gone, so you shouldn’t spend too much time trying to address that scenario.

A common response to this is that once someone gains access to your computer, they can still log into your favorite web sites without knowing your passwords, but this is a straw man: the exposure of your passwords in clear text can be devastating since this knowledge can then be used to many more web sites. If you give me a password cookie, I can only log to so many web sites in a few minutes. If you give me your clear text password, I can spend hours back home trying to log in various web sites without worries.

The danger with the “physical access is not worth addressing” premise is that it’s tempting to do absolutely nothing to address this case (which is what most browsers do, Chrome is not the only culprit here) instead of doing at least a minimum. For example, maybe you have some cash in your home that you save for emergencies and you are wondering where to hide that money. Just because a burglar inside your house can steal whatever they want at that point doesn’t mean you shouldn’t at least make it a bit harder for them. Surely, hiding that cash inside socks in a drawer is a bit more secure than leaving it in plain sight on the kitchen table.

This added bit of security can be crucial for two reasons:

  • It raises the technical bar on the thief. With the current situation, someone who sits down in front of your unlocked workstation can know your passwords within ten seconds and just a few clicks. Hash and hide these passwords and suddenly, reading them is no longer accessible to a large part of the population.
  • Time is also a factor. When someone gains access to your computer, they might only have a few minutes and anything you can do to delay them (such as installing a hash decrypter) is that much time they don’t have to steal something else.

Whatever happens, I think we will all come out of this situation with more secure browsers.

Email tips

Email tips which I thought were common knowledge.

Avoiding embarrassing forwards

Problem: You receive an email (say from a customer), you pass it along internally with a few comments and… you accidentally copy the author of the email with details they were not meant to read.

Solution: Instead of “Reply to all”, adding your team in Cc and removing the customer (the part a lot of people forget), forward the whole email and then add your comments. It’s much easier to avoid mistakes when you have to populate an empty To: box than when you have to edit an existing list of To's, Cc's and Bcc's.

Don’t send too soon

Problem: writing an email that’s not ready to be sent yet.

You need to write a detailed email which is not supposed to be sent just yet (for example because you are describing something you’re still working on). How do you make sure that you don’t send the email until the time is right?

Solution: Write it and save it as a draft but don’t enter anything in the To: box. Do this last.

Say no to ugly emails

Problem: You want your emails to look nice.

My emails routinely contain code or shell output, and as much I love Gmail, its abilities to format are pathetic, both from a user interaction standpoint (why do I need three clicks to indent text to the right?) and from a theme standpoint. For example:

In order to get this ugly rendition of code, I had to indent the snippet manually and then change its font. And it still looks terrible.

Solution: Use Markdown Here, a Chrome extension which not only allows you to format your code in Markdown but also uses some pretty nice CSS.

You only need a few backticks and, optionally, specify the language:

```java
public class SlidingWindowMap {
  public SlidingWindowMap(Set keys, int maxCount, long periodMs) {
    // ...
  }
 
  /**
   * @return a key that has been used less than `maxCount` times during the
   * past `periodMs` milliseconds or null if no such key exists.
   */
  public String getNextKey() {
    // Your brilliant Solution goes here
  }
 
}
```

Click the Markdown button and voilà:

Your turn now, what email tricks do you use which you think very few people know?

Bringing down servers, one character at a time

Last week, I mentioned a bug that I caused and a few people asked me details about it, so here is what happened.

The code in question is parsing a calendar and processing meetings. Before showing the buggy code, here is what a shortened version of the fix ended up looking like:

while (maxDays < 7 && (eventCount < 20) || attendeeCount < 40) {
    // process meetings
    // increment maxDays whenever we cross a day boundary
    // possibly increment eventCount or attendeeCount based on the meeting
}

The bug went unnoticed in production for a few days until suddenly, the servers started pinning all their CPU's and shortly thereafter, depleting the memory of each instance. A restart was the only remedy.

This code runs a loop through someone's calendar and stops until we've gathered enough information. The maxDays variable is what I call a "safety valve", a condition used to guarantees that the code doesn't run amok in case its normal processing flow never finishes. And yet, the crash seems to indicate that the code was doing exactly that.

With this introduction out of the way, here is now the line that caused the bug:

Actually... you've already seen it, it's the code I pasted above. Did you spot the problem?

That's right, a simple set of parentheses in the wrong place. Because they are surrounding the wrong expression, they completely defeat the purpose of the safety valve, which only triggers for people with empty calendars (explaining why it took some time to trigger that bug).

Two characters to bring down an entire server...

Can we learn anything about this bug, besides the fact that testing needs to cover edge cases? A static analysis tool can't be of much help here since we're completely in the realm of runtime behavior, but is there some way that an automated tool could have warned me about the flaw of this termination expression?

Branching logic

How does the following code make you feel:

  runSomeLogic(account, person, true, false);

?

Me? Annoyed.

I wish we could do away completely with this kind of code but I’ve come to the conclusion that such calls are unavoidable for a simple reason:

  private void complexStuff() {
    // ... gnarly code here...
    if (some condition) {
      doA
    } else {
      doB
    }
    // ... more gnarly code...
  }

Accepting branching parameters in the signature is a good way to minimize the amount of duplication that you will write, however, this doesn’t mean that you need to expose your callers to this implementation detail, or at least, that you can do it in a safer way than using naked booleans.

There is a whole spectrum to cover such cases, and the example above sits at one end of it. Let’s see if we can enumerate the alternatives.

Let’s start with the lazy way, which doesn’t require any code change:

  runSomeLogic(account, person,
        true /* use memcache */,
        false /* no security check */);

This gives me a better sense of what the code is doing but it’s still error prone since comments are easily forgotten. Can we get the compiler to help us?

  /** Never call this, call the overloaded helper methods */
  private void runSomeLogic(Account account, Person person, boolean useMemcache,
      boolean securityCheck) { ... }

  public void runSomeLogicNoCacheSecurity(Account account, Person person) {
    runSomeLogic(account, person, false, true);
  }

  public void runSomeLogicCacheSecurity(Account account, Person person) {
    runSomeLogic(account, person, true, true);
  }

  // ... you get the idea

A bit better and the compiler will keep us honest, but you can see the combinatorial explosion here: each new boolean forces you to multiply the number of methods by two. Not really scalable. One way to make it scalable is to prevent more than one boolean parameter, but this is really just kicking the can down the road.

Maybe booleans are the problems, how about something more descriptive?

  enum CacheInfo {
    USE,
    DONT_USE
  }

  enum SecurityInfo {
    USE,
    DONT_USE
  }

  runSomeLogic(account, person, CacheInfo.USE, SecurityInfo.DONT_USE);

This looks somewhat better: it’s both enforced by the compiler and readable, although maybe we can improve on that by creating one big enum instead of plenty of tiny ones:

  enum Use {
    MEMCACHE,
    SECURITY
  }

  enum DontUse {
    MEMCACHE,
    SECURITY
  }

  runSomeLogic(account, person, Use.MEMCACHE, DontUse.SECURITY);

The cost is two uber enums that everybody needs to import but the code reads more fluently. The downside of this approach is that we’ve weakened static guarantees considerably: it’s now possible to pass a Use.MEMCACHE where a security configuration setting is expected.

Can you think of other ways to approach this problem? (all languages welcome)

Flexible configuration with Guice

There are quite a few configuration libraries available in Java, such as this one available from Apache Commons, and they usually follow a very similar pattern: they parse a variety of configuration files and in the end, give you a Property or Map like structure where you can query your values:

Double double = config.getDouble("number");
Integer integer = config.getInteger("number");

I have always been unsatisfied with this approach for a couple of reasons:

  • A lot of boiler plate to retrieve these parameters.
  • Having to share the whole configuration object even if I only need one parameter from it.
  • It’s very easy to misspell a property and received incorrect values.

A while ago, I was reading the Guice documentation and I came across a paragraph that made me realize that maybe, we could do better. Here is the relevant excerpt:

Guice supports binding annotations that have attribute values. In the rare case that you need such an annotation:

Create the annotation @interface.
Create a class that implements the annotation interface. Follow the guidelines for equals() and hashCode() specified in the Annotation Javadoc. Pass an instance of this to the annotatedWith() binding clause.

I thought that using this technique might be exactly what I needed to create a smarter configuration framework, even though I had different plans than using this trick with the annotatedWith method, as suggested by this paragraph. The relevance of this snippet will become clear later, so let’s start with the goals.

Objectives

I want to:

  • Be able to inject individual configuration values anyhere in my code base and I want this to be type safe. No @Named or other string-based lookup.
  • Have a canonical list of all the properties available to the application, with their full type, default value, documentation and leaving the door open for improvements (e.g. is this option mandatory or optional, detecting when some properties are not used anywhere, deprecation, aliasing, etc…).

I don’t care much about the front end: how these properties get gathered is not relevant to this framework, they can come from XML, JSON, the network, a database, and they can have arbitrarily complex resolution and overriding rules, let’s save this for a future post. The input of this framework is a Map of properties and I take it from there.

By the time we’re done, we will be able to do something like this:

    # Some property file
    host=foo.com
    port=1234

Using these configuration values in your code:

public class A {
    @Inject
    @Prop(Property.HOST)
    private String host;

    @Inject
    @Prop(Property.PORT)
    private Integer port;

    // ...
}

Implementation

The definition of the Prop annotation is trivial:

@Retention(RUNTIME)
@Target({ ElementType.FIELD, ElementType.PARAMETER })
@BindingAnnotation
public @interface Prop {
  Property value();
}

Property is an enum that captures all the information necessary for all your properties. In our case:

public enum Property {
    HOST("host",
        "The host name",
        new TypeLiteral<String>() {},
        "foo.com"),

    PORT("port",
        "The port",
        new TypeLiteral<Integer>() {},
        1234);
}

This enum contains the string name of the property, a description, its default value and its type. Note that this type is a TypeLiteral, so we can even offer properties that have generic types that would otherwise be erased, a trick that comes in handy to inject caches or other generic collections. Obviously, you can have additional parameters as you see fit (e.g. “boolean deprecated“).

The next step is to tie all the properties that we parsed as input — we’ll use a Map called "allProps" — into our module so that Guice knows how to inject them.

In order to do this, we iterate all these properties and bind them to their own provider. Because we are using typed names, note the use of Key.get from the Guice API, which lets us specifically target each property with a specific annotation:

    for (Property prop : Property.values()) {
        Object value = PropertyConverters.getValue(prop.getType(), prop, allProps.asMap());
        binder.bind(Key.get(prop.getType(), new PropImpl(prop)))
                .toProvider(new PropertyProvider(prop, value));
    }

There are three classes in this piece of code that I haven’t explained yet. The first one is PropertyConverters, which simply reads the string version of the property and converts it to a Java type. The second one is PropertyProvider, a trivial Guice provider:

public class PropertyProvider implements Provider {
    private final T value;
    private final Property property;

    public PropertyProvider(Property property, T value) {
        this.property = property;
        this.value = value;
    }

    @Override
    public T get() {
        return value;
    }
}

PropImpl is more tricky and also the one thing that has always prevented me from implementing such a framework, until I came across this obscure tidbit of the Guice documentation quoted above. In order to understand the necessity of its existence, we need to understand how Guice’s Key.get() works. Guice uses this class to translate a type into a unique key that it can use to inject the correct value. The important part here is to notice that not only does this method work with both Class and TypeLiteral (which we are using), but it can also be given a specific annotation. This annotation can be @Named, which I’m not a big fan of because it’s a string, so susceptible to typos, or a real annotation, which is what we want. However, annotations are special beasts in Java and you can’t get an instance of them just like that.

This is where the trick mentioned at the top of this article comes into play: Java actually allows you to implement an annotation with a regular class. The implementation turns out to be fairly trivial, the difficulty was realizing that this was possible at all.

Now that we have all this in place, let’s back track and dissect how the magic happens:

    @Inject
    @Prop(Property.HOST)
    private String host;

When Guice encounters this injection point, it looks into its binders and it finds multiple bindings for Strings. However, because they have all been bound with a Key, the key is actually a pair: (String, a Prop). In this case, it will look up the pair String, Property.HOST and it will find a provider there. This provider was instantiated with the value found in the property file, so it knows what value to return.

Generalizing

Once I had the basic logic in place, I wondered if I could turn this mini framework into a library so that others could use it. The only missing piece would be to allow the specification of a more general Prop annotation. In the example above, this annotation has a value of type Property, which is specific to my application:

@Retention(RUNTIME)
@Target({ ElementType.FIELD, ElementType.PARAMETER })
@BindingAnnotation
public @interface Prop {
    Property value();
}

In order to make this more general, I need to make this attribute return an enum instead of my own:

@Retention(RUNTIME)
@Target({ ElementType.FIELD, ElementType.PARAMETER })
@BindingAnnotation
public @interface Prop {
    Enum value();
}

Unfortunately, this is not legal Java, because according to the JLS section 8.9, Enum and its generic variants are not enum types, something that Josh Bloch confirmed, to my consternation.

Therefore, this cannot be turned into a library, so if you are interested in using it in your project, you will have to copy the source and make a few modifications to adjust it to your needs, starting by having Prop#value have the type of the enum that captures your configuration.

You can find a small proof of concept here, which I hope you’ll find useful.

Note: this is a copy of the article I posted on our work blog.

The time that never was

I spent some time tracking down a funny bug yesterday, which suddenly starting clogging our logs with the following error message:

org.joda.time.IllegalFieldValueException: Value 22 for hourOfDay is not supported: Illegal instant due to time zone offset transition

If there are two things I have learned about Joda-Time these past years, it’s that 1) it hardly ever fails and 2) when it does, it’s for a good reason. So I took a look at the code causing the exception and I found this:

max = eventLocalTime.withTime(22, 0, 0, 0);

I see the “22″ value but this code has been working flawlessly for weeks, what is suddenly causing it to fail? I fired the debugger, reproduced the problem and inspected the eventLocalTime variable. Something felt a bit unusual about its value:

2013-03-30T22:32:00.000 (America/Godthab)

I won’t claim that I know all the different time zones in existence in North America, but “America/Gothab” is certainly new to me. A quick Google search reveals that this time zone is in Greenland. Alright, so we have users in Greenland, but why would this cause a problem?

Reading this page again, something strikes me:

Next Scheduled DST Transition
DST will begin on Sat 30-Mar-2013 at 10:00:00 P.M. when local clocks are set forward 1 hour

and then everything falls into place.

The variable eventLocalTime is perfectly legitimate but today is March 29th, and adding 22 hours to it puts it into a nonexistent time, which happens whenever a time zone transitions to or from Daylight Savings Time. By adding twenty-two hours to the variable, we were trying to create a date “March 30th, 10pm”, which doesn’t exist because the clock in this time zone skips to 11pm. In the US, this nonexistent time period is usually between 2am and 3am, but for some reason, Greenland does this switch at 10pm.

The fix ended up looking as follows (happy to hear if there is a better solution):

try {
    max = eventLocalTime.withTime(22, 0, 0, 0);
} catch (IllegalFieldValueException e) {
    max = eventLocalTime.withTime(23, 0, 0, 0);
}

I am so happy that Joda will finally become standard with Java 8, and I have to tip my hat to Stephen, again, for his dedication to the terribly hard problem of date management.

If you have any funny bug to share, please go ahead and do so in the comments!

Tags:

A small step for Jebediah, a giant step for Kerbinkind

This is Kerbal Space Program, a sandbox space simulator created by a small team of passionate space programmers. I never really thought that designing rocket ships an aligning orbits could be fun, but I already spent more than thirty hours trying to launch my Kerbals into space (and killing most of them in the process). This turned out to be much more entertaining than I thought.

The game takes place on an imaginary planet called Kerbin. Kerbin looks like the earth and the planetary system it’s located in it has a lot of similarities with our own. As of the current version (0.18, $23, Windows, Mac and Linux), the game allows you to design either rocket ships or airplanes (you can send these to planets as well). Then you launch them and try to follow the plan you decided on.

The screen shot above is Jebediah Kerman, proudly taking his first steps on Mün. It’s the very first successful landing I ever did and it took me about fifty attempts (and countless fiery explosions) to reach this goal. I won’t deny that it was a lot of fun to get there, though, and I learned tons about space travel and ballistics.

The process of mastering the game follows a pretty clear path and while it doesn’t look that there is a big learning distance between your first orbit and your first successful landing, you will soon realize that you need to perfect a lot of intermediary steps in-between. And along the way, you will learn about ship design, the pros and cons of liquid and solid fuel, the compromises of taking a lot of fuel with you, how to launch a ship on a circular orbit, perfecting that orbit, learning how to use and interpret the navball, learning the maneuvers, when and how to perform gravity turns, fuel efficiency, choosing the optimal transfer path between planets, gravity slingshots, controlling your speed to land without exploding, etc…

The game is still in the very early stages and it’s hard not to draw a parallel with the early Minecraft days, where even with so few features, the game has such a strong appeal on imagination that it’s already attracting a very strong and devout following. Take a look at the Kerbal Space Program’s subreddit to get a feel for what people are doing with the game even at such an early stage. I can’t wait to see what it will be like once it reaches 1.0. The developers have managed to take what appears to be a very dry and challenging topic and turn it into a fun and challenging game which remains very approachable. And while you can take a very hard science approach to it, you can also decide to enjoy what it has to offer by simply learning what periapsis and apoapsis are, stopping there and focusing on the flying after that.

As for myself, I still have a lot to do. First, I need to work on fuel efficiency, because even if the Mün landing above was successful, my spaceship is now completely out of fuel (don’t tell Jebediah, he looks so happy). After this, the next step will be to try to land on the more distant planets.

Ad astra!

Answer to the “School” challenge

I’m happy to see mostly correct answers to the School coding challenge.

To make things more interesting, let’s start by looking at a naïve (and wrong) solution:

  @Override
  public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ((name == null) ? 0 : name.hashCode());
    result = prime * result + ((nickname == null) ? 0 : nickname.hashCode());
    return result;
  }

  @Override
  public boolean equals(Object obj) {
    // ... details omitted
    return other.nickname.equals(nickname) || other.name.equals(name);
  }

Let’s create two School objects, make sure they are equal, add them to a set and then display this set:

    School school1 = new School("University of Pennsylvania", "upenn");
    School school2 = new School("University of Pennsylvania", "Penn State");
    System.out.println("Equals: " + school1.equals(school2));
    Set<School> schools = Sets.newHashSet();
    schools.add(school1);
    schools.add(school2);
    System.out.println("Set: " + schools);

The output:

Equals: true
Set: [com.beust.School@d8e612c6, com.beust.School@7086780a]

This is obviously wrong and caused by the fact that this implementation breaks several laws, among which: “if two objects are equal, their hashcode should be equal as well”. However, this is clearly not the case here since the two objects instantiated in the main method are equal but their hash code are different, since it’s calculated from both the name and nickname. The consequences of such an implementation are disastrous in more ways than one, starting with the fact that your objects cannot reliably be stored in nor retrieved from identity-based collections (e.g. sets and maps).

The only way out of this would be to return a variable hashcode, but this doesn’t make sense since we wouldn’t know what field to base that hash code on.

Most commenters came up with the idea of always returning the same value for hashCode(), regardless of the name and nickname values. This approach is actually correct but it comes at the cost of losing all running time benefits offered by sets and maps, such as constant time retrievals. Instead, Hashmap#get and Set#contains are now linear operations. That’s a pretty bad prank to pull on your coworkers when their collections can contain millions of objects.

So, what’s the best way to solve this problem?

I’m not quite sure. There are several approaches, all with their pros and cons.

The first idea would be to not override equals and hashCode, using their default implementation defined on Object. This implementation (simple reference equality test) is guaranteed to be wrong in most cases, but since you can’t write a correct one, maybe the best approach is to do nothing.

Another approach would be to throw an OperationNotSupportedException in equals/hashCode. The implementation is still wrong but at least, your program will crash loudly if someone attempts to put these objects in an identity-based collection.

Ultimately, you really want to get rid of the fuzzy matching performed in the equals method, so the best approach is probably to perform a first pass on all your School objects and unify all those that are equal under the same id. After this, you can implement equals/hashCode based on this id field.

Coding challenge (“light” edition)

This coding challenge is a bit different from the previous one ([1] [2]) because it is shorter and also a bit more closely tied to Java.

A School has either a name (“Stanford”) or a nickname (“UCSD”) or both.

Your task is to write equals() and hashCode() methods for the School class that satisfy the following requirement: two Schools are identical if they either have the same name or the same nickname.