Archive for category Scala

Scala’s parallel collections

Scala 2.9 introduced parallel collections, which mirror most of the existing collections with a parallel version. Collections that have been parallelized this way have received a new method called par which magically parallelize certain operations on this collection.

For example, here is a sequential version:

scala> (1 to 5) foreach println
1
2
3
4
5

And the parallel version (notice the extra par keyword):

scala> (1 to 5).par foreach println
1
4
3
2
5

Obviously, the ordering will change each time you run the parallel version.

This piqued my curiosity and I decided to dig a bit further, starting with investigating what exactly is happening behind the scenes.

First of all, the parallel collections are based on an article called “A Generic Parallel Collection Framework”, by Martin Odersky, Aleksander Prokopec et al, which I highly recommend. It’s a very interesting analysis of how to decompose the concepts of parallelism, concurrency, collections, ranges and iterators and assemble them in a generic manner.

Sadly, this article ended up being the only highlight of my research in this area, because the more I dug into the Scala parallel collections, the more disappointed I became. By now, I am struggling to find a good use case for parallel collections, and I’m hoping that this article will generate some positive responses about their use.

Here are some of the problems that I found with the parallel collections, starting with the one I think is the most important one.

Lack of control

My first reaction when I saw the output above was to try to verify that indeed, threads were being spawned, and then find out how many of them, how I can control the size of the thread pool, etc…

I came up pretty much empty on all counts, and if I have missed a piece of the documentation that explains this, I would love to see it, but browsing the sources of ParSeq and other classes produced no useful result.

This is a big deal, and probably the worst problem with this framework. The loop above generated a parallel range of five entries, did it generate five threads? What happens if I try with 1000? 100000? The answer: it works for all these values, which makes me think that the loop is not allocating one thread per value. So it’s using a thread pool. But again: what size? Is that size configurable? How about other characteristics of that thread pool?

Digging deeper, what are the saturation and rejection policies? If the pool contains ten threads, what happens when it receives an eleventh value? It probably blocks, but can this be configured? Can the dispatch strategy be configured? Maybe I’m feeding operations of diverse durations and I want to make sure that the expensive operations don’t starve the faster ones, how can I do this?

This absence of configuration is a big blow to the parallel framework, and it relegates its usage to the simplest cases, where it will most likely not bring much speed gain compared to sequential execution.

Silver bullet illusion

Over the past months, I have seen quite a few users pop in the #scala channel and complain that parallel collections are not working. Taking a closer look at their code, it usually becomes quickly obvious that their algorithm is not parallelizable, and either 1) they didn’t realize it or 2) they were aware of that fact but they got the impression that par would magically take care of it.

Here is a quick example:

scala> Set(1,2,3,4,5) mkString(" ")
res149: String = 5 1 2 3 4

scala> Set(1,2,3,4,5).par mkString(" ")
res149: String = 5 1 2 3 4

You can run the par version over and over, the result will remain the same. This is confusing. Note that I used a Set this time, which indicates that I don’t care for the ordering of my collection. Calling mkString on the sequential version of my set reflects this. With this in mind, I would hope that calling mkString on the parallel version of my set would randomize its output, but that’s not what’s happening: I’m getting the same result as the sequential version, over and over.

It should be obvious that not all operations on collections can be parallelized (e.g. folds) but it looks like creating a string out of a set should be, and it’s not. I’m not going to go too far down here because the explanation is a mix of implementation details and theoretical considerations (the catamorphic nature of folds, set, the Scala inheritance hierarchy and the mkString specification), but the key point here is that the parallelization of collections can lead to non-intuitive results.

Bloat

I think the decision to retrofit the existing collections with the par operation was a mistake. Parallel operations come with a set of constraints that are not widely applicable to sequential collections, which leads to the situation that not all the collections support par (e.g. there is no ParTraversable) and more importantly, it imposes a burden on everyone, including people who don’t care for this functionality.

In doing this, Scala violates what I consider a fairly important rule for programming languages and API’s in general: you shouldn’t pay for what you don’t use. Not only do the parallel collections add a few megs to a jar file that’s already fairly big, but they probably introduce a great deal of complexity that is going to impact the maintainers of the collections (both sequential and parallel). It looks like anyone who will want to make modifications to the sequential collections will have to make sure their code is not breaking the parallel collections, and vice versa.

Unproven gains

Scala 2.9 is still very recent so it’s no surprise that we don’t really have any quantitative feedback of real world gains, but I’ll make a prediction today that the courageous developers who will decide to embrace the parallel collections wholeheartedly across their code base will see very little gains. In my experience, inner loops are hardly ever the bottleneck in large code bases, and I’d even go further in suspecting that spawning threads for elements of a loop could have adverse effects (context switching, memory thrashing, cache misses) for loops that iterate on very few elements or that are executing very fast operations already. I’m mostly speculating here, I haven’t run any measurements, so I could be completely wrong.

Remedies

Because of all these problems, I am a bit underwhelmed by the usefulness of the parallel collection framework overall, maybe someone who has a more extensive experience with it can chime in to share the benefits they reaped from it.

I have a couple of suggestions that I think might be a better path for this kind of initiative:

  • Split up the parallel and sequential collections, remove par and make sure that both hierarchies can be evolved independently of each other.
  • Provide a nice Scala wrapper around the Executor framework. Executors have everything that anyone interested in low level parallelism can dream of: configurable thread pool sizes, and even thread pools themselves, thread factories, saturation and rejection policies, lifecycle hooks, etc… You could write a Scala wrapper around this framework in a few hundreds of lines and it would be much more useful than what is currently possible with par.

Thoughts?

Update: Discussion on reddit.

Are Scala’s case classes a failed experiment?

A long time ago, Bertrand Meyer, the creator of Eiffel, made a very strong statement against the use of switch and case. He argued that these constructs were not object-oriented and made future maintenance difficult. And to make his point crystal clear, he decided that Eiffel would not support these statements.

This decision didn’t go well with Eiffel users and Meyer eventually gave in and added switch> and case to his language.

Meyer’s initial position was certainly extreme but I think the foundation of his argument still stands, and I was reminded of this age old debate just today while attending Martin Odersky’s Jazoon keynote, and more specifically, when he described Scala’s “case classes”.

Ever since I first read about case classes, I have been confused about their utility, and my puzzlement has not abated. Either I am missing something big or this feature is something that has been vastly overhyped and that should be avoided as much as possible

Let’s see why.

Here is the example used in the official Scala documentation:

def printTerm(term: Term) {
term match {
case Var(n) =>
print(n)
case Fun(x, b) =>
print("^" + x + ".")
printTerm(b)
case App(f, v) =>
Console.print("(")
printTerm(f)
print(" ")
printTerm(v)
print(")")
}
}

This code is part of a hypothetical parser: it does different things depending on whether we just encountered a variable, a function or an application.

I have two strong objections to this kind of approach:

  • It ties this class very tightly to all the classes described in the various cases.
  • It makes evolution problematic.

I think the first argument is pretty obvious: whatever class this code belongs to, that class knows an awful lot about all the classes it is trying to use. This wouldn’t be too much of a problem if the alternative didn’t turn out to be free of this problem, as I’ll show below.

The second claim is a bit more subtle but is actually the most important.

Obviously, this parser is quite incomplete: what happens if I want support for import clauses?

First, I am probably going to create at least one new class Import and second, I need to remember to add it to this code. Worse: all this code does is print out the type of Term it just parsed. Obviously, we will be needing more logic to actually do the work, such as a function to verify that the Term is syntactically correct (say verify(), and also a function to generate the code or the abstract information related to that term (generateCode()). Each of these methods will also feature a match structure, which will need to be updated for each new class.

In summary: every time that I add a new class, I need to remember to update three functions located somewhere in my code base.

Now we can see why Meyer dislikes switch statements so much…

The preferred way to solve these two problems is to make sure that each class encapsulates its own logic for all these operations:

interface Term {
void print();
void verify();
void generateCode();
}
public Var implements Term {
public void print() {
print(n)
}
public void verify() { ... }
public void generateCode() { ... }
}
// same for Fun and Expr

Here is how we print a term with case classes (from the Scala documentation, again):

Term t = ...;
printTerm(t)

And with proper encapsulation:

Term t = ...;
t.print();

With the OO approach, the code shown at the very top is simply no longer needed! If you need to print the content of a Term, you just invoke print() on it. If you add a new class to your parser, all you need to implement is conveniently summarized in the Term interface: no need to go hunting through your entire code base for switch statements. Not only do we have clean encapsulation, but we also have locality.

When I exposed the crux of this argument to a few Scala experts, they overall agreed about the point and responded by pointing out that case classes were more useful as an alternative to the Visitor pattern.

Before turning our attention to this specific case, it’s important to note that at this point, we are now looking at solving a niche problem. And quite a rare one, in my experience. If this is really the reason why case classes were invented, I am really left scratching my head about the decision to include such a big feature inside a language to solve such a small class of problems.

Anyway, let’s take a look at the Visitor angle.

In a nutshell, Visitors are used to emulate multiple dispatch in languages that don’t support it natively. In some way, you are extending virtual invocation to apply to the runtime type of parameters passed to your functions.

I’m not going into a deep discussion of the pros and cons of this approach, but I’ll just observe that even in the few cases where I have had such a need, an approach similar to the encapsulation explained above usually turns out to be preferable to a switch statement. In other words, if you need to have different operations depending on whether your function is being invoked with parameters of types A or B is better implemented by hiding this particular detail inside the class handling this dispatch, and not in a global switch.

From my understanding of Scala’s history, case classes were added in an attempt to support pattern matching, but after thinking about the consequences of the points I just gave, it’s hard for me to see case classes as anything but a failure. Not only do they fail to capture the powerful pattern matching mechanisms that Prolog and Haskell have made popular, but they are actually a step backward from an OO standpoint, something that I know Martin feels very strongly about and that is a full part of Scala’s mission statement.

To be fair, Scala navigates in very rough waters: finding a good compromise between running seamlessly on an existing OO platform and inventing a language that will not only accomodate fundamental functional programming paradigms but also leave the door open for advanced dynamic concepts such as open classes is no easy feat, and overall, Scala has done a very good job at finding the right balance. But unfortunately, not with case classes.

Or am I missing something?