Archive for category Kotlin

Genericity with functions

I can’t count the number of times I have manipulated trees and graphs in my career as a software engineer. They are the kind of structure that crops up over and over again in your code, pretty much regardless of the kind of application you are working on. And along these tasks, something that I’ve had to do over and over again is to display such a tree, usually for debugging purposes. Here is what a simple implementation can look like:

class Node(val value: String, val children: List<Node> = emptyList())

fun displayGraph(roots: List<Node>, indent: String = "") {
    roots.forEach {
        println(indent + it.value)
        displayGraph(it.children, indent + "  ")
    }
}

The Node contains a payload and a list of children. The displayGraph() function simply displays the value of each current node and then calls itself recursively with an increased indentation. Here is an example of the output:

val node = Node("Root", listOf(
    Node("A", listOf(
        Node("A1"),
        Node("A2"),
        Node("A3"))
    ),
    Node("B", listOf(
        Node("B1"),
        Node("B2", listOf(
                Node("B21"),
                Node("B22"))
        )))
    )
)

displayGraph(listOf(node))
Root
  A
    A1
    A2
    A3
  B
    B1
    B2
      B21
      B22

Down the rabbit hole

Now imagine that another part of your code is also manipulating graphs, and you’d like to display those as well. However, they are currently using their own data structure:

class Tree(val payload: Int, val leaves: List<Tree>)

Different payload type, different name, but same structure. Obviously, your code should be able to display this structure in a generic fashion, so you introduce an interface:

interface INode<T> {
    val children: List<INode<T>>
    val value: T
}

You rewrite your displayGraph() implementation with that interface:

fun <T> displayGraph(roots: List<INode<T>>, indent: String = "") {
    roots.forEach {
        println(indent + it.value)
        displayGraph(it.children, indent + "  ")
    }
}

And you make your two types conform to that interface:

class Node(override val value: String,
    override val children: List<Node> = emptyList()) : INode<String>

class Tree(val payload: Int, val leaves: List<Tree>) : INode<Int> {
    override val children: List<Tree> = leaves
    override val value: Int = payload
}

And you can now pass either Node or Tree to your displayGraph() function:

    val graph = listOf(Node("A", listOf(
            Node("A1"),
            Node("A2")))
    )
    displayGraph(graph)

    val tree = listOf(Tree(1, listOf(
            Tree(11),
            Tree(12)))
    )
    displayGraph(tree)

Output:

A
  A1
  A2
1
  11
  12

So we made a solid step toward making our code generic while avoiding duplication, but we’re not going to stop here.

Imagine now that this Tree class doesn’t belong to you (for example, it came from a library or some other portion of code that you can’t modify). Therefore, you can’t make it conform to your INode interface. How can you still reuse your code despite this limitation?

This brings us to the main take away of this post:

If you can’t adapt your data structures to your algorithms, adapt your algorithms to your data structures.

Let’s ask ourselves the following question: what would it take for displayGraph() to work if it can’t make any assumption about the types it’s working on? Because these types can come from areas we don’t control, we can’t assume anything about them. In other words, they are some T type, which we know nothing about.

Well, we can ask the caller to pass us functions to manipulate this type. Looking at the implementation of displayGraph(), all we need is two functions that will give us:

  • The children of a node.
  • The value contained in a node.
fun <T, U> displayGraphGeneric(roots: List<T>,
        children: (T) -> List<T>,
        value: (T) -> U,
        indent: String = "") {
    roots.forEach {
        println(indent + value(it))
        displayGraphGeneric(children(it), children, value, indent + "  ")
    }
}

As promised, you can see that the code above is manipulating an unknown type T, and in order to do that, we use the functions passed in parameters. Notice also the arrival of a second type parameter U, which is used to define the type of the payload. As you might have noticed, Node has a payload of type String while Tree‘s is Int. Therefore, we need a second type parameter to capture this flexibility.

How do we call this new generic function for our two types?

displayGraphGeneric(graph, Node::children, Node::value)
displayGraphGeneric(tree, Tree::leaves, Tree::payload)

What if you sometimes want to display the graph on standard out and other times display it with the logger? You guessed it: you can make this generic by passing an additional closure.

Stepping back

So what did we do exactly through this exercise?

Functions need a contract in order to do their job. With the first approach, the contract is encoded in the type, which places a constraint on the kind of value you can pass to the function. In our second approach, the contract has been extracted from the type and passed on the side, as function parameters. This makes the syntax a bit more verbose but you gain a very important piece of functionality: the ability to apply functions to types even though they don’t conform to a certain interface.

Does this characterization ring a bell? The concept you might be thinking of is “type class”.

In effect, we have just implemented our own version of a type class mechanism. If you are not familiar with type classes, you can think of them as a “family of types” in much the same way that a type is a “family of values”. It’s just one step above on the abstraction ladder. Type classes look like interfaces except they can be defined after the types they apply to have been defined, just like we did with our Tree and Graph classes.

Our manual implementation shown above is actually a lot closer from a real type class implementation than you think. Compilers typically implement type classes by passing along invisible parameters commonly referred to as “dictionaries” that contain the information necessary for the compiler to generate the code that will connect the value to whatever it takes to make it conform to the expected interface. The only difference with our implementation above is that we passed these functions explicitly.

Notice also how this approach allows you to “adapt” types to obey a certain contract, and from that standpoint, it could qualify as the “Adapter” pattern, except that this time, it’s implemented in a functional way instead of an object oriented way. The emphasis has moved away from the object and the class (which is unknown) and toward functions (which are clearly specified). Next time someone tells you design patterns don’t exist in functional programming because they are not necessary, point them to this post.

Next time you are confronted with a problem where you’d like to reuse an existing generic algorithm, remember that you can implement your own type class mechanism as shown in this article.

The Kobalt diaries: Parallel builds

I’ve always wondered why the traditional build systems we use on the JVM (Maven, Gradle) don’t support parallel builds (edit: this is not quite true, see the comments), so now that I developed my own build system, I decided to find out if there was any particular difficulty to this problem that I had overlooked. This turned out to be easier than I anticipated and with interesting benefits. Here is a preliminary result of my findings.

Defining parallel builds

While most builds are sequential by nature (task B can’t usually run until task A has completed), projects that contain multiple sub projects can greatly benefit from the performance boost of parallel builds. This speed up will obviously not apply to single projects or subprojects that have direct dependencies on each other (although this is not quite true as we will see below). If you’re curious, the engine that powers Kobalt’s parallel builds is an improved version of TestNG’s own DynamicGraphExecutor, which I described in details in this post.

In order to measure the efficacy of parallel builds, I needed a more substantial project, so I picked ktor, Kotlin’s web framework developed by JetBrains. ktor is interesting because it contains more than twenty sub projects with various dependencies with each other. Here is a dependency graph:

You can see that core is the main project that everybody else depends on. After that, the dependencies open up and we have the potential to build some of these projects in parallel: locations, samples, etc…

I started by converting the ktor build from Maven to Kobalt. Right now, ktor is made of about twenty-five pom.xml files that add up to a thousand lines of build files. Kobalt’s build file (Build.kt) is one file of about two hundred lines, and you can find it here. The fact that this build file is a pure Kotlin file allows to completely eliminate the redundancies and maximize the reuse of boiler plate code that most sub projects define.

Extracting the dependencies from Build.kt is trivial thanks to Kobalt’s convenient syntax to define dependencies:

$ grep project kobalt/src/Build.kt
val core = project {
val features = project(core) {
val tomcat = project(core, servlet) {

We see the core project at the top of the dependency graph. Then features depends on core, tomcat depends on both core and servlet and so on…

So without further ado, let’s launch the build in parallel mode and let’s see what happens:

$ ./kobaltw --parallel assemble

At the end of a parallel build, Kobalt optionally displays a summary of the way it scheduled the various tasks. Here is what we get after building the project from scratch:

╔══════════════════════════════════════════════════════════════════════════════════════════════════════
║  Time (sec) ║ Thread 39           ║ Thread 40           ║ Thread 41           ║ Thread 42           ║
╠══════════════════════════════════════════════════════════════════════════════════════════════════════
║  0          ║ core                ║                     ║                     ║                     ║
║  45         ║ core (45)           ║                     ║                     ║                     ║
║  45         ║                     ║ ktor-locations      ║                     ║                     ║
║  45         ║                     ║                     ║ ktor-netty          ║                     ║
║  45         ║                     ║                     ║                     ║ ktor-samples        ║
║  45         ║ ktor-hosts          ║                     ║                     ║                     ║
║  45         ║                     ║                     ║                     ║                     ║
║  45         ║ ktor-hosts (0)      ║                     ║                     ║                     ║
║  45         ║ ktor-servlet        ║                     ║                     ║                     ║
║  45         ║                     ║                     ║                     ║                     ║
║  45         ║                     ║                     ║                     ║                     ║
║  45         ║                     ║                     ║                     ║ ktor-samples (0)    ║
║  45         ║                     ║                     ║                     ║ ktor-freemarker     ║
║  49         ║                     ║                     ║                     ║ ktor-freemarker (3) ║
...
╚═════════════════════════════════════════════════════════════════════════════════════════════════════╝

PARALLEL BUILD SUCCESSFUL (68 seconds, sequential build would have taken 97 seconds)

The “Time” column on the left describes at what time (in seconds) each task was scheduled. Each project appears twice: when they start and when they finish (and when they do, the time they took is appended in parentheses).

Analyzing the thread scheduling

As you can see, core is scheduled first and since all the other projects depend on it, Kobalt can’t build anything else until that project completes, so the other four threads remain idle during that time. When that build completes forty-five seconds later, Kobalt now determines that quite a few projects are now eligible to build, so they get scheduled on all the idle threads: ktor-locations, ktor-netty, etc… The first to complete is ktor-hosts and Kobalt immediately schedules the next one on that thread.

Finally, Kobalt gives you the complete time of the build and also how long a sequential build would have taken, calculated by simply adding all the individual project times. It’s an approximation (maybe these projects would have been built faster if they weren’t competing with other build threads) but in my experience, it’s very close to what you actually get if you perform the same build sequentially.

Obviously, the gain with parallel build is highly dependent on the size of your projects. For example, if project C depends on projects A and B and these two projects are very small, the gain in parallelizing that build will be marginal. However, if A and B are both large projects, you could see your total build time divided by two. Another big factor in the gain you can expect is whether you use an SSD. Since all these parallel builds are generating a lot of files in various directories concurrently, I suspect that a physical hard drive will be much slower than an SSD (I haven’t tested, I only have SSD’s around).

Taking things further

When project B depends on project A, it certainly looks like you can’t start building B until A has completed, right? Actually, that’s not quite true. It’s possible to parallelize (or more accurately, vectorize) such builds too. For example, suppose you launch ./kobaltw test on these two projects:

$ ./kobalw test
----- A:compile
----- A:assemble
----- A:test
----- B:compile
----- B:assemble
----- B:test

But the dependency of B on A is not on the full build: B certainly doesn’t need to wait for A to have run its tests before it can start building. In this case, B is ready to build as soon as A is done assembling (i.e. created A.jar). So here, we could envision having threads scheduled at the task level, and not at the project level. So what we could really do is:

╔═════════════════════════════════════════════════════════╗
║  Time (sec) ║ Thread 39           ║ Thread 40           ║
╠═════════════════════════════════════════════════════════╣
║             ║ A:compile           ║                     ║
║             ║ A:assemble          ║                     ║
║             ║ A:test              ║ B:compile           ║
║             ║                     ║ B:assemble          ║
║             ║                     ║ B:test              ║

As you can see above, Kobalt schedules B:compile as soon as A:assemble has completed while starting A:test on a separate thread, resulting in a clear reduction in build time.

This task-based approach can improve build times significantly since tests (especially functional tests) can take minutes to run.

Wrapping up

I started implementing parallel builds mostly as a curiosity and with very low expectations but I ended up being very surprised to see how well they work and how they improve my build times, even when just considering project-based concurrency and not task-based concurrency. I’d be curious to hear back from Kobalt users on how well this new feature performs on their own projects.

And if you haven’t tried Kobalt yet, it’s extremely easy to get started.

Neural networks in Kotlin (part 2)

In the previous installment, I introduced a mystery class NeuralNetwork which is capable of calculating different results depending on the data that you train it with. In this article, we’ll take a closer look at this neural network and crunch a few numbers.

Neural networks overview

A neural network is a graph of neurons made of successive layers. The graph is typically split in three parts: the leftmost column of neurons is called the “input layer”, the rightmost columns of neurons is the “output layer” and all the neurons in-between are the “hidden” layer. This hidden layer is the most important part of your graph since it’s responsible for making the calculations. There can be any numbers of hidden layers and any number of neurons in each of them (note that the Kotlin class I wrote for this series of articles only uses one hidden layer).

Each edge that connects two neurons has a weight which is used to calculate the output of this neuron. The calculation is a simple addition of each input value multiplied by its weight. Let’s take a look at a quick example:


This network has two input values, one hidden layer of size two and one output. Our first calculation is therefore:

w11-output = 2 * 0.1 + (-3) * (-0.2) = 0.8
w12-output = 2 * (-0.4) + (-3) * 0.3 = -1.7

We’re not quite done: the actual outputs of neurons (also called “activations”) are typically passed to a normalization function first. To get a general intuition for this function, you can think of it as a way to constrain the outputs within the [-1, 1] range, which prevents the values flowing through the network from overflowing or underflowing. Also, it’s useful in practice for this function to have additional properties connected to its derivative but I’ll skip over this part for now. This function is called the “activation function” and the implementation I used in the NeuralNetwork class is the hyperbolic tangent, tanh.

In order to remain general, I’ll just refer to the activation function as f(). We therefore refine our first calculations as follows:

w11-output = f(2 * 0.1 + (-3) * (-0.2))
w12-output = f(2 * (-0.4) + (-3) * 0.3)

There are a few additional details to this calculation in actual neural networks but I’ll skip those for now.

Now that we have all our activations for the hidden layer, we are ready to move to the next layer, which happens to be the ouput layer, so we’re almost done:

output = f(0.1 * w11-output - 0.2 * w12-output
       = 0.42

As you can see, calculating the output of a neural network is fairly straightforward and fast, much faster than actually training that network. Once you have created your networks and you are satisfied with its results, you can just pass around the characteristics of that network (weights, sizes, …) and any device (even phones) can then use that network.

Revisiting the xor network

Let’s go back to the xor network we created in the first episode. I created this network as follows:

NeuralNetwork(inputSize = 2, hiddenSize = 2, outputSize = 1)

We only need two inputs (the two bits) and one output (the result of a xor b). These two values are fixed. What is not fixed is the size of the hidden layer, and I decided to pick 2 here, for no particular reason. It’s interesting to tweak these values and see whether your neural network performs better of worse based on these values and there is actually a great deal of both intuition and arbitrary choices that go into these decisions. These values that you use to configure your network before you run it are called “hyper parameters”, in contrast to the other values which get updated while your network runs (e.g. the weights).

Let’s now take a look at the weights that our xor network came up with, which you can display by running the Kotlin application with --log 2:

Input weights:
-1.21 -3.36 
-1.20 -3.34 
1.12 1.67 

Output weights:
3.31 
-2.85 

Let’s put these values on the visual representation of our graph to get a better idea:


You will notice that the network above contains a neuron called “bias” that I haven’t introduced yet. And I’m not going to just yet besides saying that this bias helps the network avoid edge cases and learn more rapidly. For now, just accept it as an additional neuron whose output is not influenced by the previous layers.

Let’s run the graph manually on the input (1,0), which should produce 1:

hidden1-1 = 1 * -1.21
hidden1-2 = 0 * -1.20
bias1     = 1 * 1.12

output1 = tanh(-1.21 + 1.12) = -0.09

hidden2-1 = 1 * -3.36
hidden2-2 = 0 * -3.34
bias2     = 1 * 1.67

output2 = tanh(-3.36 + 1.6) = -0.94

// Now that we have the outputs of the hidden layer, we can caculate
// our final result by combining them with the output weights:

finalOutput = tanh(output1 * 3.31 + output2 * (-2.85))
            = 0.98

We have just verified that if we input (0,1) into the network, we’ll receive 0.98 in output. Feel free to calculate the other three inputs yourself or maybe just run the NeuralNetwork class with a log level of 2, which will show you all these calculations.

Revisiting the parity network

So the calculations hold up but it’s still a bit hard to understand where these weights come from and why they interact in the way they do. Elucidating this will be the topic of the next installment but before I wrap up this article, I’d like to take a quick look at the parity network because its content might look a bit more intuitive to the human eye, while the xor network detailed above still seems mysterious.

If you train the parity network and you ask the NeuralNetwotk class to dump its output, here are the weights that you’ll get:

Input weights:
0.36 -0.36 
0.10 -0.09 
0.30 -0.30 
-2.23 -1.04 
0.57 -0.58 

Output weights:
-1.65 
-1.64 

If you pay attention, you will notice an interesting detail about these numbers: the weights of the first three neurons of our hidden layer cancel each other out while the two inputs of the fourth neuron reinforce each other. It’s pretty clear that the network has learned that when you are testing the parity of a number in binary format, the only bit that really matters is the least significant one (the last one). All the others can simply be ignored, so the network has learned from the training data that the best way to get close to the desired output is to only pay attention to that last bit and cancel all the other ones out so they don’t take part in the final result.

Wrapping up

This last result is quite remarkable if you think about it, because it really looks like the network learned how to test parity at the formal level (“The output of the network should be the value of the least significant bit, ignore all the others”), inferring that result just from the numeric data we trained it with. Understanding how the network learned how to modify itself to reach this level will be the topic of the next installment of the series.

The Kobalt diaries: Automatic Android SDK management


The dreaded SDK Manager

Android has always had a weird dependency mechanism. On the JVM (and therefore, Android), we have this great Maven repository system which is leveraged by several tools (Gradle and Kobalt on Android) and which allows us to add dependencies with simple additions to our build files. This is extremely powerful and it has undoubtedly been instrumental in increasing the JVM’s popularity. This system works for pretty much any type of applications and dependencies.

Except for Android libraries.

The Android support libraries (and I’m using “support” loosely here to include all such libraries and not just the one that Google calls “support”) are not available on any of the Maven repositories. They are not even available on Google’s own repository. Instead, you need to use a special tool to download them, and once you do that, they land on your local drive as a local Maven repository, which you then need to declare to Gradle so you can finally add the dependencies you need.

I suspect the reason why these libraries are not available in a straight Maven repo is that you need to accept licenses before you can use them, but regardless, this separate download management makes building Android applications more painful, especially for build servers (Travis, Jenkins) which need to be configured specifically for these builds.

The graphical tool used to download this repository, called “SDK Manager”, is also a command tool called "android" that can be invoked without the GUI, and inspired by Jake Wharton’s sdk-manager-plugin, I set out to implement automatic SDK management for Kobalt.

Once I became more comfortable with the esoteric command line syntax used by the "android" tool, adding it to the Kobalt Android plug-in was trivial, and as a result, Kobalt is now able to completely install a full Android SDK from scratch.

In other words, all you need in your project is a simple Android configuration in your Build.kt file:

    android {
        compileSdkVersion = "23"
        buildToolsVersion = "23.0.1"
        // ...
    }

    dependencies {
        compile("com.android.support:appcompat-v7:aar:23.0.1")
    }

The Kobalt Android plug-in will then automatically download everything you need to create an apk from this simple build file:

  • If $ANDROID_HOME is specified, use it and make sure a valid SDK is there. If that environment variable is not specified, install the SDK in a default location (~/.android-sdk).
  • If no Build Tools are installed, install them.
  • Then go through all the Google and Android dependencies for that project and install them as needed.
  • And a few other things…

A typical run on a clean machine with nothing installed will look like this:

$ ./kobaltw assemble
...
Android SDK not found at /home/travis/.android/android-sdk-linux, downloading it
Couldn't find /home/travis/.android/android-sdk-linux/build-tools/23.0.1, downloading it
Couldn't find /home/travis/.android/android-sdk-linux/platform-tools, downloading it
Couldn't find /home/travis/.android/android-sdk-linux/platforms/android-23, downloading it
Couldn't find Maven repository for extra-android-m2repository, downloading it
Couldn't find Maven repository for extra-google-m2repository, downloading it
...
          ===========================
          | Building androidFlavors |
          ===========================
------ androidFlavors:clean
------ androidFlavors:generateR
------ androidFlavors:compile
  Java compiling 4 files
------ androidFlavors:proguard
------ androidFlavors:generateDex
------ androidFlavors:signApk
Created androidFlavors/kobaltBuild/outputs/apk/androidFlavors.apk

Obviously, these downloads will not happen again unless you modify the dependencies in your build file.

I’m hopeful that Google will eventually make these support libraries available on a real remote Maven repository so we don’t have to jump through these hoops any more, but until then, Kobalt has you covered.

This feature is available in the latest kobalt-android plug-in as follows:

val p = plugins("com.beust:kobalt-android:0.81")

The Kobalt diaries: testing

Kobalt automatically detects how to run your tests based on the test dependencies that you declared:

dependenciesTest {
    compile("org.testng:testng:6.9.9")
}

By default, Kobalt supports TestNG, JUnit and Spek. You can also configure how your tests run
with the test{} directive:

test {
    args("-excludegroups", "broken", "src/test/resources/testng.xml")
}

The full list of configuration parameters can be found in the TestConfig class.

Additionally, you can define multiple test configurations, each with a different name. Each
configuration will create an additional task named "test" followed by the name of
that configuration. For example:

test {
    args("-excludegroups", "broken", "src/test/resources/testng.xml")
}

test {
    name = "All"
    args("src/test/resources/testng.xml")
}

The first configuration has no name, so it will be launched with the task "test",
while the second one can be run with the task "testAll".

The full series of articles on Kobalt can be found here.

The Kobalt diaries: templates

The latest addition to Kobalt is templates, also known as “archetypes” in Maven.

Templates are actions performed by plug-ins that create a set of files in your project. They are typically used when uou are beginning a brand new project and you want some default files to be created. Of course, nothing stops you from invoking templates even if you already have an existing project since templates can generate pretty much any kind of files. Here is how they work in Kobalt.

You can get a list of available templates with the --listTemplates parameter:

$ kobaltw --listTemplates
Available templates
  Plug-in: Kobalt
    "java"              Generate a simple Java project
    "kotlin"            Generate a simple Kotlin project
    "kobalt-plugin"     Generate a sample Kobalt plug-in project

You invoke a template with the --init parameter. Let’s call the "kobalt-plugin" template:

$ ./kobaltw --init kobalt-plugin
Template "kobalt-plugin" installed
Build this project with `./kobaltw assemble`

$ ./kobaltw assemble
          ╔════════════════════════════╗
          ║ Building kobalt-line-count ║
          ╚════════════════════════════╝
───── kobalt-line-count:compile
───── kobalt-line-count:assemble
  Created .\kobaltBuild\libs\kobalt-line-count-0.18.jar
  Created .\kobaltBuild\libs\kobalt-line-count-0.18-sources.jar
  Wrote .\kobaltBuild\libs\kobalt-line-count-0.18.pom
BUILD SUCCESSFUL (3 seconds)

The template was correctly installed, then it provided instructions on what to do next, which we followed, and now we have a fully working project. This one is particular since it’s a Kobalt plug-in and I’ll get back to it shortly. But before that, let’s drill a bit deeper into templates.

Templates would be pretty useless if they were limited to the default Kobalt distribution, so of course, you can invoke templates on plug-ins. Even plug-ins that you haven’t downloaded yet! Kobalt can download plug-ins from any Maven repository and run them.

To illustrate this, let’s see what templates the Kobalt Android plug-in offers:

$ kobaltw --listTemplates --plugins com.beust:kobalt-android:
  Downloaded https://jcenter.bintray.com/com/beust/kobalt-android/0.40/kobalt-android-0.40.pom
  Downloaded https://jcenter.bintray.com/com/beust/kobalt-android/0.40/kobalt-android-0.40.jar
Available templates
  Plug-in: Kobalt
    "java"              Generate a simple Java project
    "kotlin"            Generate a simple Kotlin project
    "kobaltPlugin"      Generate a sample Kobalt plug-in project
  Plug-in: Android
    "androidJava"       Generate a simple Android Java project
    "androidKotlin"     Generate a simple Kotlin Java project

Several things happened here. First of all, we are invoking the same --listTemplates command as earlier but this time, there is a new --plugins parameter. You pass this parameter a list of Maven id’s representing the Kobalt plug-ins you want Kobalt to install. This is similar to declaring these plug-ins in your build file, except that typically, when you run a template, you don’t have a build file yet. So this is an easy way to install plug-ins without requiring a build file.

Finally, notice that the Maven id used above, "com.beust:kobalt-android:" doesn’t have a version number and instead, ends with a colon. This is how you ask Kobalt to locate the latest version of the plug-in for you.

Kobalt responded by determining that the latest version of the Kobalt Android plug-in is 0.40, downloading it, installing it and then asking it what templates it provides. The Kobalt Android plug-in provides two templates, both of them creating a full-blown Android application, one written in Kotlin and one in Java. Let’s install the Kotlin one:

$ kobaltw --plugins com.beust:kobalt-android: --init androidKotlin
Template "androidKotlin" installed
Build this project with `./kobaltw assemble`

$ find .
./kobalt/src/Build.kt
./src/main/AndroidManifest.xml
./src/main/kotlin/com/example/MainActivity.kt
./src/main/res/drawable-hdpi/ic_launcher.png
./src/main/res/drawable-ldpi/ic_launcher.png
./src/main/res/drawable-mdpi/ic_launcher.png
./src/main/res/drawable-xhdpi/ic_launcher.png
./src/main/res/values/strings.xml
./src/main/res/values/styles.xml

$ ./kobaltw assemble
          ╔══════════════════════╗
          ║ Building kobalt-demo ║
          ╚══════════════════════╝
───── kobalt-demo:generateR
───── kobalt-demo:compile
───── kobalt-demo:proguard
───── kobalt-demo:generateDex
───── kobalt-demo:signApk
Created kobaltBuild\outputs\apk\kobalt-demo.apk
───── kobalt-demo:assemble
BUILD SUCCESSFUL (9 seconds)

We now have a complete Android application written in Kotlin.

Let’s go back to the template we built at the beginning of this article: the Kobalt plug-in called "kobalt-line-count-0.18.jar". It’s a valid Kobalt plug-in so how do we test it? We could upload it to JCenter and then invoke it with the --plugins parameter, but Kobalt provides another handy command line parameter to test such plug-ins locally: --pluginJarFiles. This parameter is similar to --plugins in that it installs a plug-in, except that it does so from a local jar file and not a remote Maven id.

Let’s install this plug-in and see which tasks are then available to us:

$ ./kobaltw --pluginJarFiles kobaltBuild/libs/kobalt-line-count-0.18.jar --tasks

List of tasks

...

  ═════ kobalt-line-count ═════
    dynamicTask         Dynamic task
    lineCount           Count the lines

...

As you can see, Kobalt has installed the kobalt-line-count plug-in, which then added its own tasks to Kobalt’s default ones. The Kobalt plug-in template appears to work fine. From this point on, you can edit it and create your own Kobalt plug-in.

Speaking of plug-in development, how hard is it to add templates to your Kobalt plug-in? Not hard at all! All you need to do is to implement the ITemplateContributor interface:

interface ITemplateContributor {
    val templates: List<ITemplate>
}

Each template provides a name, a description and a function that actually generates the files for the current project. Feel free to browse how Kobalt’s Android plug-in implements its templates.

The full series of articles on Kobalt can be found here.

The Kobalt Diaries: Incremental Tasks

One of the recent additions to Kobalt is incremental tasks. This is the ability for each build task to be able to check whether it should run or not based on whether something has changed compared to the previous run. Here are a few quick outlines of how this feature works in Kobalt.

Overview

Kobalt’s incremental task architecture is based on checksums. You implement an incremental task by giving Kobalt a way to compute an input checksum and an output checksum. When the time comes to run your task, Kobalt will ask for your input checksum and it will compare it to that of the previous run. If they are different, your task is invoked. If they are identical, Kobalt then compares the two output checksums. Again, if they are different, your task is run, otherwise it’s skipped. Finally, Kobalt updates the output checksum on successul completion of your task.

This mechanism is extremely general and straightforward to implement for plug-in developers, who remain in full control of how exhaustive their checksum should be. You could decide to stick to the default MD5 checksums of the files and directories that are of interest to your task, or if you want to be faster, only check the timestamps of your file and return a checksum reflecting whether Kobalt should run you or not. And of course, checksums don’t even have to map to files at all: if your task needs to perform a costly download, it could first check a few HTTP headers and again, return a checksum indicating whether your task should run.

Having said that, build systems tend to run tasks that have files for inputs and outputs, so it seems logical to think about an incremental resolution that would be based not on checksums (which can be expensive to compute) but on file analyses. While a checksum can tell you “One of these N files has been modified”, it can’t tell you exactly which one, and such information can open the door to further incremental work (see below for more details).

One approach for file-based tasks could be for the build system to store the list of files along with some other data (timestamp or checksum) and then pass the relevant information to the task itself. The complication here is that file change resolution implies knowing the following three pieces of information:

  • Which files were modified.
  • Which files were added.
  • Which files were removed.

The downside is obviously that there is more bookkeeping required to preserve this information around between builds but the clear benefit is that if a task ends up being invoked, it can perform its own incremental work on just the files that need to be processed, whereas the checksum approach forces the task to perform its work on the entire set of inputs.

Implementation

Incremental tasks are not very different from regular tasks. An incremental task returns an IncrementalTaskInfo instance which is defined as follows:

class IncrementalTaskInfo(
	val inputChecksum: String?,
    val outputChecksum: () -> String?,
    val task: (Project) -> TaskResult)

The last parameter is the task itself and the first two are the input and output checksums of your task. Your task simply uses the @IncrementalTask annotation instead of the regular @Task and it needs to return an instance of that class:

@IncrementalTask(name = "compile", description = "Compile the source files")
fun taskCompile(project: Project) = IncrementalTaskInfo(/* ... */)

Most of Kobalt’s own tasks are now incremental (wherever that makes sense) including the Android plug-in. Here are a few timings showing incremental builds in action:

Kobalt

Task First run Second run
kobalt-wrapper:compile 627 ms 22 ms
kobalt-wrapper:assemble 9 ms 9 ms
kobalt-plugin-api:compile 10983 ms 54 ms
kobalt-plugin-api:assemble 1763 ms 154 ms
kobalt:compile 11758 ms 11 ms
kobalt:assemble 42333 ms 2130 ms
70 seconds 2 seconds

Android (u2020)

Task First run Second run
u2020:generateRInternalDebug 32350 ms 1652 ms
u2020:compileInternalDebug 3629 ms 24 ms
u2020:retrolambdaInternalDebug 668 ms 473 ms
u2020:generateDexInternalDebug 6130 ms 55 ms
u2020:signApkInternalDebug 449 ms 404 ms
u2020:assembleInternalDebug 0 ms 0 ms
43 seconds 2 seconds

Wrapping up

At the moment, Kobalt only supports checksum-based incremental tasks since that approach subsumes all the other approaches but I’m not ruling out adding input-specific incremental tasks in the future if there’s interest. In the meantime, checksums are working very well and pretty efficiently, even on large directories and/or large files.

If you are curious to try it yourself, please download Kobalt and report back!

The full series of articles on Kobalt can be found here.

A close look at Kotlin’s “let”

let is a pretty useful function from the Kotlin standard library defined as follows:

fun <T, R> T.let(f: (T) -> R): R = f(this)

You can refer to a previous article I wrote if you want to understand how this function works, but in this post, I’d like to take a look at the pros and cons of using let.

let is basically a scoping function that lets you declare a variable for a given scope:

File("a.txt").let {
    // the file is now in the variable "it"
}

There is another subtle use of let when applied to a nullable reference. The ?. operator
lets you make sure that the code in scope is only run if the expression is not null:

findUser(id)?.let {
    // only run if findUser() returned a non null value
}

After going back and forth about whether this idiom is superior to a simple null test, I am slowly leaning to abandoning it in favor of an if for the following reasons:

  • This idiom is only useful if you want to do an if that doesn’t have an else branch. I tend to view such constructs as suspicious since if without an else can be a source of bugs.

  • This idiom introduces a renaming. Either you use the default lambda syntax, in which case the renamed variable is implicitly called it, or you explicitly name the argument:

    val user = findUser(id)
    user?.let { foundUser ->
        // ...
    }
    

    This can occasionally be useful but sometimes, I just don’t feel like being forced to rename my variable.


  • Following the previous point, if doesn’t impose a renaming but Kotlin’s smart casting guarantees that you won’t have any surprise:

    val user = findUser(id)
    if (user != null) {
        // user is now a non null reference
    }
    

    Also, the fact that no new name was introduced and the variable keeps its name user the entire time improves readability in my opinion.

So for these reasons, I tend to default to a good old if these days. None of these arguments are deal breakers, it’s mostly a stylistic preference at this point. Let’s see if I’ll change my mind over the next few months.

The Kobalt diaries: profiles

When I started thinking about how profiles should work in Kobalt, I realized that the simplest approach I’d like to see in a build tool is defining a boolean variable and having if statements in my build file. So that’s exactly how Kobalt’s profiles are implemented.

You start by defining boolean values initialized to false in your build file:

  val experimental = false
  val premium = false

Then you use this variable wherever you need it in your build file:

  val p = javaProject {
      name = if (experimental) "project-exp" else "project"
      version = "1.3"
      ...

Finally, you invoke ./kobaltw with the --profiles parameter followed by the profiles you want to activate, separated by a comma:

  ./kobaltw -profiles experimental,premium assemble

Keep in mind that since your build file is a real Kotlin source file,
you can use these profile variables pretty much anywhere, e.g.:

dependencies {
    if (experimental)
        "com.squareup.okhttp:okhttp:2.4.0"
    else
        "com.squareup.okhttp:okhttp:2.5.0",

And that’s it.

The full series of articles on Kobalt can be found here.

The Kobalt diaries: it’s the little things

When I embarked on the ridiculously ambitious goal of writing a build tool, I had plans to tackle both big problems and small problems. My previous (and probably future) blog post cover the big problems such as performance, plug-in architecture and DSL syntax, but in this post, I’m going to cover a few little things that I was quite happy to finally be able to get from my build tool.

I’ve always found it a hassle to keep up with the latest versions of the dependencies of my build, especially since it’s its job to tell me. Therefore, Kobalt has a handy --checkVersions parameter that will check to see if it can find any new version of your dependencies:

$ ./kobaltw --checkVersions
New versions found:
        org.jetbrains.kotlin:kotlin-compiler-embeddable:1.0.0-beta-2423
        org.jetbrains.kotlin:kotlin-stdlib:1.0.0-beta-2423
        com.squareup.okhttp:okhttp:2.6.0
        com.squareup.retrofit:retrofit:2.0.0-beta2

Another convenient switch is --resolve, which looks up a dependency and gives you some information about it, such as which Maven repo it is found in and its own dependency tree. You can also use an id without a version (e.g. org.testng:testng:) to ask Kobalt to find the most recent version of that artifact:

$ ./kobaltw --resolve org.testng:testng:
╔══════════════════════════════════════════════════════════════════════════════════╗
║                                     org.testng:testng:                           ║
║           http://repo1.maven.org/maven2/org/testng/testng/6.9.9/testng-6.9.9.jar ║
╚══════════════════════════════════════════════════════════════════════════════════╝
╟ junit:junit:4.10
║      ╙ org.hamcrest:hamcrest-core:1.1
╟ com.beust:jcommander:1.48
╟ org.apache.ant:ant:1.7.0
║      ╙ org.apache.ant:ant-launcher:1.7.0
╟ org.yaml:snakeyaml:1.15
╙ org.beanshell:bsh:2.0b4

Finally, I’ve always been bugged by what I consider a glaring omission of the Gradle Android plug-in: not being able to run my applications. The plug-in generates tasks for the various variants of your application (assembleDevDebug, assembleDevRelease, installDevDebug, etc…) but strikingly, no "run" task. I’m happy to report that Kobalt’s Android plug-in supports exactly that. To see it in action, clone the Kobalt example and follow the instructions at the bottom of the README:

$ ./kobaltw runFreeDebug // build, install and launch that variant
$ ./kobaltw runFreeRelease // build, install and launch that variant

I’ve made a lot of improvements to the Android plug-in lately, but that will be the topic for another post.

The full series of articles on Kobalt can be found here.