cons

Clojure 101

While I have presented a number of topics in the past, this was the first time that I found myself being very sceptical on the way I should present things. I've tried asking the organizers to get a sense about the audience, but the organizers were unsure. I thought that I had to be somewhat generic on my approach and that made things a bit harder. Senior people are intrigued by other things compared to junior people and people coming from the Java world have different interests compared to people that are coming from the Javascript world. As I would mainly focus on Clojure and I was planning to show host interop, that could potentially be of not much interest to people from non JVM backgrounds, even though those concepts are not strictly tied to the JVM See for example Clojurescript.

With those things in mind, I wanted to have a mix of trivial and non-trivial examples to explain some concepts, because learning concepts can be much more interesting compared to showing how to do X or Y. At the same time a talk has to be entertaining, so some examples would need to be more fun even if I had to leave more important stuff out of the agenda. This is how I settled on having four small groups of examples - even though some grew quite lare - and leaving some time in the end for a more complete thing - which now I'm realizing that I most likely won't have time to go through.

Also, I've decided to have almost the entire presentation with a fired REPL and writing the code on-the-fly. After all, after all this years, that's the nicest thing about Clojure for me. At the same time I've decided to allow people to interrupt me at any time so we can experiment with whatever comes to their minds.

Below are those four sections with some of the examples I'm planning to show, and some that I will most likely be leaving out, but are worth mentioning here.

And of course all that needs to fit in, more or less, 45 minutes.

Let's go!

Intro

The good thing with Clojure is that explaining the syntax takes just a couple of minutes.

Creating a hello world can be as simple as;

(defn hello
  "says hello to someone"
  [name]
  (println (str "hello " name)))

and calling this function is just:

(hello "world")
hello world
=> nil

It's worth mentioning and explaining the REPL, how Cursive works which I'll be using. Highlighting editor support in general and show some basic functions.

This will most likely be the shortest section.

Data Structures

Even though there were efforts in the Java world to get functional pesistent data structures, it's hard to leave what the SDK provides and what everything is using. Java Streams improved a lot the way to work with collections, but the basic interfaces are still there and those are mutable.

Clojure comes with persistent functional data structures which are immutable with a very flexible and extensive API.

What is particulary interseting about the data structures, which leads very quickly to the value of values is this:

(def list1 (list 1 2 3))

Lists are not the data structure Clojure devs usually work with (does the source code counts?), as that would be the vector, but it's easier to present what follows:

list1
=> (1 2 3)

Let's create a 2nd list by adding 4 to it:

(def list2 (conj list1 4))

Is this a surprise:

list2
=> (4 1 2 3)

The function conj will add an element and get a new data structure. Where the element is added is determined by the nature of the data structure. For linked lists it's more natural to add this item to the front, for vectors it is to the end, and for sets you don't know as those are unordered:

(conj [1 2 3] 4)
=> [1 2 3 4]
(conj #{1 2 3} 4)
=> #{1 4 3 2}

Now if we get back to our original lists. Let's create a 3rd one

(def list3 (conj list1 5))

Now calling the function rest, will give us all the elements but the first:

(rest list3)
=> (1 2 3)

That's expected.

What if we now compare the lists 2 and 3 without the first element:

(= (rest list2) (rest list3))
=> true

That's expected. The = functions compares if the values are equal. For example

(= [1 2 3] [1 2 3])
=> true

That's also expected. But are those two vectors identical?

(identical? [1 2 3] [1 2 3])
=> false

They are not, because those are not the same object. If someone checks under the hood, a reference comparison (Java's ==) will be found.

But:

(identical? (rest list2) (rest list3))
=> true

Aha! Even though we have created two new lists, both still share their structure with list1.

For more about this someone can start digging from section 3.4.1 in the Clojure history paper.

Factorial & Fibonacci

It's time to create something more interesting. Factorial and Fibonacci, as common as they are, are my favorite things to showcase.

Let's start with a simple recursive factorial implementation:

(defn fact [n]
  (if (= n 0)
    1
    (* n (fact (- n 1)))))

Built-in if just joined the party, and there are a number of new guests just outside the door. That works as expected:

(fact 5)
=> 120

Or does it?

(fact 120)
Execution error (ArithmeticException) at java.lang.Math/multiplyExact (Math.java:1032).
long overflow

Ah, that's a big number and thankfully Clojure doesn't allow it to overflow. We can easily fix this, by changing to the *' which supports arbitary precision.

(defn fact [n]
  (if (= n 0)
    1
    (*' n (fact (- n 1)))))
(fact 120)
=>
6689502913449127057588118054090372586752746333138029810295671352301633557244962989366874165271984981308157637893214090552534408589408121859898481114389650005964960521256960000000000000000000000000000N

That works but what is this? Let's check:

(type (fact 5))
=> java.lang.Long
(type (fact 120))
=> clojure.lang.BigInt

As one could expect clojure.lang.BigInt is built on top of java.math.BigInteger.

So that works. Or ... does it?

(fact 12000)
Execution error (StackOverflowError) at clj-intro.demo/fact (demo.clj:4).
null

Unfortunately, our recursive definition is consuming the stack.

How about this:

(defn fact [n]
  (loop [n n
         r 1]
    (if (= n 0)
      r
      (recur (- n 1) (*' r n)))))

That works for 12000. Trust me.

While loop/recur looks like a recursive thing, it's not.

We need to pull clj-java-decompiler that will helps us checking what is going on.

First we need to update our namespace definition:

(ns clj-intro.demo-final
  (:require [clj-java-decompiler.core :as dec]))

Now we can decompile the loop:

(dec/decompile
  (loop [n 5
         r 1]
    (if (= n 0)
      r
      (recur (- n 1) (*' r n)))))

Oh!

// Decompiling class: clj_intro/demo$fn__2355
package clj_intro;

import clojure.lang.*;

public final class demo$fn__2355 extends AFunction
{
    public static Object invokeStatic() {
        long n = 5L;
        Object r = RT.box(1L);
        while (n != 0L) {
            final long minus = Numbers.minus(n, 1L);
            r = Numbers.multiplyP(r, n);
            n = minus;
        }
        return r;
    }
    
    @Override
    public Object invoke() {
        return invokeStatic();
    }
}

Yes, Clojure complies to JVM bytecode, and now are are looking under the hood.

But recur can also work for functions, and not just loops. E.g:

(defn foo [x]
  (when (not= x 0)
    (println x)
    (recur (- x 1))))

And

(foo 5)
5
4
3
2
1
=> nil

So why we didn't recur in our factorial example? I'll leave that as an exercise for the reader.

A more Clojure like way of defining factorial would be:

(defn fact [n]
  (reduce *' (range 1 (inc n))))

It's finonacci time now!

Let's start simple and use the iterate and take functions:

(take 10 (iterate inc 0))
=> (0 1 2 3 4 5 6 7 8 9)

That souldn't provoke any suprices.

What about this:

((fn [[a b]]
   [b (+ a b)]) [3 5])
=> [5 8]

We are defining an anonymous function, invoking it with the typle [3 5], destructuring the params, and returning a new tuple of b and a + b.

Let's connect those two pieces:

(take 10 (iterate (fn [[a b]]
                    [b (+ a b)]) [0 1]))
=> ([0 1] [1 1] [1 2] [2 3] [3 5] [5 8] [8 13] [13 21] [21 34] [34 55])

Aha! Now we are providing a tuple of [0 1] as the iteration seed and generating one new tuple each time.

That's it really. Then we just take the first of the tuples.

(map first (take 10 (iterate (fn [[a b]]
                               [b (+ a b)]) [0 1])))
=> (0 1 1 2 3 5 8 13 21 34)

Let's rewritte it in a simpler way:

(->> [0 1]
     (iterate (fn [[a b]]
                [b (+ a b)]))
     (take 10)
     (map first))

And we are done:

(defn fibo-seq [n]
  (->> [0 1]
     (iterate (fn [[a b]]
                [b (+' a b)]))
     (take n)
     (map first)))

And:

(drop 100 (fibo-seq 105))
=>
(354224848179261915075N 573147844013817084101N 927372692193078999176N 1500520536206896083277N 2427893228399975082453N)

But is it performant?

(drop 1000000 (fibo-seq 1000005))
Exception in thread "nREPL-session-f6706ab3-a5aa-4fcd-8161-1a89974bdd8c" java.lang.OutOfMemoryError: Java heap space
...

It's not. For a performant solution we would need to approach this differently which wasn't originally in the initial scope of this presentation. However, when chatting about this with a friend, he pointed out that it would be nice to present higher order functions.

I had thoughts about this, and my initial reaction was to not include higher order functions. Blasphemy, I know. My main reasoning behind this is that higher order functions are complex, and if someone is not accustomed to functional thinking it is not an easy-to-grasp concept.

I'm still not sure if I'll demo this, but as I said previously, here I can include outtakes!

So here it goes.

Let's get back to our fibonacci example, and create a recursive version.

(defn fibo [n]
  (condp = n
    0 0
    1 1
    (+ (fibo (- n 2)) (fibo (- n 1)))))

The new thing here is the condp macro (more on macros below), which just allows us having sort of switch but really compact. This works, but it's slow. Too slow actually.

(time (fibo 45))
"Elapsed time: 24255.134298 msecs"
=> 1134903170

Can we do better by using memoization?

This example and an explanation of atoms can be found here.

Lets create the memoization higher order function:

(defn memo [f]
  (let [mem (atom {})]
    (fn [& args]
      (if-let [mem-v (@mem args)]
        mem-v
        (let [v (apply f args)]
          (swap! mem assoc args v)
          v)))))

There are a lot of things going on here. So let's break things one by one.

  1. There is an atom hash-map that stores for a given sequence of param args the outcome of f invocation on them.
  2. With the use of lexican scoping the fn that will be returned will enclose the mem binding and it will exist in its context.
  3. This functions takes variable arguments which can be done with & args. Those arguments are provided to us in a array sequence.
  4. We do a lookup in the mem atom for this key and if there is something there we provided it back. Otherwise:
  5. We do function application, getting the result v, store it in the atom and returning the v back.

For people new to Clojure or functional programming this may require some mental effort to digest. But it should be noted, that this is not a trivial example.

Now we can define the memo-fibo by simply:

(def memo-fibo (memo fibo))

And, here it goes, it's way way faster this time:

(time (memo-fibo 45))
"Elapsed time: 0.027833 msecs"
=> 1134903170

That's all good, but will this work?

(time (memo-fibo 1000005))

Spoiler, no, but why?

Just for catharsis, we can once again implement fibonacci with a loop:

(defn fibo [n]
  (loop [a 0
         b 1
         n n]
    (if (= n 0)
      a
      (recur b (+' a b) (dec n)))))

This time it works:

(time (fibo 1000005))
"Elapsed time: 10082.013057 msecs"
=>
a very very long number

Host interop & Polymorphism

The JDK comes with tons of built-in things one may use. For example:

(slurp (new java.io.File "/dev/null"))
=> ""

Well, of course.

A Java engineer should wonder here: What happened to my checked exception?

When using clojure you are not forced to check for checked exceptions.

Checking for exceptions is done with the try special form:

(try
  (/ 1 0)
  (catch ArithmeticException e
    (println "oh no" (.getMessage e))))
oh no Divide by zero
=> nil

Clojure provides many interesting ways to interact with the host language and what it provides. One nice way, is using the doto macro which can help make our code look like the builder pattern:

(doto (java.util.HashMap.)
  (.put "foo" 1)
  (.put "bar" 2)
  (println))
#object[java.util.HashMap 0x2b2c500e {bar=2, foo=1}]
=> {"bar" 2, "foo" 1}

But wait, what is a macro?

As the reference says:

Clojure has a programmatic macro system which allows the compiler to be extended by user code.

I don't intend getting into macros mainly because of time constraints, but it's still interesting to check what does this macro do:

(macroexpand-1 '(doto (java.util.HashMap.)
                 (.put "foo" 1)
                 (.put "bar" 2)
                 (println)))
=>
(clojure.core/let
 [G__2049 (java.util.HashMap.)]
 (.put G__2049 "foo" 1)
 (.put G__2049 "bar" 2)
 (println G__2049)
 G__2049)

By expanding this macro we see that a hash map is contructed and bound to the symbol G__2049. Then there are the .put calls, then a println and in the end the symbol is returned.

OK, let's get back to the hash map. An interesting thing one could observe is that it looks like an actual clojure hash map which can be constructed with the literal form {:a 1}. But does that mean that we can compare it to a Java hash map?

Indeed:

(= {"foo" 1 "bar" 2} 
   (doto (java.util.HashMap.)
     (.put "foo" 1)
     (.put "bar" 2)))
=> true

We can. This is most exciting.

It's time to check polymorphism and even though there are multiple things to show here, my main focus will be showing protocols and ad-hoc implementations of Java interfaces. It's also worth mentioning the expression problem.

Let's assume we want to create a generic mechanism for concatenating things.

Let's create a protocol, which kinda looks like a Java interface:

(defprotocol IConcat
  (ccat [this that]))

This protocol can now be extended to existing types in the running VM.

I'm not using the name concat it already exists in the Clojure core and it will be used below.

We can try to:

(ccat "foo" "bar")
Execution error (IllegalArgumentException) at clj-intro.demo-final/eval2064$fn$G (demo_final.clj:107).
No implementation of method: :ccat of protocol: #'clj-intro.demo-final/IConcat found for class: java.lang.String

The compiler explains that there is no implementation of the method ccat for the String class.

Let's fix that:

(extend-type java.lang.String
  IConcat
  (ccat [this that]
    (str this that)))

Let's try again:

(ccat "foo" "bar")
=> "foobar"

That looks good. Let's try with vectors now:

(ccat [1 2] [3 4])
Execution error (IllegalArgumentException) at clj-intro.demo-final/eval2064$fn$G (demo_final.clj:107).
No implementation of method: :ccat of protocol: #'clj-intro.demo-final/IConcat found for class: clojure.lang.PersistentVector

We can do the same thing as before:

(extend-type clojure.lang.PersistentVector
  IConcat
  (ccat [this that]
    (concat this that)))

And now:

(ccat [1 2] [3 4])
=> (1 2 3 4)

Which alos looks good.

What if we mix things? Such as:

(ccat "foo" [1 2])
=> "foo[1 2]"
(ccat [1 2] "foo")
=> (1 2 \f \o \o)

Again that's something that may look confusing but it's relatively easy to explain. Main thing to take into account here is that the first param to the ccat function is the dispatching type, so in the first case we will be using the str function and in the second, the concat, as in the 1st case we are dispatching on java.lang.String and in the 2nd on clojure.lang.PersistentVector. The later, concatenates collections and a String is just a collection of characters.

It's time to check some Java Streams interactions. Let's see:

(-> (java.util.stream.Stream/of (to-array ["12.43" "1.123" "0.54"]))
    (.map (reify java.util.function.Function
            (apply [this x]
              (parse-double x))))
    (.collect (java.util.stream.Collectors/toList)))
=> [12.43 1.123 0.54]

I know that people can get upset with the fully qualified names, so let's do some imports by updating our namespace declaration:

(ns clj-intro.interop
  (:import (clojure.lang IPersistentCollection)
           (java.util.function Function)
           (java.util.stream Collectors Stream)))

Now:

(-> (Stream/of (to-array ["12.43" "1.123" "0.54"]))
    (.map (reify Function
            (apply [this x]
              (parse-double x))))
    (.collect (Collectors/toList)))
=> [12.43 1.123 0.54]

This looks much better. The key takeaway here is that we can use reify to implement Java interfaces in an ad-hoc way.

Closing thoughts

I hope this was instructive and fun. Even if I have gone through some of those examples many times, I still find it interesting and entertaining to go through them again. And each time I go through them I'm finding things I didn't know e.g. what type is & args?

I was probably over ambitions to also include a more end-to-end examples i.e. building something that actually does something, so I'll keep those as pointers here to potential next blog posts:

clojure meetup