Real world functional programming in Scala - comparing Java, Clojure and Scala

I recently started reading Stuart Halloway's book 'Programming in Clojure'. I don't think I will be writing much enterprise applications in that language in the near future, but it never hurts to broaden the mind, and it's a very good read. In his book, he demonstrates some of the advantages of functional programming by taking an example from the Apache commons library: StringUtils.indexOfAny. He has also written a blog about it.
In this blog post, we'll compare the original function in Java, the Clojure version and a Scala implementation.

To start: here's the original implementation in Java:

// From Apache Commons Lang, http://commons.apache.org/lang/
public static int indexOfAny(String str, char[] searchChars) {
  if (isEmpty(str) || ArrayUtils.isEmpty(searchChars)) {
      return -1;
  }
  for (int i = 0; i < str.length(); i++) {
      char ch = str.charAt(i);
      for (int j = 0; j < searchChars.length; j++) {
        if (searchChars[j] == ch) {
            return i;
        } 
      }
  }
  return -1;
}

Sample results of this function are:

StringUtils.indexOfAny(null, *)                  = -1
StringUtils.indexOfAny("" , *)                    = -1
StringUtils.indexOfAny(*, null)                  = -1
StringUtils.indexOfAny(*, [])                    = -1
StringUtils.indexOfAny("zzabyycdxx" ,['z' ,'a' ]) = 0
StringUtils.indexOfAny("zzabyycdxx" ,['b' ,'y' ]) = 3
StringUtils.indexOfAny("aba" , ['z' ])            = -1

And now the Clojure code. I've taken the version from the book, (free to download here). The book version defines three functions, instead of the version shown in the blog post.


(defn indexed [s] (map vector (iterate inc 0) s))

(defn index-filter [pred coll]
  (when pred
    (for [[idx elt] (indexed coll) :when (pred elt)] idx)))

(defn index-of-any [pred coll]
  (first (index-filter pred coll)))

The three functions defined are:

  1. indexed returns a sequence of pairs, the first element being the index, the sequence the element.
  2. index-filter returns a sequence of the indexes of matching predicates in the collection. For example, the function call (index-filter #{\a \b} "abcdbbb") yields (0 1 4 5 6)
  3. index-of-any is the StringUtils function. This is now a simple function, just returning the first result of the index-filter function.

As Stuart has mentioned in his book, the Clojure version is shorter, less complex, and also more general than the Java version. No special case handling, such as null or empty search character checking. This all just works naturally in the functional version. It can also be easily extended for other functions than just indexOfAny
Since I've been implementing all kinds of nice, but useless little functions in Scala recently, I thought it might be a good idea to try out a more useful example. So let's see how well we can do with Scala in implementing this in a similar way as the Clojure version.
Almost immediately, we hit on the dreaded null problem. Scala is not a pure functional language, and you have to beware that parameters that are passed on can be null. You could specify the input arguments as Options, of course, but than the caller of the function still needs to construct either the Some or None value, and to do that the caller would have to check for it being null or not.
So, after spending some time grumbling about this issue, I decided to admit defeat and do the null check. 1-0 for Clojure.

Without further ado, here are the three functions implemented in Scala (N.B. I've used the trunk here, in case you might want to try this out yourself, I haven't tried whether this also works version 2.7.5):

def indexed(coll:Seq[Any]):Seq[Pair[Int,Any]] = {
  if (coll==null) List()
  (0 until coll.length).zip(coll)
  //below an alternative, but here we have to traverse the list another time to reverse the elements...
  //coll.zipWithIndex.map(p => p.swap)
}

def indexedFilter(pred:Seq[Any], coll:Seq[Any]):Seq[Int] = {
  if (pred==null)List()
  val idxList = indexed(coll)
  for (p <- pred; idxPair <- idxList; if (p == idxPair._2)) yield (idxPair._1)
}

def indexOfAny(pred:Seq[Any], coll:Seq[Any]):Option[Int] = {
  val res=indexedFilter(pred, coll)
  if (res.isEmpty) None
  else Some(res.first)
}

Following the definition of the indexed function is a no-brainer: Scala's zip function on the List class does the job here. There's also zipWithIndex which does almost what we want, this returns a List of tuples, but in reverse order as the Clojure sample: element first, index later.
The indexedFilter uses a for-comprehension just as the Clojure example (Clojure's for is not a traditional loop, but a sequence comprehension). For each element in our predicate sequence it checks whether it is present in the indexed tuples that is the result of the indexed function. If it is, it is added to the resulting list using yield. indexOfAny then becomes a trivial function that just checks whether the resulting sequence is empty or not. Instead of returning -1 if nothing is found, I've used Scala's Option class in this case to make the result more expressive.

The function at least works as expected:


scala> indexOfAny("ab", "cdef")
res3: Option[Int] = None
scala> indexOfAny("ab", "bcdefbbb")
res3: Option[Int] = None
scala> indexOfAny("fb", "bcdefbbb")
res4: Option[Int] = Some(4)

Conclusion
All in all, the scala version is not that bad. Its main drawback over the Clojure version is the two null checks that we need to do, and also we need to check the resulting list being empty or not in the indexOfAny function. Clojure wins in this respect. However, the Scala version still is pretty neat and generic and improves (in my opinion at least) the original Java version.

Update: handling infinite collections
Stuart asked the question in his comment how the Scala version would handle infinite collections. The version presented above does not handle this very well. For instance, you'll get a OEM if you try to pass on Stream.from(0) (which constructs a lazy infinite list of all natural numbers) to the indexed function. Also the indexedFilter function evaluates the entire collection, which is strictly not necessary for the indexOfAny function. To handle an infinite collection, we need to modify and optimize the functions somewhat, to resemble a bit more the functions that Stuart has used in his blog post, instead of the book versions. The Clojure functions used in the blog post look like this:


(defn indexed [s] (map vector (iterate inc 0) s))

(defn index-of-any [s chars]
  (some (fn [[idx char]] (if (get chars char) idx)) 
          (indexed s)))

The Scala version that I came up with is as follows:

def lazyIndexed(coll:Seq[Any]):Seq[Pair[Int,Any]] = {
  if (coll==null) Stream.empty
  Stream.from(0).zip(coll)
}

def lazyIndexOfAny(pred:Seq[Any], coll:Seq[Any]):Option[Int] = {
  if (pred==null) None  
  lazyIndexed(coll).dropWhile(ip => !pred.exists(p => (p == ip._2))).headOption.map(ip => ip._1)
}

As said, Scala's Stream implements a lazy list, in which the elements are only evaluated when needed. In this case, we keep on drop elements from it list that do not appear in the predicate sequence. The head of the remaining list is the first matching result.

Obvious drawback here is that this indeed does works on infinite collections, but only if a match can be found, else we're still in trouble.

Comments (17)

  1. Mohammed Berdai - Reply

    July 5, 2009 at 12:09 am

    Yep, I came to the same conclusion 🙂

  2. Stuart Halloway - Reply

    July 5, 2009 at 2:31 am

    Glad you enjoyed the book. Is there any chance you can reformat the Clojure example to indent correctly? Also, the book includes JavaScript you can use to syntax-color Clojure code if you want.

    • Arjan Blokzijl - Reply

      July 5, 2009 at 5:22 am

      I certainly did enjoy it (and more and more so, moving towards the STM chapter), so many thanks for having written this book. I've modified the Clojure snippet, so that it indents at least slightly better than before. Still not perfect though, so I'll have a look into the JavaScript for proper syntax-coloring.

  3. Alexander Azarov - Reply

    July 6, 2009 at 12:36 pm

    It is possible to simplify Scala version a little bit using one helper function:

    def opt[T](x: Seq[T]): Seq[T] = if (x == null) List() else x

    def indexed[T](coll: Seq[T]): Seq[(Int, T)] = {
    val c = opt(coll)
    List.range(0,c.length).zip(c.toList)
    }

    def indexedFilter[T](pred:Seq[T], coll:Seq[T]): Seq[Int] = for (p <- opt(pred); idxPair <- indexed(coll); if p == idxPair._2) yield idxPair._1

    def indexOfAny[T](pred: Seq[T], coll:Seq[T]): Option[Int] = indexedFilter(pred, coll).firstOption

    This compiles and runs under Scala 2.7.5. Scala 2.8 will hopefully provide object method Option.apply(x) which will test x (it's marked as @experimental right now)

  4. Asd - Reply

    July 8, 2009 at 12:09 pm

    Clojure surely can get nulls from Java methods. Are they represented as nil? Is nil handled automatically when working with collections in clojure?

    Personally I would omit the null checks in Scala. If somebody is passing nulls around it is their own fault if the get a NPE. In fact, they should get a NPE from any function that is not documented to handle nulls.

  5. Asd - Reply

    July 8, 2009 at 12:27 pm

    The Scala could definitely be shorter. I am omitting the null checks (you didn't put any type checks in the clojure did you?):

    def indexedFilter(pred:Seq[Any], coll:Seq[Any]) =
    coll.toList.zipWithIndex.filter(p => pred.contains(p._1)).map(p => p._2)

    def indexOfAny(pred:Seq[Any], coll:Seq[Any]) = indexedFilter(pred,coll) match {
    case x::xs => Some(x)
    case _ => None
    }

  6. Asd - Reply

    July 8, 2009 at 1:07 pm

    D'oh, my Scala could be even shorter:

    def indexOfAny(pred:Seq[Any], coll:Seq[Any]) = indexedFilter(pred,coll).firstOption

    • Arjan Blokzijl - Reply

      July 11, 2009 at 7:48 am

      @ASD: You're correct, that makes the Scala version more concise, and demonstrates my mediocre knowledge of the Scala collections API. As a side note, in the redesigned collections library in the trunk, firstOption is deprecated and its now headOption. About the null checks: in the Clojure version you can pass nil to the function, and the function still works. One of the reasons the apache commons libraries are widely used is that they handle the annoying null checking. I've tried to re-create the three Clojure functions in Scala, and in order to have all cases work correctly, the null checking is still required I believe.

  7. Cay Horstmann - Reply

    July 9, 2009 at 2:09 am

    This is an interesting exercise, but it might scare away Java refugees who think that everything in Scala must be rethought with a mess of predicates and zipped lists. This particular algorithm translates easily into something that a Java refugee can grasp pretty quickly:

    def indexOfAny(str : String, searchChars : Array[Char]) =
    if (str == null || searchChars == null) List() else
    for { i <- 0 until str.length if (searchChars.contains(str(i))) } yield i

    It collects all matching indices, not just the first one.

  8. Eastsun - Reply

    July 9, 2009 at 2:11 am

    my scala solution(just a direct translation from the java version)
    scala> def indexOfAny(str :String, seq :Array[Char]):Option[Int] = {
    | if(str == null || seq == null) None
    | else (0 until str.length) find {i => seq.contains(str(i)) }
    | }
    indexOfAny: (String,Array[Char])Option[Int]

  9. Stuart Halloway - Reply

    July 11, 2009 at 4:56 am

    Cay, Eastsun,

    The direct translations are concise, but they lack the generality of the Clojure version. How would you code a Scala version to handle an infinite collection?

    • Arjan Blokzijl - Reply

      July 12, 2009 at 7:00 am

      Hi Stuart,

      My first version also doesn't handle infinite collections that well, I've posted an update to the blog with a version that does some rather crude handling of this using Scala's Stream class, which implements a lazy list. This version is less generic than the first one, however.

  10. Alexander Azarov - Reply

    July 14, 2009 at 12:24 pm

    I guess you have a mistake in the update. When you don't write "else", "if" returns Unit.

  11. Anders - Reply

    July 16, 2009 at 1:10 pm

    I have read tutorial how to write code more functional, i like both Scala and Clojure, but to get developer hocked-on functional development it could be easy way to show how to do it, with the language they all ready use.
    IBM have a good introduction.

    Functional programming in the Java language
    http://www.ibm.com/developerworks/java/library/j-fp.html

    mvh Anders

  12. Alexey Tarasevich - Reply

    July 17, 2009 at 1:18 pm

    I think, while you are not using domain-specific macros your Clojure will not look remarkably better then Scala 🙂

  13. Eddie Stanley - Reply

    September 5, 2011 at 8:22 am

    There is visibly a lot to identify about this. I feel you made some good points in features also.

  14. StreamBase and Scala | StreamBase - Reply

    October 12, 2011 at 6:28 pm

    [...] who follows programming languages will probably have noticed Scala cropping up a lot [...]

Add a Comment