Google Code Jam 2009 - Qualification

Aliens sending messages, Water flowing over a map and finding the hidden Welcome message in a String... Yes, Google Code Jam has returned for the 2009 edition! I participated in the Qualification Round and managed to solve all but 1 input set....

Sounds like fun? It sure was! As I have been busy learning some Scala, I thought I'd try that and see how it worked out during the contest. I think it saved me at least some lines of code.

For the first problem, Google has intercepted some alien communications but the signal has weakened so much that some of the words are unclear. They now need our help to devise a clever algorithm that can tell them how many different words the signal can represent. For this they constructed the full dictionary of the alien language. As a sample they provided us with the following alien language which consists of 5 3-letter words, and 4 received signals.

3 5 4
abc
bca
dac
dbc
cba
(ab)(bc)(ca)
abc
(abc)(abc)(abc)
(zyx)bc

For my solution I decided to read the dictionary of all the words, and subsequently convert each of the signals to a regular expression by doing

val pattern = "(ab)(bc)(ca)".replaceAll("\", "]").r

This lead to the following code:

object AlienLanguage {
  
	def main(args: Array[String]) = {
		if (args.length == 0) {
			throw new IllegalArgumentException("Could not find input file in arguments")
		}
  
		val input = args(0)
		val outputFile = new File((input substring(0, input lastIndexOf("."))) + ".out")
		val reader = Source fromFile(input) getLines
		val ar = reader nextIntArray
		val (tokens, nrWords, nrProblems) = (ar(0), ar(1), ar(2)) 
		val results = new Array[String](nrProblems)
  
		val words = (1 to nrWords).map(x => reader.trimmedLine).toList
		for (val i <- 1 to nrProblems) {
			results(i - 1) = "Case #" + i + ": " + solveProblem(reader, words)
		}
		outputFile write results
	}

	def solveProblem(reader : Iterator[String], words : List[String]) : String = {
		val pattern = reader.trimmedLine
		val regex = pattern.replaceAll("\", "]").r
  
		matchWords(regex, words.head, words.tail).toString
	}

	def matchWords(regex : scala.util.matching.Regex, word: String, words : List[String]) : Int = {
		words.isEmpty match {
		  	case true => matchWord(regex, word)
		  	case false => matchWord(regex, word) + matchWords(regex, words.head, words.tail)
		}
	}

	def matchWord(regex : scala.util.matching.Regex, word: String) : Int = {
		regex.findFirstIn(word) match {
		  case Some(x) => 1
		  case None => 0
		}
	}
}

The code uses two helper classes which use implicits to add methods to a File and to an Iterator[String]. These can be found in my GitHub repository (see bottom). This code was able to solve the large input set (max. 5000 15-letter words and 500 signals) within 2,5 seconds.

For the second problem we were given an elevation map for which we had to calculate drainage basins, based on a a set of rules:
* From each cell water flows to at most one neighbour cell
* Water always flows to the lowest neighbour, in case of a tie, water prefers North, West, East, South in this order
* If no neighbour has a lower altitude, water stays in this cell, we call this a sink
* drainage systems are identified by a single unique lowercase letter, such that when all rows are concatenated, the string is lexographically smallest.

A sample input with 1 5x5 map
1
5 5
1 2 3 4 5
2 9 3 9 6
3 3 0 8 7
4 9 8 9 8
5 6 7 8 9

My solution consists of first building the map and then converting that to a "flow map", in which every cell points to the index of the cell to which the cell drains. This flow map can then easily be transformed to a "basin map" by starting in the top-left corner and going from left-to-right and top-to-bottom through the map, assigning a unique letter to the drainage system. The code is:

object Watersheds extends CodeJam {
 
	def solveProblem(reader : Iterator[String]) : String = {
		val ar = reader.nextIntArray
		val (h, w) = (ar(0), ar(1))
  
		val theMap = (1 to h).map(x => reader.nextIntArray).toArray
  
		val shed = new Shed(h, w, theMap)
		val basins : Array[String] = shed.solve
		printRecursive(basins, w)
	}

	def printRecursive(basins : Array[String], length : Int) : String = {
		basins.size match {
		  	case 0 => ""
		  	case _ => "\n" + basins.take(length).reduceLeft(_+" "+_) + printRecursive(basins.drop(length), length)
		}
	}
}

class Shed(val h : Int, val w : Int, val theMap : Array[Array[Int]]) {
	val flowMap = (for (hPos <- 0 until h; wPos <- 0 until w) yield determineFlowTo(hPos, wPos)).toArray
	val basins : Array[Int] = Array.make(h * w, -1)

	def determineFlowTo(hPos : Int, wPos : Int) : Int = {
		val north = determineAltitude(hPos - 1, wPos)
		val west = determineAltitude(hPos, wPos - 1)
		val east = determineAltitude(hPos, wPos + 1)
		val south = determineAltitude(hPos + 1, wPos)
		val me = determineAltitude(hPos, wPos)
		val minAltitude = List(me, north, west, east, south).sort(_<_).head
  
		if (me > north && north == minAltitude) {
			convertToIndex(hPos - 1, wPos)
		} else if (me > west && west == minAltitude) {
			convertToIndex(hPos, wPos - 1)
		} else if (me > east && east == minAltitude) {
			convertToIndex(hPos, wPos + 1)
		} else if (me > south && south == minAltitude) {
			convertToIndex(hPos + 1, wPos)
		} else {
			convertToIndex(hPos, wPos)
		}
	}

 	def determineAltitude(hPos : Int, wPos : Int) : Int = {
		if (hPos == h || hPos < 0) {
			10001 // more than max altitude in large map because out of range
		} else if (wPos == w || wPos < 0) {
			10001 // more than max altitude in large map because out of range
		} else {
			theMap(hPos)(wPos)
		}
	}

	def convertToIndex(hPos : Int, wPos : Int) = {
		hPos * w + wPos
	}

	/*
     * Solve methods	
	 */
	def solve() : Array[String] = {
		val basinString = "abcdefghijklmnopqrstuvwxyz".split("")
		var currentHighestBasin = -1
		(0 until h * w).foreach(i => if (basins(i) == -1) currentHighestBasin = fillFlow(i, currentHighestBasin))
		basins.map(i => basinString(i + 1)).toArray
	}

  	def fillFlow(index : Int, currentHighestBasin : Int) : Int = {
		val flowTo = flowMap(index)
		if (basins(flowTo) != -1) {
			basins(index) = basins(flowTo)
			currentHighestBasin
		} else {
			val basinValue = findBasinValue(index)
			if (basinValue != -1) {
				fillTheFlow(index, basinValue)
				currentHighestBasin
			} else {
				fillTheFlow(index, currentHighestBasin + 1)
				currentHighestBasin + 1
			}
		}
	}

	def findBasinValue(index : Int) : Int = {
		flowMap(index) match {
		  	case `index` => basins(index)
		  	case _ => findBasinValue(flowMap(index))
		}
	}

	def fillTheFlow(index : Int, basin : Int) {
		basins(index) = basin
		val flowTo = flowMap(index)
		if (flowTo != index && basins(flowTo) == -1) fillTheFlow(flowTo, basin)
	}
}

Note that the CodeJam trait which is extended provides a main method, much alike the one from the first problem. The output that is generated by the code looks for the sample input given above as
Case #1:
a a a a a
a a b b a
a b b b a
a b b b a
a a a a a

The program was able to solve both the large and small datasets in under 1 second.

The last problem was (at least for me) a hard nut to crack, though the small input was easy to solve, my program didn't finish the large input within the scheduled 8 minutes. The problem statement was as follows:
Given a text string, you are to determine how many times the string "welcome to code jam" appears as a sub-sequence of that string. In other words, find a sequence s of increasing indices into the input string such that the concatenation of input[s[0]], input[s[1]], ..., input[s[18]] is the string "welcome to code jam".

My approach consisted of the following steps:

  • first filter out all characters from the string that do not belong to the "welcome to code jam" message
  • do a simple kind of run-length-encoding trick to decrease the length of the string even further (while maintaining the information)
  • loop over the string and calculate the number of occurrences
    • object WelcomeToCodeJam extends CodeJam {
      	def solveProblem(reader : Iterator[String]) : String = {
      		val expr = "welcome to code jam"
      		val line = reader.trimmedLine
      
      		// first filter all characters not occurring in pattern
      		val filteredLine = makeHisto(line.toList.filter(x => expr.contains(x)))
      		val exprList = expr.toList
      		(solve(filteredLine, exprList) % 10000) formatted ("%04d")
      	}
      
      	def makeHisto(line : List[Char]) : List[(Char, Long)] = {
      		if (line.isEmpty) {
      			List[(Char, Long)]()
      		} else {
      			val h = histoLetter(line.tail, line.head)
      			val newLine = line.drop(h._2.toInt)
      			newLine.isEmpty match {
      			  case true => List(h)
      			  case false => List(h) ++ makeHisto(newLine)
      			}
      		}
      	}
      
      	def histoLetter(line : List[Char], c : Char) : (Char, Long) = {
      		(c, (line takeWhile (_ == c)).size + 1)
      	}
      
      	def solve(line : List[(Char, Long)], expr : List[Char]) : Long = {
      		var count : Long = 0L
      		if (!expr.isEmpty) {
      			var restLine = findNext(line, expr)
      			while (!restLine.isEmpty) {
      				count += (restLine.head._2 * solve(restLine.tail, expr.tail))
      				restLine = findNext(restLine.tail, expr) 
      			}
      		} else {
      		  count = 1
      		}
      		count
      	}
      
      	def findNext(line : List[(Char, Long)], expr : List[Char]) = {
      		line.dropWhile(_._1 != expr.head)
      	}
      }
      

      Now, given the fact that some people actually managed to do this in a very small amount of time, my guess is that I completely missed something which is why it couldn't finish the large input set (lines of 500 characters). The small input set was correctly generated however. I've been busy this last day refactoring the third program to improve its speed, the latest version is the one shown above, who can tell me what I missed, or how it can be done faster?

      For the code of the 2009 edition, and some of the problems of the 2008 edition, you can also visit my GitHub repository at: http://github.com/hierynomus. Who else competed, or who has a better solution in Scala?

Comments (5)

  1. Maxim - Reply

    September 4, 2009 at 5:04 pm

    you can download solution from any of participant on Google Jam Web side (go to score board and check one check box on the top). Top score devs have amazing solutions, small and fast

  2. Oz - Reply

    September 6, 2009 at 1:23 pm

    the best known complexity is O(N.|L|). N is sum of testcases where |L| is average length of inputs. Approached by dynamic programming, with known facts:

    > array length for DP is 19 (the length for "welcome to code jam") int arr[] = new int[19];
    > to see how many subsequence "we" encountered we can count from previous "w" and so on by adding arr[i] += arr[i-1]
    > the solution will be at the last index

  3. Fabiano Guazzelli da Silva - Reply

    September 9, 2009 at 12:49 am

    Take a look here: http://code.google.com/codejam/contest/dashboard?c=90101#s=a&a=2

    • Jeroen van Erp - Reply

      September 10, 2009 at 8:17 am

      Yeah, I should've taken the memoization approach :-), or do some dynamic programming. Seems like these techniques fade away when not oft used in enterprise development 😉

  4. [...] Code Jam 2009 and managed to solve a few problems. I have written about those in a post on the Xebia blog. The source code is also available in my github repository. In the coming weeks I’ll probably [...]

Add a Comment