On the mysteriously fast Spray-can web-server

Wilco Koorn

I am addicted to a problem: handling unknown peak load on the net. Part of the solution I have in mind involves, of course, a fast web-server. One of the fastest around is Spray-can (see https://github.com/spray/spray-can) and I really like the thing for several reasons I won’t explain here. Anyway, I’m sure you can guess my very first question by now:

How fast is Spray-can really?

So I want to assess the speed of Spray-can. I could simply believe the guys from spray.io, who tweeted (my co-worker Raymond Roestenburg pointed this out to me, thanks!):

Screen shot 2013-08-02 at 1.52.25 PM

Needless to say, just believing these guys is absolutely no fun. Here is what I did to figure it out myself.

The Server.
I wrote this bit of Scala to obtain a response that is easy to count:

   case HttpRequest(GET, "/dispatcher", _, _, _) =>
      counter = counter + 1;
      sender ! HttpResponse(entity = counter.toString())

Let’s see if this works. It does! First request (http://localhost:8080/dispatcher) gives ‘1’, next gives ‘2’, then ‘3’. Cool!

The Client
Over the last months I have used several techniques for the client. I started with JMeter and I blew up JMeter, not Spray-can. Then I wrote a really mean low-level client in Java, used thousands of threads and got results that I still do not understand. I might get into that in a later blog post. Last Tuesday I told my co-worker Joris de Winne about this and he asked: why don’t you just use the ‘wget’ Unix command? So we hacked up this experiment the same day.

First experiment: one Mac

On my Mac there is no ‘wget’, so we used ‘curl’, but that’s a detail. We used two little shell scripts. The first one (“testit.sh”) just calls the server with curl. It looks like this:

#!/bin/sh
curl http://localhost:8080/dispatcher 2>&1 > /dev/null
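
A side note on the redirection order in this script. With `2>&1 > /dev/null`, stderr is pointed at wherever stdout goes at that moment, and only afterwards is stdout sent to /dev/null, so stderr is not silenced. As far as I know curl writes its progress meter to stderr once stdout is redirected, so every request probably still prints to the terminal. The sketch below shows the difference; `noisy` is a hypothetical stand-in for curl, not part of the original scripts.

```shell
#!/bin/sh
# noisy is a made-up stand-in for curl: it writes one line to each stream.
noisy() {
  echo "body"           # stdout, like the response body
  echo "progress" >&2   # stderr, like curl's progress meter
}

# Order used in testit.sh: stderr is duplicated onto the current stdout
# first, and only then is stdout sent to /dev/null. stderr gets through.
as_written=$(noisy 2>&1 > /dev/null)

# Reversed order: stdout goes to /dev/null first, then stderr follows it.
reversed=$(noisy > /dev/null 2>&1)

echo "as written: [$as_written]"   # prints: as written: [progress]
echo "reversed:   [$reversed]"     # prints: reversed:   []
```

If the goal is a quiet client, `curl -s -o /dev/null` should have the same effect without any shell redirection.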

The second script makes our life easy. It puts the first script in an endless loop and starts that loop 30 times in the background, like this:

#!/bin/sh
while [ "" = "" ] ; do ./testit.sh 2>&1 > /dev/null; done &
while [ "" = "" ] ; do ./testit.sh 2>&1 > /dev/null; done &
while [ "" = "" ] ; do ./testit.sh 2>&1 > /dev/null; done &
<snip>
while [ "" = "" ] ; do ./testit.sh 2>&1 > /dev/null; done &
while [ "" = "" ] ; do ./testit.sh 2>&1 > /dev/null; done &
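
The copy-pasted lines can also be generated by a loop. What follows is a minimal sketch of a bounded variant, not the script used for the measurements: `spawn_workers`, its arguments and the counts in the usage comment are hypothetical. Unlike the original, each worker stops after a fixed number of requests, so the total is known up front.

```shell
#!/bin/sh
# spawn_workers COUNT REQUESTS CMD...
# Starts COUNT background loops, each running CMD REQUESTS times,
# and returns once all of them have finished.
spawn_workers() {
  count=$1
  requests=$2
  shift 2
  w=0
  while [ "$w" -lt "$count" ]; do
    (
      r=0
      while [ "$r" -lt "$requests" ]; do
        "$@" > /dev/null 2>&1   # discard both stdout and stderr
        r=$((r + 1))
      done
    ) &
    w=$((w + 1))
  done
  wait   # block until every background worker has exited
}

# Hypothetical usage: 30 workers, 1000 requests each, 30000 in total.
# spawn_workers 30 1000 curl -s http://localhost:8080/dispatcher
```

Each iteration still starts a new curl process, so this only tidies up the script; it does not make the load generator any cheaper.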

The mysterious result

Note that I’m running Spray-can as well as my test scripts on the same machine (2 GHz Intel Core i7).
This is what I see: a repeating pattern that shows a CPU-bound process that frequently drops to almost no CPU usage at all. Green is CPU power used by user processes, red is CPU power used by the system.

Screen shot 2013-08-02 at 10.57.45 AM

But why this pattern? I hooked up JConsole to see if this was garbage collection firing. Nope! I could really use your help here. Are my Akka Actors collapsing and being put back in the air by supervising Actors? And it gets more mysterious in a bit when we run our second experiment. Hang in there.

Throughput

I let my experiment run for 5 minutes and then used a browser to see how many requests were handled. There were 161447 requests. So that is a throughput of about 538 req/sec.

Second experiment: two Macs

Today I used my wife’s Mac to run the clients on. It is a somewhat older machine and not as powerful (2.53 GHz Intel Core 2 Duo) as my own. As we now have two machines there is obviously a network in between, and I got myself some UTP cables to make the communication as fast as possible. That turned out to be unnecessary! Just using the wifi I saw this:

Mac running the clients:

Screen shot 2013-08-02 at 13.04.24

Mac running Spray-can:

Screen shot 2013-08-02 at 1.04.14 PM

The machine running the clients is clearly CPU-bound, like we saw before. And the machine running Spray-can behaves as expected. But where is my repeating pattern now? I haven’t the foggiest....

Throughput

I saw 44697 requests in 5 min which is about 150 req/sec.
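
Both throughput figures are plain division: the request count read from the counter over the 5 minute (300 second) run. A small sketch of that arithmetic follows; the `throughput` helper is a made-up name.

```shell
#!/bin/sh
# throughput REQUESTS SECONDS: print requests per second, rounded.
throughput() {
  awk -v n="$1" -v s="$2" 'BEGIN { printf "%.0f\n", n / s }'
}

throughput 161447 300   # first experiment, one Mac:   prints 538
throughput 44697 300    # second experiment, two Macs: prints 149
```

So the second figure is 149 req/sec when rounded to the nearest whole number, which the text gives as about 150.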

So now what?

I plan to organize a “Please Break My System” session with all my co-workers. Any technique is allowed for the clients, except that everyone has to use their own laptop, as using servers in the cloud is no fun. Watch this space.

Comments (10)

  1. Jeroen Leenarts

    August 2, 2013 at 6:47 pm

    If I were you I'd suspect you're I/O constrained. Especially with the way you are running the shell scripts, I think there's a ton of internal housekeeping going on. To really dig into this, I'd look into using DTrace and/or Instruments on a Mac. That will give you deeper insight.

    My guess with the bumpy CPU graph you're seeing is that the shell scripts are amassing a lot of requests, which are at some point handled by your server. Apparently Spray-can does this with way lower CPU overhead compared to curl.

    Remember that looping that curl call spawns a child process on every iteration. So the CPU is busier creating and cleaning up processes than anything else. Maybe some java.nio would do the trick.

    • Jeroen Leenarts

      August 2, 2013 at 6:54 pm

      Also look into Apache Benchmark, installed by default on a Mac.
      Command-line:
      ab -c 10 -n 1000 http://localhost:8080/dispatcher

      -c is the number of concurrent requests.
      -n The total number of requests to fire.

      Much lower overhead compared to that curl thing you did. ;)

  2. Age Mooij

    August 2, 2013 at 7:24 pm

    Jeroen, AB would normally be a very good suggestion but unfortunately AB is severely broken on OSX and this will lead to very strange behaviour. Even if you install a newer version via homebrew it will not work correctly if you don't use persistent connections (`-k` switch).

    Try one of the other well known load generators, like 'weighttp', 'httperf' or 'wrk' (which are all available via homebrew).

    • Jeroen Leenarts

      August 4, 2013 at 10:52 pm

      Works on my machine... (which is pre-release 10.9)

      Should I look for specific behavior to detect this broken-ness?

  3. Age Mooij

    August 2, 2013 at 7:26 pm

    Wilco, you are linking to an ancient and deprecated version of spray-can. The latest, much faster version is part of the main spray project at http://github.com/spray/spray

    Did you also run your tests against the old version of spray-can?

  4. Age Mooij

    August 2, 2013 at 7:45 pm

    Have a look at the standard server-benchmark example that comes with spray-can: https://github.com/spray/spray/tree/master/examples/spray-can/server-benchmark

    Try the following weighttp command:

    weighttp -n 100000 -c 100 -t 4 -k "http://localhost:8080/"

    When I run this against the server-benchmark project on my Retina MBP without any special configuration, I get around 45k requests/second. If I run it against "http://localhost:8080/json" I get around 71k requests/second.

  5. Tomasz N.

    August 5, 2013 at 5:08 pm

    Spawning curl/wget or any blocking tool like JMeter might consume most of the computer's resources, such as CPU, I/O and RAM. Consider http://gatling-tool.org/ for HTTP benchmarking. Interestingly, just like Spray, it's Akka-based and non-blocking.

  6. wilco koorn

    August 6, 2013 at 9:36 am

    @all: thanks for the hints to other benchmarking tools. I had a quick look at 'ab' yesterday and I can confirm Age's remarks. I get very spurious results and the thing seems buggy to me too. I'll certainly have a look at 'weighttp' and 'gatling'. Also 'The Grinder' is on my list (http://grinder.sourceforge.net/).
    @Age: I ran 'spray-can/1.1-M7' during my experiments.

    But for me the most important question still stands: what is going on in my first experiment that repeatedly shows poor performance?

    • Age Mooij

      August 7, 2013 at 11:09 am

      The M8 release of Spray is significantly faster since it is based on the new actor-based Akka IO core.

      That being said, M7 was already pretty damn fast and should also easily do many tens of thousands of requests per second so IMHO your extremely low results are still mostly caused by inefficient load generation.

      I think your focus on explaining a weird CPU usage pattern that you observed just once on a heavily overloaded system is... interesting :)

      • wilco koorn

        August 7, 2013 at 4:25 pm

        @Age: I promise to re-run the tests soonish. And yes, interesting, isn't it ;-)
        And I've seen a throughput of 12679 req/sec using JMeter (while JMeter was the bottleneck), so yes, I already knew it's fast ;-)
