GCD stands for Grand Central Dispatch and is Apple's revolutionary approach to multicore computing.
While writing GCDAsyncSocket I was amazed at the performance benefits I was seeing. But I was anxious to see how it would perform in a real-world setting. (You know, something that does more than just pure socket IO, like perhaps some data processing and file IO.) And I couldn't think of a better way to demonstrate the technology than by updating the existing CocoaHTTPServer project.
I'm happy to announce the release of CocoaHTTPServer version 2, which is entirely GCD based. (And yes, it runs on both iOS and Mac OS X.)
For those of you who may not be familiar with GCD, I'm happy to say that it only took me a few days to complete the upgrade. GCD has been a pleasure to work with, and it's really quite easy once you get the hang of it.
But more importantly, I wanted to share the before-and-after benchmarks!
Benchmark Setup:
I used the Apache benchmark tool (ab) to perform the tests. This was easy to do since it's installed by default on the Mac. I then followed some basic guidelines on HTTP server benchmarking.
The ab tool will spit out a bunch of information when the test is complete. Here's an example (truncated for simplicity):
$ ab -n 1000 -c 1 http://localhost:12345/index.html
Concurrency Level: 1
Time taken for tests: 1.556 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 2742000 bytes
HTML transferred: 2642000 bytes
Requests per second: 642.48 [#/sec] (mean)
Time per request: 1.556 [ms] (mean)
Time per request: 1.556 [ms] (mean, across all concurrent requests)
Transfer rate: 1720.40 [Kbytes/sec] received
This test requested the "/index.html" file 1,000 times. The concurrency level of 1 means it only did one request at a time. So each request is back-to-back, and the server only has to handle one request at a time.
What's really interesting for an HTTP server is how it performs under a load. In other words, if a bunch of users start making requests at the same time, can your server handle it? Or will it get unbearably slow? We can test this by increasing the concurrency level of the ab tool:
$ ab -n 1000 -c 100 http://localhost:12345/index.html
Concurrency Level: 100
Time taken for tests: 1.196 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 2742000 bytes
HTML transferred: 2642000 bytes
Requests per second: 836.29 [#/sec] (mean)
Time per request: 119.575 [ms] (mean)
Time per request: 1.196 [ms] (mean, across all concurrent requests)
Transfer rate: 2239.37 [Kbytes/sec] received
This time we told the ab tool to make up to 100 requests at a time. So now the HTTP server has to handle 100 requests simultaneously.
Now you might notice that the second test actually took less time overall than the first test (1.196 seconds vs. 1.556 seconds). This makes sense since the ab tool was able to make multiple requests at the same time. But what we're most interested in is the impact it will have on our users. If you request a page from this HTTP server, how long will you have to wait before the server sends back its response? How long will you be sitting there staring at the progress indicator on your browser? This is the "Time per request (mean)" field, and this is what I'll be reporting.
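The two "Time per request" lines in the ab output are related by simple arithmetic, and it can help to see that spelled out. Here is a minimal sketch in Python (used purely for illustration; the function name `ab_metrics` is made up for this example) that recomputes the summary figures from the totals ab prints. Since ab displays the total time rounded to the millisecond, the recomputed values only approximately match the report.

```python
def ab_metrics(requests, concurrency, seconds):
    """Recompute ab's summary figures from the run totals."""
    requests_per_second = requests / seconds
    # Mean time per request across all concurrent requests (ms):
    # total wall-clock time divided by the number of requests.
    ms_across_all = seconds * 1000 / requests
    # Mean time per request as seen by a single client (ms):
    # the figure above scaled by the concurrency level.
    ms_per_request = concurrency * ms_across_all
    return requests_per_second, ms_per_request, ms_across_all


# Totals from the second run above: -n 1000 -c 100, 1.196 seconds.
rps, per_request, across_all = ab_metrics(1000, 100, 1.196)
print(f"{rps:.1f} req/s, {per_request:.1f} ms, {across_all:.3f} ms")
```

With a concurrency level of 1 the two "Time per request" values are identical, which is what the first run shows. At a concurrency level of 100 the per-client figure is 100 times larger, and that is the number a user actually feels.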
Benchmark Results:

[Image: benchmark results chart]
I performed tests against 5 different configurations:
- The new GCD version
- The old version - using a single thread
- The old version - using a thread pool of size 1
- The old version - using a thread pool of size 2
- The old version - using a thread pool of size 3
The thread pool versions come from the MultiThreadedHTTPServer sample code (available in the repository). These versions would accept connections on a server thread, and then move the client connections to the thread pool. So almost all the work is done within the thread pool.
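For readers who haven't seen this pattern, here is a rough sketch of the "accept on one thread, hand off to a pool" design. It is written in Python for brevity and is not the MultiThreadedHTTPServer code; every name in it (`handle_client`, `accept_loop`, `start_server`, `RESPONSE`) is invented for the illustration, and the "request handling" is just a canned reply.

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor

RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok"


def handle_client(conn):
    """Runs on a pool thread: read the request, then send a fixed reply."""
    with conn:
        conn.settimeout(5)
        conn.recv(4096)  # read (and ignore) the request bytes
        conn.sendall(RESPONSE)


def accept_loop(listener, pool, stop):
    """Runs on the server thread: accept, hand off, repeat."""
    while not stop.is_set():
        try:
            conn, _addr = listener.accept()
        except socket.timeout:
            continue  # wake up periodically to check the stop flag
        except OSError:
            break  # the listener was closed
        pool.submit(handle_client, conn)


def start_server(pool_size=2):
    """Start the accept thread plus a pool of `pool_size` worker threads."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: any free port
    listener.settimeout(0.2)
    pool = ThreadPoolExecutor(max_workers=pool_size)
    stop = threading.Event()
    thread = threading.Thread(
        target=accept_loop, args=(listener, pool, stop), daemon=True
    )
    thread.start()
    return listener, pool, stop, thread
```

Calling `start_server(pool_size=1)`, `start_server(pool_size=2)` or `start_server(pool_size=3)` corresponds loosely to the three thread pool configurations above. The point to notice is that the pool size is fixed up front by the programmer, regardless of what else the machine is doing.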
These tests were performed on a MacBook Air with a Core 2 Duo processor. In other words, a dual-core processor.
The benchmarks of the old version came out exactly as one would expect.
The single threaded server sets the baseline. The thread pool size of 1 performs almost exactly the same. The small benefit comes from the fact that it uses a separate thread to accept incoming client connections, but overall they perform very closely.
Since we're doing the test on a machine with 2 processor cores, you might imagine that you'd get better performance by splitting the client load across 2 threads. And this is exactly what we see with the thread pool size of 2.
When we increase the thread pool size to 3, our performance degrades significantly. This is because we only have 2 processor cores, but we're trying to force the connection load across 3 threads. So now the application spends a significant amount of time switching between threads, and the cost of these thread context switches starts to chip away at our performance.
And the GCD version? It's around twice as fast across the board!
In addition, these tests were performed without any other applications running on the machine. But now imagine if there had been. What if we were listening to iTunes or watching a compressed movie file? There is no guarantee that the operating system will give us both processor cores. You saw the performance degradation that occurred when we tried to use 3 threads on a 2 core system. So what would happen if we were using 2 threads, and the OS decided to only give us a single CPU core because it was using the other core to decode a video file? GCD solves this problem for us by automatically scaling thread usage based on system load!
Critics and Naysayers:
When I mentioned to a friend that I was updating the CocoaHTTPServer to use GCD, his response was:
Aren't you fixing something that's not broken? This isn't Apache. It's an HTTP server meant to be embedded in Mac and iOS applications. How often does one of these applications have to worry about 400 concurrent users?
Now his response is largely based on the fact that he's never used GCD before, and he's allergic to new technologies. But I think it's worth addressing this concern.
First, let me zoom way in on that benchmark from before:

[Image: the same benchmark chart, zoomed in]
When I said the GCD version was around twice as fast across the board, I meant it. Even when there is only a single request, it is still twice as fast as the previous version. This comes largely from the performance benefits of GCDAsyncSocket (kqueues, etc.). (See my previous post for more information.)
Secondly, although GCD is often touted for its ability to scale, it still performs excellently when the scaling factor is 1.0. Apple is already using blocks in several Foundation classes, and it wouldn't surprise me if we started to see some of the standard delegate/runloop paradigms switch to a delegate/dispatch_queue or block/dispatch_queue paradigm. Furthermore, it's possible that future iOS devices will have multiple cores. Will Apple be pushing GCD even more when this happens? Consider Apple's own wording used to describe the benefits of GCD (slightly adapted from the Mac OS X page to fit what it might say on an iOS page):
GCD-enabled programs can automatically distribute their work across all available cores, resulting in the best possible performance whether they’re running on a dual-core iPhone 5, or a single-core iPod Touch. Once developers start using GCD for their applications, you’ll start noticing significant improvements in performance.
It makes applications more efficient by using only the number of threads required for the work being done. For example, without GCD, if an application needs 20 threads when at maximum capacity, it might set up 20 threads and consume the associated resources even when it has nothing to do. GCD, by contrast, frees resources when it’s not using them, helping to keep the whole system more responsive. Imagine the efficiency and performance gains if every application on your iPhone were using GCD.
Grand Central Dispatch is deeply integrated into iOS, making it easier for all kinds of applications to take better advantage of multicore processors. In addition, your iPhone as a whole becomes more efficient at handling numerous tasks at the same time, resulting in performance gains across the board.
The thing is, Grand Central Dispatch is already available in iOS! Want to be ready for the next hardware upgrade before it's released?
Lastly, the purpose of an embedded HTTP server is to allow developers to do something cool with it inside their application. Perhaps they only want to serve up static files. But more likely they want to generate some dynamic content for their users. If the process of generating that dynamic content takes a while, then one runs the risk of blocking other requests. Or even the main thread! CocoaHTTPServer solves this problem by including support for asynchronous responses. And I'm happy to report that implementing custom asynchronous responses is now even easier thanks to GCD. Plus, since the entire server is GCD based, there is no longer any possibility of blocking the main thread.
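To make the blocking problem concrete, here is a small sketch of the general idea behind an asynchronous response: slow content generation is pushed onto a worker thread so that quick requests don't have to wait behind it. This is a Python/asyncio analogy, not CocoaHTTPServer's API, and the names `render_report`, `handle` and `main` are hypothetical.

```python
import asyncio
import time


def render_report():
    """Stand-in for slow, blocking generation of dynamic content."""
    time.sleep(0.2)
    return "report"


async def handle(path, finished):
    """Answer one request; record the path once its response is ready."""
    if path == "/report":
        loop = asyncio.get_running_loop()
        # Run the blocking work on a worker thread so the event loop
        # stays free to serve other requests in the meantime.
        body = await loop.run_in_executor(None, render_report)
    else:
        body = "static"
    finished.append(path)
    return body


async def main():
    finished = []
    # The slow request is issued first, the quick one second.
    bodies = await asyncio.gather(
        handle("/report", finished),
        handle("/index.html", finished),
    )
    return bodies, finished


if __name__ == "__main__":
    bodies, finished = asyncio.run(main())
    print(bodies, finished)
```

Even though the slow request arrives first, the quick request completes first, because the slow work never occupies the thread that dispatches requests.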
The code can be checked out from the CocoaHTTPServer Google Code Homepage.
12 comments:
thanks for the update - this looks really interesting.
What are the implications in terms of os support though? Would this limit apps to 10.6+ and iOS4, or is there a fallback mechanism?
thanks again for a really useful tool.
Yes, the requirements for v2 are 10.6+ or iOS 4.0+.
The previous version is still available in branches/v1, and it will continue to be maintained/updated for some time.
Thank you for this. Sometimes it is a good thing to fix things that are not broken because it gives you a more powerful framework. Or if you are like me (a somewhat OCD coder), it helps me sleep at night knowing my threads are behaving optimally. ;-) I wish I saw this a couple days ago. Already halfway through my own implementation. I do require an embedded solution that can handle 400+ connections so this is great.
Hi, I have trouble with GCDAsyncSocket. You can see my problem here: http://stackoverflow.com/questions/8525218/ipad-gcdasyncsocket-doesnt-read . I write to my server using your socket, but can't receive answers from it. Pleaaase help!
I would like to use AsyncSockets . I have a client that sends 4 byte length information and then the bytes. How can I implement this on the server side using asyncsockets?
I am very impressed with the server, though i have less than an hour invested so I have a lot to learn
Dear Robbie Hanson:
I am writing to enquire about "CocoaAsyncSocket" you released on "https://github.com/robbiehanson/CocoaAsyncSocket/wiki".
Since "public domain" varies with the jurisdictions, and in our case, the moral right cannot be put in the public domain, could you confirm you will not exercise the copyright along with the moral right when we use the software?
Wouldn't an embedded HTTP server violate the App Store terms of service, considering how nitpicky Apple is with their iOS store approval process? BTW, this looks great!
Looks like the link to your project is pulled from Google code. Can you point us to any current link for it?
Never mind. I found it on GitHub.
Great docs BTW.
Daamn. This is insanely awesome. In Xcode 4.2 on iOS 5.0.1, the only errors are fixed by adding a , retain to the window and viewController @synthesize in the appDelegate.m.
Your code, docs and whole project is an inspiration to me. Rocking work.
Hi Robbie,
I saw your contributions on GitHub. You seem to really know your network layer. I'm starting to PoC the next phase of my app, and was looking for advice on syncing data from phone to phone. Core Data at the core, and I can manage intermediate levels of NSDictionary, JSON files on local, NSData archives, etc... Matt Gallagher suggested a while back to go for an NSURLConnection.
Anyway, looking to find my way here and not go down a dark tunnel :)...
Thanks Much!
Adam