Friday, June 12, 2009

Filling the pipelines

I was recently reviewing a friend's code after he asked for some help. He was importing data from an XML file into Core Data and wanted to know how he could speed it up. After careful review I told him that I didn't know of any way to significantly speed up the XML parsing, and I also didn't know of any way to make the Core Data part faster. But I did know of a way to speed up the process as a whole: fill the pipelines. Most desktops today have multicore CPUs, yada yada yada... I'm sure we've all heard this multithreading propaganda before.

So what was the solution?

3 threads, running concurrently:

Thread 1: Read data from file in chunks and uncompress it
Thread 2: Process chunk of data and prepare it for import
Thread 3: Core Data imports stuff, several hundred records at a time

So it looks something like this:

File IO -> XML Parsing -> Core Data Import

This requires some sort of thread-safe way to pass the data from one thread to another. And of course, if the file IO is way faster than the XML parsing, we don't want it to run out of control and slap 100 MB into RAM. Looks like your classic producer / consumer problem.

There's a lot of great documentation out there on multithreading in Cocoa. Apple's "Threading Programming Guide" is a good place to start. I'm just going to present one simple, yet elegant, solution here.

NSCondition is a really simple way to achieve what we want. The consumer pseudo code looks something like this:

[condition lock];
while([array count] == 0)
{
    // Nothing to consume.
    // Call wait, which will unlock our condition, and
    // block our thread until we're signaled by the
    // producer thread, at which point we'll
    // automatically regain the lock and continue.
    [condition wait];
}

// Remove data from array for processing

[condition unlock];


This concept can be rolled into a really simple class which we called BlockingQueue. There are only a few simple methods. A "put" method that the producer can call. If the array is already at capacity (configurable on init) the method will block. This prevents the producer from creating data faster than the consumer can handle. And a "get" method that the consumer can call. This method blocks if the consumer is going faster than the producer, allowing it to sleep while the producer catches up.


@interface BlockingQueue : NSObject
{
NSUInteger maxSize;
NSCondition *condition;
NSMutableArray *array;
}

- (id)initWithMaxSize:(NSUInteger)maxSize;

- (NSUInteger)maxSize;

- (void)put:(id)obj;

- (id)get;
- (NSArray *)getUpTo:(NSUInteger)num;

@end

@implementation BlockingQueue

- (id)initWithMaxSize:(NSUInteger)max
{
if((self = [super init]))
{
maxSize = MAX(max, 1);

condition = [[NSCondition alloc] init];
array = [[NSMutableArray alloc] initWithCapacity:maxSize];
}
return self;
}

- (void)dealloc
{
[condition release];
[array release];
[super dealloc];
}

- (NSUInteger)maxSize
{
return maxSize;
}

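- (void)put:(id)obj
{
    if(obj == nil) return;

    [condition lock];

    while([array count] == maxSize)
    {
        [condition wait]; // unlock + wait for signal
    }

    [array addObject:obj];

    if([array count] == 1)
    {
        // The array was previously empty.
        // There may be get operations waiting on us.
        // Broadcast rather than signal: producers and consumers share
        // this one condition, so a single wake-up could land on a thread
        // that just goes back to sleep. Waking threads does not release
        // the lock; the unlock below does.

        [condition broadcast];
    }

    [condition unlock];
}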

- (id)get
{
    [condition lock];

    while([array count] == 0)
    {
        [condition wait]; // unlock + wait for signal
    }

    id result = [[[array objectAtIndex:0] retain] autorelease];
    [array removeObjectAtIndex:0];

    if([array count] == (maxSize - 1))
    {
        // The array was previously full.
        // There may be put operations waiting on us.
        // Broadcast so every waiter rechecks its own condition.
        // Waking threads does not release the lock; the unlock below does.

        [condition broadcast];
    }

    [condition unlock];

    return result;
}

- (NSArray *)getUpTo:(NSUInteger)requestNum
{
    [condition lock];

    while([array count] == 0)
    {
        [condition wait]; // unlock + wait for signal
    }

    NSUInteger available = [array count];
    NSUInteger resultNum = MIN(available, requestNum);

    NSRange resultRange = NSMakeRange(0, resultNum);

    NSArray *result = [array subarrayWithRange:resultRange];
    [array removeObjectsInRange:resultRange];

    if(available == maxSize)
    {
        // The array was previously full.
        // There may be put operations waiting on us, and we may have
        // freed several slots at once, so wake all of them.
        // Waking threads does not release the lock; the unlock below does.

        [condition broadcast];
    }

    [condition unlock];

    return result;
}

@end


Like I said, pretty simple, but useful when you need it.
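
To show the whole pipeline wired together, here is a minimal sketch in Python rather than Objective-C. It leans on the standard library's queue.Queue, which blocks on put when full and on get when empty, much like BlockingQueue above. The names run_pipeline, parse, import_batch and the SENTINEL end-of-stream marker are made up for illustration; they are not from my friend's code.

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-stream marker


def run_pipeline(chunks, parse, import_batch, max_size=4):
    """Run reader -> parser -> importer on three threads.

    Each stage hands work to the next through a bounded queue, so a
    fast stage blocks instead of piling data up in memory.
    """
    raw = queue.Queue(maxsize=max_size)
    parsed = queue.Queue(maxsize=max_size)

    def reader():
        for chunk in chunks:
            raw.put(chunk)           # blocks while the queue is full
        raw.put(SENTINEL)

    def parser():
        while True:
            chunk = raw.get()        # blocks while the queue is empty
            if chunk is SENTINEL:
                parsed.put(SENTINEL)
                return
            parsed.put(parse(chunk))

    def importer():
        while True:
            records = parsed.get()
            if records is SENTINEL:
                return
            import_batch(records)

    threads = [threading.Thread(target=stage)
               for stage in (reader, parser, importer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With one producer and one consumer per queue, items come out in the order they went in, so the importer sees batches in file order.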

 

Tuesday, April 28, 2009

Cocoa HTTPServer Improvements

There are a lot of really cool things you can do with the open-source Cocoa HTTP Server. It's designed to be embedded in your Mac or iPhone application, and allows you to serve up static or dynamic content, as well as accept uploads. And there are a lot of built-in features that give you a lot of power without having to do a lot of work:

  • Built-in support for Bonjour broadcasting

  • IPv4 and IPv6 support

  • Asynchronous networking using standard Cocoa sockets/streams

  • Password protection with digest access or basic authentication

  • TLS encryption support

  • Extremely FAST and memory efficient

  • Heavily commented code

  • Very easily extensible


You can even configure the server to be multi-threaded. Which leads me to the latest improvement: support for asynchronous responses.

What exactly does this mean? Imagine that you want to dynamically generate the content of a page. Since I prefer concrete examples, let's imagine that this page is going to give all kinds of status information about the machine such as available hard drive space, RAM usage, network usage, etc. But it's going to take a little while to generate all this information, and you don't want to block the connection thread while you're doing it. No problem. Just create an asynchronous HTTPResponse, and generate your data in a background thread! The HTTP server takes care of all the details for you.

We included a sample asynchronous response class in the code: HTTPAsyncFileResponse. This is extra useful for serving up files over a potentially slow storage device, such as network-attached storage. (See this related post: What network-attached storage means for developers)

In addition to this, the server also supports sending dynamic content where you don't exactly know what the content-length will be. The server automatically takes care of the details by using chunked-transfer encoding.
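
For the curious, the framing itself is simple: each chunk goes out as its length in hexadecimal, a CRLF, the bytes, and another CRLF, and a zero-length chunk ends the body. A rough Python sketch of just that framing follows. It is not the server's code, and encode_chunked is a hypothetical name.

```python
def encode_chunked(pieces):
    """Yield HTTP/1.1 chunked-transfer frames for an iterable of bytes.

    The total length never has to be known up front, which is why the
    encoding suits dynamically generated content.
    """
    for piece in pieces:
        if piece:  # skip empty pieces: a zero-length chunk means "end of body"
            yield b"%x\r\n" % len(piece) + piece + b"\r\n"
    yield b"0\r\n\r\n"  # terminating chunk, with no trailers
```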

 

What network-attached storage means for developers

Network-attached storage (NAS) is getting more and more popular these days. It's been around for quite some time, but has never been exactly common among your average non-techie computer user. That's changing quickly. How easy is it for someone to plug an external hard drive into an AirPort Extreme or any other similar router? Not only that, but people can mount FTP, SMB or Amazon S3 drives quite easily. There are cool programs out there that make it a snap to do so.

So what does this mean for developers? Gone are the days where you can assume that reading a small file won't take long. You can no longer assume that accessing the disk from the main thread won't cause the user to see a spinning beachball. If you haven't already, it's time to start thinking about asynchronous file IO.

So how can we go about this in Cocoa?

One option is to use an NSInputStream. The "inputStreamWithFileAtPath:" method can get you set up quickly. And since everything is handled in the runloop, you don't have to mess around with threading.

One disadvantage of using a stream approach is that you can't change the file offset. (Someone please correct me if I'm wrong.) What this means is that you can't simply read, for example, the last couple kilobytes of a file. You'd have to read from the beginning until you got to the point you were interested in. However, you can do this if you use an NSFileHandle. In fact, NSFileHandle even comes with several asynchronous read operations.

There is one caveat however: NSFileHandle offers no way to specify how much data it should read in the background. For example, there is a "readInBackgroundAndNotify" method. How much data does it read? The documentation states "the length of the data is limited to the buffer size of the underlying operating system". I was curious to know how much this was, so I tried reading in a couple files and I was surprised by the result. On my system, it was only 510 bytes! Probably fine for reading in a text file, but that's pretty small if you're trying to read something bigger, like say a movie.

I wondered how this might affect performance, so I decided to do some casual benchmarking. I have an external hard-drive (connected via USB) with a 1.38 GB file. First I used NSFileHandle to read the entire file asynchronously using readInBackgroundAndNotify. I performed 5 runs, and got an average of 189.412 seconds.

Next I used NSFileHandle to read the entire file synchronously using readDataOfLength. I had to choose a chunk size (since I had no intention of reading 1.38 GB into memory) so I chose 1 MB. This is big enough for a sizeable chunk of data, but not so big as to dramatically affect the memory footprint of a typical simple application. (Note: I also tested using even bigger chunk sizes up to 10 MB, and the difference was negligible.) Again I performed 5 runs, and this time got an average of 39.868 seconds. That's about 4.75 times as fast...

Now this was reading from a fast USB connected storage device. Perhaps if we were reading from a slow NAS device it wouldn't matter. But ideally we'd like a solution that gives us good performance on a fast storage device, and doesn't block on a slower storage device. And NSFileHandle's readInBackgroundAndNotify method doesn't exactly fit this bill.

One possible solution would be to stick with NSFileHandle, but read the data in a background thread. We could read a chunk, and then notify our primary thread. When the primary thread is ready for more data, it could start another background operation. But constantly creating and destroying threads could be a bit expensive. Sounds like the perfect situation to use NSOperationQueue.

So I tried one last benchmark. This time I had a method that called readDataOfLength, and I called this method over and over until the file had been read to completion. But I called the method in a background thread with the help of NSOperationQueue and NSInvocationOperation. It was very easy to do, and only took a few extra lines of code. Again I performed 5 runs, and this time got an average of 39.883 seconds. Only 15 milliseconds slower than a synchronous read!
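
The shape of that last approach can be sketched in Python, with concurrent.futures.ThreadPoolExecutor standing in for NSOperationQueue. This is only a loose analogue under my own assumptions: the whole read loop runs as a single background task rather than one operation per read, and read_in_background and on_chunk are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1024 * 1024  # 1 MB, the chunk size used in the benchmark


def read_in_background(path, on_chunk, chunk_size=CHUNK_SIZE):
    """Read `path` chunk by chunk on a worker thread.

    `on_chunk` is called on the worker thread for every chunk; a GUI
    app would forward the data to its main thread instead. Returns a
    Future whose result is the total number of bytes read.
    """
    def read_all():
        total = 0
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:  # an empty read means end of file
                    return total
                on_chunk(chunk)
                total += len(chunk)

    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(read_all)
    pool.shutdown(wait=False)  # the submitted task still runs to completion
    return future
```

The caller gets the Future back immediately and stays responsive, while the large chunk size keeps throughput close to a plain synchronous read.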

 

Thursday, April 23, 2009

Bug in Apple's NSXML

There is a really annoying bug in Apple's NSXML implementation. I wanted to share this bug because it has recently affected me, as well as a few independent developers who are working with the XMPP Framework. I imagine it would also be useful knowledge to anyone working extensively with XML in Mac OS X.

The bug is in NSXMLElement, in the elementsForName: method.

Consider the following XML fragment:

<a xmlns="ns1">
<b xmlns="ns2"/>
</a>

Calling [a elementsForName:@"b"] results in an empty array!

In order to get the correct result, you'd have to call:
[a elementsForLocalName:@"b" URI:@"ns2"]

However, it will work properly if:
- "a" does not contain an xmlns.
- "b" does not contain an xmlns.
- "a" and "b" contain the same xmlns.

In short, if you know the xmlns of the element that you're trying to get, use the elementsForLocalName:URI: method. (If you're working with the XMPP Framework, there is also the elementForName:xmlns: method.)

Here is the radar.

Note: If you're working with XML on the iPhone, you don't have to worry because KissXML doesn't have this problem.

 

Wednesday, April 22, 2009

Decrypting OpenSSL AES files in C#

Many operating systems are equipped with OpenSSL, making it fairly easy to deal with the otherwise complicated issue of encryption. Windows doesn't come with OpenSSL, but it does come with good encryption libraries. The trick is getting the two to play nicely.

In this post I'll demonstrate the following:

  • How to encrypt/decrypt a file via AES using OpenSSL on command line

  • How to decrypt in Cocoa

  • How OpenSSL generates the Key and IV from a password

  • How to parse the salt from an OpenSSL encrypted AES file

  • How to use C# to decrypt an OpenSSL encrypted AES file


Let's start by encrypting a file. From the command line OpenSSL is fairly easy to use. (Although it can be daunting sometimes because it can do so many things, and there are so many options.) Here's how we can encrypt the file using AES, and a password of "secret":

$ openssl enc -aes-128-cbc -pass pass:secret -in file.txt -out file.txt.aes

And if we want to decrypt the file, it's almost the same command plus a "-d" (for decrypt) option:

$ openssl enc -aes-128-cbc -d -pass pass:secret -in file.txt.aes

Note: You can install OpenSSL on Windows by downloading the binary from here. This will allow you to use OpenSSL from the command line just like in Linux/Mac/etc.

We can decrypt the file programmatically in Mac OS X using SSCrypto. It's child's play:
NSData *fileData = [NSData dataWithContentsOfFile:filePath];

NSString *passwd = @"secret";
NSData *passwdData = [passwd dataUsingEncoding:NSUTF8StringEncoding];

SSCrypto *sscrypto = [[SSCrypto alloc] initWithSymmetricKey:passwdData];
[sscrypto setCipherText:fileData];

NSData *clearData = [sscrypto decrypt:@"aes-128-cbc"];
NSLog(@"clearData: %@", clearData);
NSLog(@"clearText: %@", [sscrypto clearTextAsString]);

[sscrypto release];

Moving on to C#, we don't see an AES class, but there is a Rijndael class we can use, since AES is simply Rijndael restricted to a 128-bit block size. But looking at the documentation we find that there's no way to set the password. The RijndaelManaged class wants something called a Key and IV...

So is this an incompatibility? No, not at all. Let's go back to those OpenSSL commands really quick and add a "-p" option:

$ openssl enc -aes-128-cbc -d -p -pass pass:secret -in file.txt.aes
salt=A5307A8D9856664F
key=F7FAA9274A2BAD72554DE543BC2731FD
iv =5D82ECF3D3435CB30106EF21640B19F0


You can also add the "-p" option when encrypting to get the same output.

So it looks like OpenSSL generates a Key and IV from our password. But how? Internally it uses the EVP_BytesToKey function. In a nutshell, for a 128-bit key with the MD5 digest, this means OpenSSL does the following:

Key = MD5(password + salt)
IV = MD5(Key + password + salt)


That's it! That's all there is to it!
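
As a quick cross-check outside of C#, the two formulas translate almost literally into Python's hashlib. This is a sketch that assumes the MD5 digest and 128-bit key described above; derive_key_and_iv is a hypothetical helper name, not an OpenSSL function.

```python
import hashlib


def derive_key_and_iv(password: bytes, salt: bytes):
    """Derive a 16-byte AES key and 16-byte IV from a password and salt.

    Mirrors the two formulas above: the key is one MD5 digest, and the
    IV is a second digest chained on the first.
    """
    key = hashlib.md5(password + salt).digest()
    iv = hashlib.md5(key + password + salt).digest()
    return key, iv


if __name__ == "__main__":
    # Password and salt taken from the "-p" example above.
    key, iv = derive_key_and_iv(b"secret", bytes.fromhex("A5307A8D9856664F"))
    print("key=" + key.hex().upper())
    print("iv =" + iv.hex().upper())
```

Running it prints a key and IV in the same format as the "-p" output, so the two can be compared side by side.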

But wait... How do we get the salt?

You can tell OpenSSL whether or not to use a salt by passing the "-salt" or "-nosalt" options explicitly. By default it will use a salt for better security. If a salt is used, the resulting AES file will begin with the 8-byte marker "Salted__" followed by an 8-byte salt. If you hexdump the AES file you'll notice the salt at the end of the first line:

$ hexdump file.txt.aes
0000000 53 61 6c 74 65 64 5f 5f a5 30 7a 8d 98 56 66 4f
0000010 17 3f 0e 03 5d 40 f2 ee e4 2a 60 43 6e 86 90 92
0000020
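
Splitting that header off takes only a few lines. Here is a small Python sketch of the idea, fed with the exact bytes from the hexdump above. The name split_salted is made up for illustration, and the length check mirrors the one in the C# code that follows.

```python
MAGIC = b"Salted__"  # 8-byte marker OpenSSL writes before the salt


def split_salted(data: bytes):
    """Return (salt, ciphertext) for an OpenSSL "enc" output file.

    If the file does not start with the marker, it is treated as
    unsalted and returned whole with an empty salt.
    """
    if len(data) > 16 and data[:8] == MAGIC:
        return data[8:16], data[16:]
    return b"", data


if __name__ == "__main__":
    # The 32 bytes shown in the hexdump above.
    blob = bytes.fromhex(
        "53616c7465645f5fa5307a8d9856664f"
        "173f0e035d40f2eee42a60436e869092"
    )
    salt, ciphertext = split_salted(blob)
    print("salt:", salt.hex().upper())
    print("ciphertext bytes:", len(ciphertext))
```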


We're now ready to write some C# code.
// Read in the file and check to see if it's salted
byte[] fileData = ReadFile("file.txt.aes");

bool isSalted = false;
byte[] salt = null;

if (fileData.Length > 16)
{
byte[] salted = Encoding.UTF8.GetBytes("Salted__");

if (IsDataEqual(fileData, 0, salted, 0, 8))
{
isSalted = true;

salt = new byte[8];
Buffer.BlockCopy(fileData, 8, salt, 0, 8);
}
}

// Remove salt from file data if necessary
byte[] aesData;

if (isSalted)
{
Console.WriteLine("Salt: {0}", ToHexString(salt));

int aesDataLength = fileData.Length - 16;

aesData = new byte[aesDataLength];
Buffer.BlockCopy(fileData, 16, aesData, 0, aesDataLength);
}
else
{
salt = new byte[0];
aesData = fileData;
}

// Create Key and IV from password
byte[] password = Encoding.UTF8.GetBytes("secret");
Console.WriteLine("password: {0}", ToHexString(password));

MD5 md5 = MD5.Create();

int preKeyLength = password.Length + salt.Length;
byte[] preKey = new byte[preKeyLength];

Buffer.BlockCopy(password, 0, preKey, 0, password.Length);
Buffer.BlockCopy(salt, 0, preKey, password.Length, salt.Length);

byte[] key = md5.ComputeHash(preKey);
Console.WriteLine("key: {0}", ToHexString(key));

int preIVLength = key.Length + preKeyLength;
byte[] preIV = new byte[preIVLength];

Buffer.BlockCopy(key, 0, preIV, 0, key.Length);
Buffer.BlockCopy(preKey, 0, preIV, key.Length, preKey.Length);

byte[] iv = md5.ComputeHash(preIV);
Console.WriteLine("iv: {0}", ToHexString(iv));

md5.Clear();
md5 = null;

// Decrypt using AES
RijndaelManaged rijndael = new RijndaelManaged();
rijndael.Mode = CipherMode.CBC;
rijndael.Padding = PaddingMode.PKCS7;
rijndael.KeySize = 128;
rijndael.BlockSize = 128;
rijndael.Key = key;
rijndael.IV = iv;

ICryptoTransform rijndaelDecryptor = rijndael.CreateDecryptor();

byte[] clearData = rijndaelDecryptor.TransformFinalBlock(aesData, 0, aesData.Length);
Console.WriteLine("clearData: {0}", ToHexString(clearData));

String clearText = Encoding.UTF8.GetString(clearData);
Console.WriteLine("clearText: {0}", clearText);


Download Decrypt AES project

 

Thursday, January 29, 2009

X509 Certificate to NSDictionary

Extracting certificate information from a secure (SSL/TLS) connection is one of those things that should be easy. At least it is in other languages like C#. Unfortunately this is not the case in Cocoa.

Apple provides a rather confusing API. For example, say you wanted to get the subject information from the peer's certificate. (This is the information that says the certificate is for "www.paypal.com", and that the company resides in San Jose, California.) So there's this method:

OSStatus SecCertificateGetSubject (SecCertificateRef certificate, CSSM_X509_NAME *subject);

Hmm.. So you'd need a SecCertificateRef somehow. And it's going to give you a CSSM_X509_NAME. WTF is that? And, of course, it's going to return some goofy status code. Which will invariably take you 30 minutes to track down its meaning when you get an error of -24312 the first time you call the method.

Where's the easy Cocoa method?

In the AsyncSocket project you can find a file called X509Certificate. (It's in the CertTest subfolder.) It's got a few methods that can give you a simple NSDictionary dump of the X509 certificate. Very easy, very Cocoa-ish. You can pass it an AsyncSocket, CFReadStream, SecCertificateRef, or SecIdentityRef.

AsyncSocket and SSL/TLS

In the past I've talked about a method of securing communication over your socket using SSL/TLS. We've recently made some improvements that make the process even easier.

(TLS is the successor to SSL, and I'll be using the two terms interchangeably.)

What was wrong with the old method?

Problem #1: Prior to these improvements, one would manually call CFReadStreamSetProperty and CFWriteStreamSetProperty to start TLS. If you're dealing with an easy-to-use Cocoa object such as AsyncSocket, it seems a bit odd to have to drop down to Core Foundation like this. It's like when you need to trim leading and trailing whitespace from an NSString object, and you have to call CFStringTrimWhitespace. It just looks out of place, and one wonders why there's not an equivalent method in NSString. So perhaps you add a category method, and move on. But in this particular case with AsyncSocket, adding a category just shields you from the bigger problem...

Problem #2: One of the great things about AsyncSocket is queued reads and writes. You don't have to wait for a read to finish before starting another read! All read operations get queued, in the read queue, and are processed in order. Same thing for writes. And reads and writes operate simultaneously. This allows you to focus on your protocol, and frees you to handle communication however you want to, as opposed to waiting around on the socket. So you can write code like this:

[socket writeData:data];
[socket readDataToLength:2 tag:0];
[socket readDataToData:CRLFData tag:1];

The problem with the old method is that calling CFReadStreamSetProperty or CFWriteStreamSetProperty takes immediate effect. The call doesn't get queued to occur after previously scheduled operations. In short, it doesn't follow the same paradigm as the rest of AsyncSocket. This is a bigger problem with modern protocols. Don't think HTTPS, where TLS must complete before any communication starts. Think of protocols like XMPP and others, where the upgrade to TLS is negotiated later, after communication has already started.

So a new method has been added to AsyncSocket:
- (void)startTLS:(NSDictionary *)settings

And a corresponding delegate method was added:
- (void)onSocket:(AsyncSocket *)sock didSecure:(BOOL)flag

This allows you to easily handle upgrades to TLS like this:

- (void)handleClientStartTLSRequest
{
    [sock writeData:serverStartTLSResponse];

    // Automatically start TLS after all previously queued
    // reads and writes are finished
    [sock startTLS:serverTLSSettings];
}

- (void)onSocket:(AsyncSocket *)sock didSecure:(BOOL)flag
{
    // Continue after TLS has finished...
}


AsyncSocket works on both Mac and the iPhone.

AsyncSocket Google Code Page