So what was the solution?
Three threads, running concurrently:
Thread 1: Read data from file in chunks and uncompress it
Thread 2: Process chunk of data and prepare it for import
Thread 3: Core Data imports stuff, several hundred records at a time
So it looks something like this:
File IO -> XML Parsing -> Core Data Import
This requires some sort of thread-safe way to pass the data from one thread to another. And of course, if the file IO is way faster than the XML parsing, we don't want it to run out of control and slap 100 MB into RAM. Looks like your classic producer / consumer problem.
There's a lot of great documentation out there on multithreading in Cocoa. Apple's "Threading Programming Guide" is a good place to start. I'm just going to present one simple, yet elegant, solution here.
NSCondition is a really simple way to achieve what we want. The consumer pseudocode looks something like this:
[condition lock];
while([array count] == 0)
{
    // Nothing to consume.
    // Call wait, which will unlock our condition, and
    // block our thread until we're signaled by the
    // producer thread, at which point we'll
    // automatically regain the lock and continue.
    [condition wait];
}
// Remove data from array for processing
[condition unlock];
This concept can be rolled into a really simple class, which we called BlockingQueue. There are only a few simple methods. The producer calls "put", which blocks if the array is already at capacity (configurable on init). This prevents the producer from creating data faster than the consumer can handle. The consumer calls "get", which blocks while the array is empty, allowing the consumer to sleep while the producer catches up. A third method, "getUpTo:", works like "get" but removes up to the requested number of objects in one call, which suits a consumer that imports records in batches.
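#import <Foundation/Foundation.h>

@interface BlockingQueue : NSObject
{
    NSUInteger maxSize;      // capacity; put: blocks once the array reaches it
    NSCondition *condition;  // guards the array and lets threads wait for changes
    NSMutableArray *array;   // queued objects, oldest first
}
- (id)initWithMaxSize:(NSUInteger)max;
- (NSUInteger)maxSize;
- (void)put:(id)obj;                   // blocks while the queue is full
- (id)get;                             // blocks while the queue is empty
- (NSArray *)getUpTo:(NSUInteger)num;  // blocks while empty, then returns at most num objects
@end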
@implementation BlockingQueue

- (id)initWithMaxSize:(NSUInteger)max
{
    if((self = [super init]))
    {
        maxSize = MAX(max, 1);
        condition = [[NSCondition alloc] init];
        array = [[NSMutableArray alloc] initWithCapacity:maxSize];
    }
    return self;
}

- (void)dealloc
{
    [condition release];
    [array release];
    [super dealloc];
}

- (NSUInteger)maxSize
{
    return maxSize;
}
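
- (void)put:(id)obj
{
    if(obj == nil) return; // an NSMutableArray cannot hold nil

    [condition lock];
    while([array count] == maxSize)
    {
        // wait releases the lock while blocked and
        // reacquires it before returning.
        [condition wait];
    }
    [array addObject:obj];
    if([array count] == 1)
    {
        // The array was previously empty.
        // There may be get operations waiting on us.
        // broadcast wakes every waiter, so this also works with
        // more than one consumer. It does not release the lock;
        // the unlock below does.
        [condition broadcast];
    }
    [condition unlock];
}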
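
- (id)get
{
    [condition lock];
    while([array count] == 0)
    {
        // wait releases the lock while blocked and
        // reacquires it before returning.
        [condition wait];
    }
    // Keep the object alive after it leaves the array.
    id result = [[[array objectAtIndex:0] retain] autorelease];
    [array removeObjectAtIndex:0];
    if([array count] == (maxSize - 1))
    {
        // The array was previously full.
        // There may be put operations waiting on us.
        // broadcast wakes every waiter, so this also works with
        // more than one producer. It does not release the lock;
        // the unlock below does.
        [condition broadcast];
    }
    [condition unlock];
    return result;
}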

- (NSArray *)getUpTo:(NSUInteger)requestNum
{
    [condition lock];
    while([array count] == 0)
    {
        // wait releases the lock while blocked and
        // reacquires it before returning.
        [condition wait];
    }
    // Take as many objects as are available, up to requestNum.
    NSUInteger available = [array count];
    NSUInteger resultNum = MIN(available, requestNum);
    NSRange resultRange = NSMakeRange(0, resultNum);
    NSArray *result = [array subarrayWithRange:resultRange];
    [array removeObjectsInRange:resultRange];
    if(available == maxSize)
    {
        // The array was previously full.
        // There may be put operations waiting on us.
        // broadcast wakes every waiter, since several slots may
        // have been freed. It does not release the lock; the
        // unlock below does.
        [condition broadcast];
    }
    [condition unlock];
    return result;
}
@end
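To show how the pieces fit together, here is a rough sketch of the three-stage pipeline built on two queues. It is not our actual import code. PipelineDemo, its method names, the queue sizes, and the use of [NSNull null] as an end-of-stream marker are all made up for illustration, and the "work" in each stage is a stand-in. It assumes the class above lives in BlockingQueue.h / BlockingQueue.m (hypothetical file names) and is compiled with manual reference counting, like the class itself.

```objc
#import <Foundation/Foundation.h>
#import "BlockingQueue.h" // assumption: header holding the @interface above

// Hypothetical demo object; none of these names come from the real importer.
@interface PipelineDemo : NSObject
{
    BlockingQueue *chunks;   // file IO -> XML parsing
    BlockingQueue *records;  // XML parsing -> Core Data import
    NSUInteger chunkCount;   // number of fake chunks to produce
}
- (id)initWithChunkCount:(NSUInteger)count;
- (NSUInteger)run;
@end

@implementation PipelineDemo

- (id)initWithChunkCount:(NSUInteger)count
{
    if ((self = [super init]))
    {
        chunkCount = count;
        chunks  = [[BlockingQueue alloc] initWithMaxSize:4];   // arbitrary sizes
        records = [[BlockingQueue alloc] initWithMaxSize:500];
    }
    return self;
}

- (void)dealloc
{
    [chunks release];
    [records release];
    [super dealloc];
}

// Thread 1: stands in for reading and uncompressing the file.
- (void)readChunks:(id)unused
{
    @autoreleasepool {
        for (NSUInteger i = 0; i < chunkCount; i++)
        {
            // Blocks whenever the parser falls 4 chunks behind.
            [chunks put:[NSNumber numberWithUnsignedInteger:i]];
        }
        [chunks put:[NSNull null]]; // assumed end-of-stream marker
    }
}

// Thread 2: stands in for XML parsing (one record per chunk here).
- (void)parseChunks:(id)unused
{
    @autoreleasepool {
        id chunk;
        while ((chunk = [chunks get]) != [NSNull null])
        {
            [records put:[NSString stringWithFormat:@"record-%@", chunk]];
        }
        [records put:[NSNull null]]; // pass the marker downstream
    }
}

// Thread 3 (the caller): stands in for the Core Data import.
// Returns the number of records it consumed.
- (NSUInteger)run
{
    [NSThread detachNewThreadSelector:@selector(readChunks:)
                             toTarget:self withObject:nil];
    [NSThread detachNewThreadSelector:@selector(parseChunks:)
                             toTarget:self withObject:nil];

    NSUInteger imported = 0;
    BOOL done = NO;
    while (!done)
    {
        @autoreleasepool {
            // Pull a batch of up to 100 records at a time.
            for (id record in [records getUpTo:100])
            {
                if (record == [NSNull null]) { done = YES; break; }
                imported++; // a real importer would insert and save here
            }
        }
    }
    return imported;
}

@end
```

Calling [[[PipelineDemo alloc] initWithChunkCount:25] run] from a thread with an autorelease pool should hand all 25 records to the importer. The marker object is needed because put: silently ignores nil, so nil can't be used to say "no more data".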
Like I said, pretty simple, but useful when you need it.