Concurrency: Introducing Inter-Thread Communication

Picture the scenario: you’re on the phone with your friend when suddenly he says, ‘Hey! Hold on. I think I’m getting another call’, and before you can say anything, he puts you on hold. Now you have no idea whether to hang up or wait; your friend could be back in 3 seconds or 3 hours. Wouldn’t it be great if your phone service provided a mechanism where, whenever you’re put on hold, you could freely disconnect, and whenever your friend ends the other call, you’d immediately get a ring? Let’s talk about managing synchronization between threads.

On a broader level, this is the classic producer-consumer problem. The consumer and producer are synchronized by a piece of shared data. What we actually need is a way to communicate between the two: the producer needs to signal the consumer that the data is ready for consumption.

So far in all our concurrency experiments, whenever we have some shared data we explicitly block one or more threads while a single thread gets to play with the resource. The shared data here seems to be a boolean flag of some sort. But we can do better with std::condition_variable.

    std::mutex mutex;
    std::condition_variable condition;

    void StartConsumer()
    {
        std::cout << std::this_thread::get_id() << " Consumer sleeping" << std::endl;
        std::unique_lock<std::mutex> consumerLock(mutex);
        condition.wait(consumerLock); /* releases the lock and blocks until notified */
        consumerLock.unlock();
        
        std::cout << std::this_thread::get_id() << " Consumer working" << std::endl;
    }
    
    void StartProducer()
    {
        std::cout << std::this_thread::get_id() << " Producer working" << std::endl;
        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::lock_guard<std::mutex> producerLock(mutex);
        condition.notify_one();
        std::cout << std::this_thread::get_id() << " Producer sleeping" << std::endl;
    }
    
    void StartProcess()
    {
        std::thread consumerThread(StartConsumer);
        std::thread producerThread(StartProducer);
        
        consumerThread.join();
        producerThread.join();
    }

Output

0x100ced000 Consumer sleeping
0x100d73000 Producer working
0x100d73000 Producer sleeping
0x100ced000 Consumer working

In the above code the consumer and producer execute simultaneously on concurrent threads. The consumer waits for the producer to finish whatever it was working on. But instead of continuously polling for whether the producer is done, as in

while (true) {
	std::unique_lock<std::mutex> consumerLock(mutex);
	if (done) { /* check the shared flag only while holding the lock */
		break;
	}
	consumerLock.unlock();
	std::this_thread::sleep_for(std::chrono::microseconds(1));
}

We make use of the condition variable to have the producer inform us when it’s done. The producer calls notify_one() to wake one waiting thread (or notify_all() to wake all of them).
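One caveat worth noting with this experiment: if the producer gets to notify_one() before the consumer has reached wait(), the notification is lost and the consumer blocks forever; condition variables can also wake up spuriously. The idiomatic fix is to pair the condition variable with exactly the kind of boolean flag we mentioned above and wait on a predicate. A minimal sketch (the done flag is our addition):

std::mutex mutex;
std::condition_variable condition;
bool done = false; /* shared flag, only touched while holding the mutex */

void StartConsumer()
{
    std::unique_lock<std::mutex> consumerLock(mutex);
    /* wait() releases the lock while blocked; the predicate guards against
       spurious wakeups and against a notification that fired before we got here */
    condition.wait(consumerLock, []{ return done; });
    std::cout << std::this_thread::get_id() << " Consumer working" << std::endl;
}

void StartProducer()
{
    {
        std::lock_guard<std::mutex> producerLock(mutex);
        done = true;
    }
    condition.notify_one();
}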

Moving to GCD, we have two options. The first is to use dispatch_group.

let concurrentQueue:dispatch_queue_t = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)
let group:dispatch_group_t = dispatch_group_create()

func StartConsumer()
{
    println("Consumer sleeping")
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    println("Consumer working")
}

func StartProducer()
{
    dispatch_group_async(group, concurrentQueue) { () -> Void in
        println("Producer working")
        sleep(1)
        println("Producer sleeping")
    }
}

func StartProcess()
{
    StartProducer()
    StartConsumer();
}

Output:

Producer working
Consumer sleeping
Producer sleeping
Consumer working

The first thing you’ll notice in this code is that we start the producer first, and only then does the consumer wait. This is because if we started the consumer first, the group would have nothing in it to wait for, and dispatch_group_wait would return immediately.

dispatch_group is actually meant to synchronize a set of tasks in a queue. For example, say you’re loading the next level of your game, and for that you need to download a lot of images and other data files from a remote server before the loading can finish. What you’re doing is firing off a lot of async tasks and waiting for all of them to complete before doing something else.
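For the curious, libdispatch is a plain C API, so the same dispatch_group pattern can be written from C++ as well. Here’s a rough sketch of that level-loading idea (the group/queue calls are the real API; LoadNextLevel, DownloadAsset and the asset names are made up for illustration):

#include <dispatch/dispatch.h>
#include <cstdio>

static void DownloadAsset(void *context)
{
    std::printf("downloading %s\n", static_cast<const char *>(context));
}

void LoadNextLevel()
{
    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_group_t group = dispatch_group_create();

    static const char *assets[] = {"level.json", "tiles.png", "music.mp3"};
    for (const char *asset : assets) {
        /* const_cast only because the C callback takes a void * context */
        dispatch_group_async_f(group, queue, const_cast<char *>(asset), DownloadAsset);
    }

    /* block until every task submitted to the group has finished */
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    std::printf("all assets loaded\n");

    dispatch_release(group); /* dispatch objects are manually released in C/C++ */
}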

For this experiment, though, a dispatch_semaphore is a better fit. dispatch_semaphore is based on the classic semaphore concept you must’ve learned during your college days.

let sema:dispatch_semaphore_t = dispatch_semaphore_create(0);

func StartConsumer()
{
    println("Consumer sleeping")
    dispatch_semaphore_wait(sema, DISPATCH_TIME_FOREVER)
    println("Consumer working")
}

func StartProducer()
{
    println("Producer working")
    sleep(1)
    println("Producer sleeping")
    dispatch_semaphore_signal(sema)
}

func StartProcess()
{
    StartProducer()
    StartConsumer();
}

If you look closely, we haven’t even used a dispatch_queue here. That’s because dispatch_semaphore is more about synchronization than about threading.

dispatch_semaphore is basically just a count-based blocking system. We create a dispatch_semaphore object by giving it an initial count. Each wait decrements the count and each signal increments it. If the count would drop below zero, the waiting thread is blocked until somebody issues a signal.
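To make those semantics concrete, here’s a rough sketch of a counting semaphore built out of the std::mutex and std::condition_variable we used earlier (the real dispatch_semaphore is far more optimized, but the counting behavior is the same idea):

class CountingSemaphore {
public:
    explicit CountingSemaphore(long count) : count_(count) {}

    void Wait() /* like dispatch_semaphore_wait */
    {
        std::unique_lock<std::mutex> lock(mutex_);
        /* block while a decrement would take the count below zero */
        condition_.wait(lock, [this]{ return count_ > 0; });
        --count_;
    }

    void Signal() /* like dispatch_semaphore_signal */
    {
        std::lock_guard<std::mutex> lock(mutex_);
        ++count_;
        condition_.notify_one();
    }

private:
    long count_;
    std::mutex mutex_;
    std::condition_variable condition_;
};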

Here’s another example to illustrate using a semaphore between two queues:

let sema:dispatch_semaphore_t = dispatch_semaphore_create(0);
let concurrentQueue:dispatch_queue_t = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)

func StartConsumer()
{
    println("Consumer sleeping")
    dispatch_semaphore_wait(sema, DISPATCH_TIME_FOREVER)
    println("Consumer working")
}

func StartProducer()
{
    dispatch_async(concurrentQueue) {
        println("Producer working")
        sleep(1)
        println("Producer sleeping")
        dispatch_semaphore_signal(sema)
    }
}

func StartProcess()
{
    StartProducer()
    StartConsumer();
}

Again, note that we start the producer first. Because we initialize the dispatch_semaphore with 0, the consumer blocks the main thread as soon as it is invoked; if we started the consumer first, the producer code would never even get a chance to be dispatched.

Here’s the output of the above code:

Producer working
Consumer sleeping
Producer sleeping
Consumer working

This just scratches the surface of inter-thread communication. Most problems with concurrency actually deal with inter-thread communication in some way. In later sessions we shall experiment more with this. The idea for this article came from this discussion on the problem of loading game data in a parallel thread and the solution I worked on.

The code for today’s experiment is available at github.com/chunkyguy/ConcurrencyExperiments. Check it out and have fun experimenting on your own.


Concurrency: Thread-safe Singletons

Love them or hate them, you just cannot ignore singletons. Let’s try experimenting with singletons and concurrency.

Singletons are objects of which you wish to have only a single instance for your program’s entire lifecycle. Usually, you want to create singleton objects lazily. Here’s a typical way to create a singleton in a single-threaded application.

class Library {
public:

    static Library *SharedInstance()
    {
        if (!shared_) {
            shared_ = new Library();
        }
        return shared_;
    }

    static void Dealloc()
    {
        if (shared_) {
            delete shared_;
        }
    }
    
    Library() :
    books_({
        {"Ernest Hemingway", "Old man and the sea"},
        {"Aldous Huxley", "Brave new world"},
        {"Aldous Huxley", "The Genius and the Goddess"},
        {"Aldous Huxley", "Antic Hay"},
        {"Salman Rushdie", "Midnight's children"},
        {"Fyodor Dostovesky", "Crime and punishment"},
        {"Boris Pasternak", "Doctor Zhivago"},
        {"Bernard Shaw", "Arms and the Man"},
        {"Bernard Shaw", "Man and Superman"}
    })
    {
        std::cout << "Library " << this << std::endl;
    }
    
    ~Library()
    {
        std::cout << "~Library " << this << std::endl;
    }
    
    std::vector<std::string> GetBooks(const std::string &author) const
    {
        typedef std::multimap<std::string, std::string>::const_iterator BookIt;
        std::pair<BookIt, BookIt> range =  books_.equal_range(author);
        std::vector<std::string> bookList;
        for (BookIt it = range.first; it != range.second; ++it) {
            bookList.push_back(it->second);
        }
        return bookList;
    }
    
    std::size_t GetSize() const
    {
        return books_.size();
    }
    
private:
    std::multimap<std::string, std::string> books_;
    static Library *shared_;
};

Library *Library::shared_ = nullptr;

void PrintBooks(const std::string &author)
{
    std::cout << "Searching for author: " << author << std::endl;
    std::vector<std::string> list = Library::SharedInstance()->GetBooks(author);
    std::copy(list.begin(), list.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
}

void PrintSize()
{
    std::cout << "Library Size: " << Library::SharedInstance()->GetSize() << " books" << std::endl;
}

When invoked with:

void SerialImpl()
{
    std::string searchAuthor("Aldous Huxley");
    PrintBooks(searchAuthor);
    PrintSize();
}

void ThreadsafeSingleton_main()
{
    SerialImpl();
    Library::Dealloc();
}

We get

Searching for author: Aldous Huxley
Library 0x100a6bfe0
Brave new world
The Genius and the Goddess
Antic Hay
Library Size: 9 books
~Library 0x100a6bfe0
main() exit

We see that our singleton gets created lazily, only when actually required, and also gets deallocated when the program finally quits, thanks to the Dealloc() function.

Next, let’s try calling it from multiple threads.

void ConcurrentImpl()
{
    std::string searchAuthor("Aldous Huxley");
    std::thread bookSearchThread(PrintBooks, std::ref(searchAuthor));
    std::thread libSizeThread(PrintSize);
    
    bookSearchThread.join();
    libSizeThread.join();
}

This might work or it might not, depending on how the threads get scheduled on your machine. To see the problem with this code, let’s deliberately put the thread to sleep while creating the singleton instance.

    static Library *SharedInstance()
    {
        if (!shared_) {
            std::this_thread::sleep_for(std::chrono::seconds(1));
            shared_ = new Library();
        }
        return shared_;
    }

Now, if you run it you might see two instances of Library objects getting created.

Searching for author: Aldous Huxley
Library Size: Library 0x100b84fe0
Brave new world
The Genius and the Goddess
Antic Hay
Library 0x100ba4fe0
9 books
~Library 0x100ba4fe0
main() exit

There are two problems here. First, we have two instances of the Library object. Second, because our Dealloc routine depends on the guarantee that at any given moment there is a single instance of the Library object, the other Library object never gets released. This is a classic example of a memory leak.

We can solve this issue with two techniques. The first is C++11’s static initialization.

Let’s modify our Library class: instead of creating the singleton on the heap, we create a static instance of the Library object.

static Library &SharedInstance()
{
    static Library lib;
    return lib;
}

And modifying our calls

void PrintBooks(const std::string &author)
{
    std::cout << "Searching for author: " << author << std::endl;
    std::vector<std::string> list = Library::SharedInstance().GetBooks(author);
    std::copy(list.begin(), list.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
}

void PrintSize()
{
    std::cout << "Library Size: " << Library::SharedInstance().GetSize() << " books" << std::endl;
}

If we now run our code, we get

Searching for author: Aldous Huxley
Library 0x10006d3c8
Brave new world
The Genius and the Goddess
Antic Hay
Library Size: 9 books
main() exit
~Library 0x10006d3c8

This works because C++11 guarantees that static local variables are initialized thread-safely. Another benefit of this approach is that we don’t have to implement any Library::Dealloc() function, as static data gets deallocated automatically by the runtime. If this satisfies your requirements, this is the best way out.

But this won’t be the best fit for every case. Say you want the Library singleton to live only in heap memory. Also, if you’re using Objective-C this won’t work, as you’re not allowed to allocate Objective-C objects on the stack. This brings us to our second method: using the C++ thread library.

Let’s switch back to our original implementation, with the Library object in heap memory. The first thing we can try is a mutex (here mutex_ is another static data member of Library, defined out-of-class just like shared_).

    static Library *SharedInstance()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        
        if (!shared_) {
            std::this_thread::sleep_for(std::chrono::seconds(1));
            shared_ = new Library();
        }
        return shared_;
    }

And this works as expected:

Searching for author: Aldous Huxley
Library Size: Library 0x100b92fe0
Brave new world
The Genius and the Goddess
Antic Hay
9 books
~Library 0x100b92fe0
main() exit

The only problem with this code is that Library::SharedInstance() is going to be called from plenty of places, and it unnecessarily locks and unlocks the mutex on every call. We actually need the mutex only while the single instance has not yet been created; at all other times we can simply return the pointer. This brings us to the infamous double-checked locking pattern.

    static Library *SharedInstance()
    {
        if (!shared_) {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!shared_) {
                std::this_thread::sleep_for(std::chrono::seconds(1));
                shared_ = new Library();
            }
        }
        return shared_;
    }

This code looks great, right? It takes the mutex only when creating the shared instance; every other time it just returns the pointer. But it has a serious bug, and the kind that is hard to debug: the code works almost all the time, you probably can’t reproduce the issue in a debug environment, and you may not see the bug again for millions of clock cycles. A programmer’s nightmare.

Suppose we’re running this code with our two threads, bookSearchThread and libSizeThread. bookSearchThread comes in, sees that shared_ is NULL, acquires the lock and starts allocating and initializing the instance. Halfway through, it has allocated space on the heap, so shared_ isn’t NULL anymore, but the instance isn’t initialized yet, only allocated. It’s as if, in Objective-C terms, only the alloc half of the following has completed:

Library *shared = [[Library alloc] init];

Suddenly a thread switch happens and libSizeThread comes in. It sees that shared_ isn’t NULL and says: OK, somebody must have already created this instance, let’s use it straight away. This results in undefined behavior, as libSizeThread is now using an object that isn’t fully initialized.
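To see why, it helps to remember that shared_ = new Library(); is conceptually three steps (a rough expansion, not the literal generated code):

void *mem = operator new(sizeof(Library)); /* 1. allocate raw memory */
new (mem) Library();                       /* 2. construct the Library in it */
shared_ = static_cast<Library *>(mem);     /* 3. publish the pointer */

/* Nothing in this code stops the compiler or CPU from reordering steps 2
   and 3, so another thread can observe a non-null shared_ that points to a
   not-yet-constructed object. */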

This is such a well-known problem that almost every threading library provides a way to make it work. In C++11, we can use std::call_once.

class Library {
private:
    static std::once_flag onceFlag_;
    /* other data members */
    
public:
    static Library *SharedInstance()
    {
        std::call_once(onceFlag_, []{
            std::this_thread::sleep_for(std::chrono::seconds(1));
            shared_ = new Library();
        });
        return shared_;
    }
    
    /* other member functions */
};

std::once_flag Library::onceFlag_;

Even GCD provides something similar:

+ (instancetype)sharedInstance
{
    static dispatch_once_t onceFlag;
    static id shared_;
    dispatch_once(&onceFlag, ^{
        shared_ = [[[self class] alloc] init];
    });
    return shared_;
}

These helpers from the standard libraries guarantee that the object’s allocation and initialization are complete before the flag gets set; every call after that just returns immediately. So, no more race conditions for us.

So there it is, singletons the thread-safe way.

The code for today’s experiment is available at github.com/chunkyguy/ConcurrencyExperiments. Check it out and have fun experimenting on your own.


Concurrency: Working with shared data using threads

Continuing last week’s experiment on protecting shared data in a concurrent system, let’s look at how we can make it work using C++ and threads.

Here’s a brief recap: we’re building an airport scenario with a single status board screen. The status board is being edited simultaneously by multiple airliners, and it can be read by any passenger at any given moment to check the status of a flight by its flight number.

Let’s build the concept within a single-threaded environment:

/* constants for the experiment (the exact values are assumptions) */
const int MAX_FLIGHTS = 10;
const int SEARCH_FLIGHT = 7;

class StatusBoard {
    std::forward_list<int> statusBoard;
    
public:
    void AddFlight()
    {
        int newFlight = rand()%MAX_FLIGHTS;
        statusBoard.push_front(newFlight);
    }
    
    bool FindFlight(const int flightNum) const
    {
        return (std::find(statusBoard.begin(), 
                          statusBoard.end(), 
                          flightNum) != statusBoard.end());
    }
};

/* simple read/write counters used by the final log */
struct DebugCount {
    int write = 0;
    int read = 0;
} debugCount;

StatusBoard statusBoard;

void StatusWriter()
{
    debugCount.write++;
    statusBoard.AddFlight();
}

bool StatusReader(const int searchFlightNum)
{
    debugCount.read++;
    return statusBoard.FindFlight(searchFlightNum);
}

void SerialLoop()
{
    while (!StatusReader(SEARCH_FLIGHT)) {
        StatusWriter();
    }
}

On my machine, it outputs:

Flight found after 26 writes and 27 reads.

Now let’s begin with making this implementation multithreaded.

void StatusWriter()
{
    std::this_thread::sleep_for(std::chrono::seconds(1));
    
    debugCount.write++;
    statusBoard.AddFlight();
}

void ParallelLoop()
{
    while (!std::async(std::launch::async, StatusReader, SEARCH_FLIGHT).get()) {
        /* write operation */
        std::thread writeOp(StatusWriter);
        writeOp.detach();
    }
}

The nice thing about C++ is that much of this code looks almost identical to the single-threaded code, thanks to the way the thread library has been designed. At first glance you won’t even be able to tell where the threading code actually is. Here we’re invoking two operations on two threads at each iteration of the loop.

First we have a read operation

while (!std::async(std::launch::async, StatusReader, SEARCH_FLIGHT).get()) {

This makes use of std::async and std::future to invoke the StatusReader function on another thread. With the std::launch::async policy, StatusReader starts running concurrently as soon as std::async returns; the get() call at the end blocks the calling thread until the result is ready. We shall talk about futures and promises in some future experiment. For this experiment it suffices to understand that std::async() runs StatusReader() on a concurrent thread and get() magically returns the result back to the caller thread.
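Just to make that timing explicit, here’s the same call split apart (a sketch; pendingRead is our name):

/* StatusReader starts running on its own thread as soon as std::async returns */
std::future<bool> pendingRead = std::async(std::launch::async, StatusReader, SEARCH_FLIGHT);

/* ... the caller is free to do other work here while StatusReader runs ... */

bool found = pendingRead.get(); /* blocks until the result is ready */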

The other is simply spawning a std::thread and detaching it, which we’re already familiar with from our previous experiment.

Here we just start the StatusWriter function on a new thread every time the loop iterates.

/* write operation */
std::thread writeOp(StatusWriter);
writeOp.detach();

On my machine, when the code doesn’t crash, it prints:

Flight found after 26 writes and 1614 reads.

Obviously, this code isn’t thread-safe. We’re reading and writing the same forward list from more than one thread, and this code is bound to crash a lot. So, as the next step, let’s make it thread-safe using a mutex.

A mutex is the basic object used to provide mutual exclusivity to a portion of code. In C++, mutexes are used to lock a shared resource.

First of all, we start with a std::mutex object. This object guarantees that only a single thread at a time can execute the locked code. We can use the lock() and unlock() member functions on a std::mutex object like so:

std::mutex mutex; /* shared between all users of the resource */

void foo()
{
    mutex.lock();
    
    /* 
     * some resource locking code here
     */
    
    mutex.unlock();
}

If you’ve used C++ long enough, you’ll know that this code is prone to all sorts of errors. In many cases foo() can exit before unlock() gets executed. This could be due to an early return added to the function, or, in a more likely scenario, the code does something illegal and foo() throws an exception. Even if foo() itself is very robust, it could call another function that isn’t.

If you’re a long-time C++ user, you’ll know that we solve this class of problems with the RAII idiom. Luckily, the standard thread library already provides this in the form of std::lock_guard, which is designed to work with any lock-like object.

Using std::mutex and std::lock_guard our code is now:

class StatusBoard {
    std::forward_list<int> statusBoard;
    mutable std::mutex mutex;
    
public:
    void AddFlight()
    {
        std::lock_guard<std::mutex> lock(mutex);
        
        int newFlight = rand()%MAX_FLIGHTS;
        statusBoard.push_front(newFlight);
    }
    
    bool FindFlight(const int flightNum) const
    {
        std::lock_guard<std::mutex> lock(mutex);
        
        return (std::find(statusBoard.begin(), statusBoard.end(), flightNum) != statusBoard.end());
    }
};

The mutable keyword is needed because we’re using the mutex in the const member function FindFlight(). On my machine it executes as:

Flight found after 10 writes and 1591 reads.

Sometimes the code also throws the following exception after exiting from main:

libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument

This is because we’ve explicitly added a thread delay for debugging, and some of our detached writer threads might still be running when main exits. I don’t think this is going to be a problem in any real code.
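If you do want a clean shutdown in this experiment, one alternative to detach() is to keep the writer threads around and join them before main exits; a sketch:

std::vector<std::thread> writers;
while (!std::async(std::launch::async, StatusReader, SEARCH_FLIGHT).get()) {
    writers.emplace_back(StatusWriter);
}
/* wait for every writer before tearing down the process */
for (std::thread &writeOp : writers) {
    writeOp.join();
}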

So, there, we have basic thread-safe code. This is not the best implementation possible, and there are things to look out for. For example, we have to take care not to pass out any references or pointers to the shared data, otherwise the code becomes unsafe again. Always remember that at the low level all threads share the same address space, and whenever we have a shared resource without a lock, we have a race condition. That is a design error on the programmer’s part.

As always this code is available online at github.com/chunkyguy/ConcurrencyExperiments. Check it out and have fun experimenting on your own.


Concurrency: Working with shared data using queues

All problems with concurrency basically boil down to two categories: Race conditions and deadlocks. Today let’s look into one specific problem under race conditions: Working with shared data.

If every thread only ever dealt with data provided exclusively to it, we would never get into any sort of data race. In fact, this is one of the selling points of functional programming languages like Haskell, which advertise heavily around concurrency and parallelism.

But there’s nothing to worry about: functional programming is about adopting a design pattern, just as object-oriented programming is. Yes, some languages push you toward a particular pattern, but that doesn’t mean we cannot adopt the pattern in another language; we just have to stick to a convention. For example, plenty of large software systems written in C adopt the object-oriented pattern. And for us, both C++ and Swift are very much capable of adopting the functional programming paradigm.

Getting back to the problem of shared data, let’s look at when shared data becomes a problem with multiple threads. Suppose we have an application where two or more threads work on some shared data. If all each thread ever does is read, we have no data race problem. For example, you’re at the airport checking your flight status on the giant screen, and you’re not the only one looking at it; but does it really matter how many others are reading the information from the same screen? Now suppose the scenario is a little more realistic: every airliner has a control with which they can update the screen with new information. While a passenger is reading the screen, the situation gets more difficult. At any given instant, one or more airliners could be updating the screen at the same time, which could leave the passengers reading garbled text. The airliners are in a race condition for a single resource: the screen.

If you use Cocoa, you must be familiar with NSMutableArray, and you probably also know that NSMutableArray isn’t thread-safe as of this writing. So the question is: how do we make use of NSMutableArray in a situation like this, with concurrent read and write operations going on?

Let’s start with a simple single threaded model of what we’re doing:

/**
 * Concurreny Experiments: Shared data
 * Compile: clang SharedData.m -framework Foundation && ./a.out
 */

#import <Foundation/Foundation.h>

@interface Airport : NSObject {
    NSMutableArray *airportStatusBoard;
    NSUInteger writeCount;
    NSUInteger readCount;
}
@end

@implementation Airport

- (id)init
{
    self = [super init];
    if (!self) {
        return nil;
    }
    
    airportStatusBoard = [[NSMutableArray alloc] init];
    writeCount = 0;
    readCount = 0;
    
    return self;
}

- (void)dealloc
{
    [airportStatusBoard release];
    [super dealloc];
}

- (void)addFlightEntry
{
    NSNumber *newEntry = @(random() % 10);
    [airportStatusBoard addObject:newEntry];
}

- (void) statusWriter
{
    NSLog(@"writing begin: %@", @(writeCount));
    [self addFlightEntry];
    NSLog(@"writing ends: %@", @(writeCount));
    writeCount++;
}

- (BOOL) statusReaderWithFlightNumber:(NSNumber *)searchFlightNum
{
    NSLog(@"reading begin: %@", @(readCount));
    
    BOOL found = NO;
    for (NSNumber *flightNum in airportStatusBoard) {
        if ([flightNum isEqualToNumber:searchFlightNum]) {
            found = YES;
            break;
        }
    }
    
    NSLog(@"reading ends: %@", @(readCount));
    
    readCount++;
    
    return found;
}

- (void) serialLoop
{
    srandom(time(0));
    while (![self statusReaderWithFlightNumber:@(7)]) {
        [self statusWriter];
    }
    
    NSLog(@"Flight found after %@ writes and %@ reads", @(writeCount), @(readCount));
}

@end

int main()
{
    @autoreleasepool {
        Airport *airport = [[Airport alloc] init];
        [airport serialLoop];
    }
    
    return 0;
}

Here’s the output for the above program on my machine:


2014-09-27 14:32:12.636 a.out[9601:507] reading begin: 0
2014-09-27 14:32:12.638 a.out[9601:507] reading ends: 0
2014-09-27 14:32:12.639 a.out[9601:507] writing begin: 0
2014-09-27 14:32:12.639 a.out[9601:507] writing ends: 0
2014-09-27 14:32:12.640 a.out[9601:507] reading begin: 1
2014-09-27 14:32:12.640 a.out[9601:507] reading ends: 1
2014-09-27 14:32:12.641 a.out[9601:507] writing begin: 1
2014-09-27 14:32:12.641 a.out[9601:507] writing ends: 1
2014-09-27 14:32:12.642 a.out[9601:507] reading begin: 2
2014-09-27 14:32:12.642 a.out[9601:507] reading ends: 2
2014-09-27 14:32:12.643 a.out[9601:507] Flight found after 2 writes and 3 reads

As you can observe, we perform sequential writes and reads until we find the required flight number in the array, and everything works out great. Now let’s try to run the writes and reads concurrently. To do that, we create a single concurrent queue and perform all the reading and writing operations on it. We’re implementing the scenario where one passenger (you) reads the status board from top to bottom, checking whether their flight is listed, and when they reach the end of the list without finding it, they start again from the top, while many airliners are trying to update the single status board concurrently.

readWriteQueue = dispatch_queue_create("com.whackylabs.concurrency.readWriteQ", DISPATCH_QUEUE_CONCURRENT);

- (void)parallelLoop
{
    srandom(time(0));

    __block BOOL done = NO;
    while(!done) {
        
        /* perform a write op */
        dispatch_async(readWriteQueue, ^{
            [self statusWriter];
        });
        
        /* perform a read op */
        dispatch_async(readWriteQueue, ^{
            if ([self statusReaderWithFlightNumber:@(SEARCH_FLIGHT)]) {
                done = YES;
            }
        });
    }
    
    NSLog(@"Flight found after %@ writes and %@ reads", @(writeCount), @(readCount));
}

If you now try to execute this program, once in a while you might get the following error:


Terminating app due to uncaught exception 'NSGenericException', reason: '*** Collection <__NSArrayM: 0x7f8f88c0a5d0> was mutated while being enumerated.'

This is the tricky part about debugging race conditions: most of the time the application might appear to work perfectly, and unless you deploy it on different hardware, or simply get unlucky, you won’t see the issue. One thing you can play with is deliberately sleeping a thread to provoke the race, by sprinkling sleep calls around for debugging:

- (void) statusWriter
{
    [NSThread sleepForTimeInterval:0.1];
    writeCount++;
    NSLog(@"writing begin: %@", @(writeCount));        
    [self addFlightEntry];
    NSLog(@"writing ends: %@", @(writeCount));
}

If the above code doesn’t crash on your machine on the first go, edit the sleep interval and/or give it a few more runs.

Now, coming to the solution of such a race condition. It has two parts. First is the write part: when multiple threads are trying to update the shared data, we need a way to lock the portion where the actual update happens. It’s as if, with multiple airliners trying to update the single screen, only one airliner at a time holds the controls required to update it, and when it’s done, it releases the controls for the next airliner in line.

Second, is the read part. We need to make sure that when we’re performing a read operation some other thread isn’t mutating the shared data as NSEnumerator isn’t thread-safe either.

A way to acquire such a lock with GCD is to use dispatch_barrier_async().

- (void) statusWriter
{
    dispatch_barrier_async(readWriteQueue, ^{
        
        [NSThread sleepForTimeInterval:0.1];
        
        NSLog(@"writing begin: %@", @(writeCount));
        
        [self addFlightEntry];
        
        NSLog(@"writing ends: %@", @(writeCount));
        writeCount++;
        
    });
}

- (BOOL) statusReaderWithFlightNumber:(NSNumber *)searchFlightNum
{
   __block BOOL found = NO;
    
    dispatch_barrier_async(readWriteQueue, ^{
        
        NSLog(@"reading begin: %@", @(readCount));
        
        for (NSNumber *flightNum in airportStatusBoard) {
            if ([flightNum isEqualToNumber:searchFlightNum]) {
                found = YES;
                break;
            }
        }
        
        NSLog(@"reading ends: %@", @(readCount));
        
        readCount++;
        
    });
    
    return found;
}

If you try to run this code now, you’ll observe that the app doesn’t crash anymore, but the app has become significantly slower.

The first improvement we can make is to not return from the read code immediately; with dispatch_barrier_async the block may not even have run by the time we return. Instead, we let the queue deliver the result through a completion handler:

- (void) statusReaderWithFlightNumber:(NSNumber *)searchFlightNum completionHandler:(void(^)(BOOL success))completion
{
    
    dispatch_barrier_async(readWriteQueue, ^{
        
        NSLog(@"reading begin: %@", @(readCount));
        
        BOOL found = NO;
        
        for (NSNumber *flightNum in airportStatusBoard) {
            if ([flightNum isEqualToNumber:searchFlightNum]) {
                found = YES;
                break;
            }
        }
        
        NSLog(@"reading ends: %@", @(readCount));
        
        readCount++;
       
        completion(found);
    });
}

- (void)parallelLoop
{
    srandom(time(0));
    
    __block BOOL done = NO;
    while(!done) {
        
        /* perform a write op */
        dispatch_async(readWriteQueue, ^{
            [self statusWriter];
        });
        
        /* perform a read op */
        dispatch_async(readWriteQueue, ^{
            [self statusReaderWithFlightNumber:@(SEARCH_FLIGHT) completionHandler:^(BOOL success) {
                done = success;
            }];
        });
    }
    
    NSLog(@"Flight found after %@ writes and %@ reads", @(writeCount), @(readCount));
}

Another thing to note is that our code is no longer executing in parallel, as each dispatch_barrier_async() block holds the queue until all the pending tasks in the queue are complete. This is evident from the final log.

Flight found after 154 writes and 154 reads

In fact, this could be even slower than just running everything sequentially, as the queue has the added locking and unlocking overhead to take care of.

We can make the read operation non-blocking again, as blocking the reads gains us nothing. One way to achieve that is to copy the latest data within a barrier and then read from the copy while the other threads update the original.

- (BOOL) statusReaderWithFlightNumber:(NSNumber *)searchFlightNum
{
    
    NSLog(@"reading begin: %@", @(readCount));
    
    __block NSArray *airportStatusBoardCopy = nil;
    /* the copy must happen synchronously, or the loop below could run
       before the copy exists */
    dispatch_barrier_sync(readWriteQueue, ^{
        
        airportStatusBoardCopy = [airportStatusBoard copy];
        
    });
    
    
    BOOL found = NO;
    
    for (NSNumber *flightNum in airportStatusBoardCopy) {
        if ([flightNum isEqualToNumber:searchFlightNum]) {
            found = YES;
            break;
        }
    }

    if (airportStatusBoardCopy) {
        [airportStatusBoardCopy release];
    }
    
    NSLog(@"reading ends: %@", @(readCount));
    
    readCount++;
    
    return found;
}

But this still doesn’t solve the real problem: most of the reads are probably a waste of time. What we actually need is a way to communicate between the writes and the reads; the actual read should only happen after the data has been modified by some new write.

We can start by separating the read and write tasks. The read operations need to be serial, as we are modeling a passenger who reads the list top-down and, on reaching the end, starts again from the beginning. The airliners, on the other hand, can edit the screen simultaneously, so the writes stay on a concurrent queue.

readQueue = dispatch_queue_create("com.whackylabs.concurrency.readQ", DISPATCH_QUEUE_SERIAL);
writeQueue = dispatch_queue_create("com.whackylabs.concurrency.writeQ", DISPATCH_QUEUE_CONCURRENT);

- (void)parallelLoop
{
    srandom(time(0));
    
    __block BOOL done = NO;
    while(!done) {
        
        /* perform a write op */
        dispatch_async(writeQueue, ^{
            [self statusWriter];
        });
        
        /* perform a read op */
        dispatch_async(readQueue, ^{
            [self statusReaderWithFlightNumber:@(SEARCH_FLIGHT) completionHandler:^(BOOL success) {
                done = success;
            }];
        });
    }
    
    NSLog(@"Flight found after %@ writes and %@ reads", @(writeCount), @(readCount));
}

Now, in the read operation we take a copy of the shared data. We can do that with an atomic property. Atomicity guarantees that a read or write of the property either completes fully or not at all; there are no in-betweens. So in all cases we have valid data all the time.

@property (atomic, copy) NSArray *airportStatusBoardCopy;

In each read operation we simply copy the shared data, so we don’t have to care about mutation anymore.

self.airportStatusBoardCopy = airportStatusBoard;

And further on we can use the copy to execute the read operations.

2014-09-27 16:20:33.145 a.out[12855:507] Flight found after 11 writes and 457571 reads

This is the queue-level solution to the problem. In the next update we will take a look at how we can solve the problem at the thread level using the C++ standard thread library.
As always, the code for today’s experiment is available online at github.com/chunkyguy/ConcurrencyExperiments.


Component System using C++ Multiple Inheritance

A few days back, I talked about designing a Component System using Objective-C and message forwarding. Today, to carry the conversation forward, we shall see how the same Component System can be designed in C++, and also why C++ is a better language choice for such a design.

Before I begin, let’s copy-paste the abstract from our earlier post on why and when we actually need a Component System:

I wanted a component system where I have a GameObject that can have one or more components plugged in. To illustrate the main problems with the traditional Actor-based model, let’s assume we’re developing a platformer game. We have these three Actor types:

1. Ninja: A ninja is an actor that a user can control with external inputs.
2. Background: Background is something like a static image.
3. HiddenBonus: Hidden bonus is an invisible actor that a ninja can hit.

Source: Swim Ninja

Let’s take a look at all the components inside these actors.

Ninja = RenderComponent + PhysicsComponent
Background = RenderComponent
HiddenBonus = PhysicsComponent.

Here RenderComponent is responsible for just drawing some content on the screen, while PhysicsComponent is responsible for just doing physics updates, like collision detection.

So, in a nutshell from our game loop we need to call things like

void Scene::Update(const int dt)
{
    ninja.Update(dt);
    hiddenBonus.Update(dt);
}
 
void Scene::Render()
{
    background.Render();
    ninja.Render();
}

Now, in the traditional Actor model, if we have an Actor class with both RenderComponent and PhysicsComponent like:

class Actor {
public:
    void Update(const int dt);
    void Render();
};

    Actor ninja;
    Actor hiddenBonus;
    Actor background;

Then it would inevitably add a PhysicsComponent to the background actor and a RenderComponent to the hiddenBonus actor, and the user of the code has to keep mental track of which functions shouldn’t be invoked on which objects. That is as awful in reality as it sounds in theory.

We could use the same approach we used when designing the Component System with Objective-C: a fat GameObject with a function to enable a particular component. Think of each GameObject as a collection of many components, where each component knows how to deal with one thing: rendering, physics, AI, networking, and so on. When creating a new GameObject we need a way to specify which components should be enabled for it. We’ve already discussed the fat-interface technique in that last experiment. Let’s try something new; we’re using C++ after all. Let’s start by breaking the components into separate classes:

class RenderComponent {
public:
    void Render();
};

class PhysicsComponent {
public:
    void Update(const int dt);
};

Then we can easily create composite components using such core components, such as:

class RenderAndPhysicsComponent : public RenderComponent, public PhysicsComponent
{};

Now, we can update our game scene’s interface as:

    RenderAndPhysicsComponent ninja;
    PhysicsComponent hiddenBonus;
    RenderComponent background;

This definitely serves our immediate needs, but there is a problem: this solution doesn’t scale well. Whenever we come up with a new component, we have to update the related base classes accordingly and create a mix for each possible combination of core component interfaces. For five core components, we would get a total of C(5,1) + C(5,2) + C(5,3) + C(5,4) + C(5,5) = 5 + 10 + 10 + 5 + 1 = 31 possible component interface combinations.

In C++, we can actually do better. This is because C++ has one thing that other languages just don’t want: multiple inheritance.

Instead of making a composite class that directly aggregates components, we can have a single GameObject class that inherits from as many components as desired. And, to avoid the fat-interface issue, we can have the GameObject inherit, at compile time, from only the components we actually need for that particular object.

How do we design such a GameObject? One correct answer is using variadic templates:

template <typename... Component>
class GameObject : public Component... {
public:
    GameObject() :
    Component()...
    {}
    
    GameObject(Component... component) :
    Component(component)...
    {}
};

This GameObject has nothing of its own. A host class that wishes to use an instance of this object has to supply concrete classes as template parameters. This gives our GameObject infinite flexibility.

We can now update our interface to easily use the core components to construct our GameObject instances that fit our needs:

    GameObject<RenderComponent, PhysicsComponent> ninja;
    GameObject<PhysicsComponent> hiddenBonus;
    GameObject<RenderComponent> background;

If that seems too verbose to write every time, we can use typedefs to create the most common configurations.

    typedef GameObject<RenderComponent> Sprite;
    typedef GameObject<PhysicsComponent> Trigger;
    typedef GameObject<RenderComponent, PhysicsComponent> Actor;
    
    Actor ninja;
    Trigger hiddenBonus;
    Sprite background;

Now, I know this already feels awesome, but we still have a problem. In practice we often need to communicate between components. Say we have a TransformComponent that provides a model matrix based on the current transform of the GameObject.

class TransformComponent {
public:
    /** Get the model matrix of the transform to take things from model space to
     * the world space
     */
    GLKMatrix4 GetModelMatrix() const;
    
    GLKVector2 position;
    float rotation;
};

And, that model matrix then could be used by the RenderComponent for rendering the mesh.

class RenderComponent {
public:

    void Render(const Uniform &uniform,
                const Camera &camera) const;

    GLKMatrix4 modelMatrix;
    Mesh mesh;
};

void RenderComponent::Render(const Uniform &uniform,
                             const Camera &camera) const
{
    GLKMatrix4 modelViewProjectionMatrix = GLKMatrix4Multiply(camera.GetProjectionMatrix(),
                                                              GLKMatrix4Multiply(camera.GetViewMatrix(),
                                                                                 modelMatrix));
    glUniformMatrix4fv(uniform.um4k_Modelviewprojection, 1, 0, modelViewProjectionMatrix.m);
    mesh.Draw();
}

So, how do we go about this design?

To solve that, we actually need another component. Let’s call it CustomComponent. Its entire purpose is to have an Update function:

class CustomComponent {
public:
    void Update(const int dt);
    
protected:
    std::function<void(const int dt)> customUpdate_;
};

Now we can use our CustomComponent to bind many other components with something like:

void Ninja::Load()
{
	/* ... */
    customUpdate_ = std::bind(&Ninja::Update, this, std::placeholders::_1);
}
void Ninja::Update(const int dt)
{
	/* from transform component */
	GLKMatrix4 modelMatrix = GetModelMatrix();
	/* to render component */
    SetModelMatrix(modelMatrix);
}

No matter how absurd this code looks, it works. We can improve it, and while doing that we shall also address another important issue with multiple inheritance: name clashing.

Since our GameObject is now split across multiple classes, we can easily get name clashes. For example, both RenderComponent and TransformComponent could have a member named

GLKMatrix4 modelMatrix;

Or, both PhysicsComponent and CustomComponent can have a member function named:

void Update(const int dt);

How do we resolve this issue? One way is to adopt a naming convention: say, every PhysicsComponent member function and data member gets a ‘phy’ prefix, and similarly for the other components. That would reduce our code to:

void Ninja::Update(const int dt)
{
	/* from transform component */
	GLKMatrix4 modelMatrix = transGetModelMatrix();
	/* to render component */
    rendrSetModelMatrix(modelMatrix);
}

We can all agree that following a naming convention is very hard, so this solution might not work in the long term. A better way out is to move all the core logic out of the component into a separate core class. The TransformComponent could be reduced to:

class TransformComponent {
public:
    /** Create default Transform component */
    TransformComponent();
    
protected:
    Transform transform_;
};

class Transform {
public:
    /** Create default Transform */
    Transform(const GLKVector2 &position = GLKVector2Make(0.0f, 0.0f),
              const Degree rotation = 0.0f);

    GLKVector2 GetPosition() const;
    
    void SetPosition(const GLKVector2 &position);
    
    /** Get angle in degrees */
    float GetRotation() const;
    
    /** Set rotation in degrees
     * @param rotation Angle in degrees.
     */
    void SetRotation(const Degree rotation);
    
    /** Get the model matrix of the transform to take things from model space to
     * the world space
     */
    GLKMatrix4 GetModelMatrix() const;
    
private:
    GLKVector2 position_;
    Degree rotation_;
};

With such pattern implemented, our same code would look like:

void Ninja::Update(const int dt)
{
    renderer_.SetModelMatrix(transform_.GetModelMatrix());
}

OK, with that taken care of, let’s now focus on fitting the pieces together. Let’s begin by updating the commonly used GameObject configurations to proper C++ types:

class Actor : public GameObject <TransformComponent, PhysicsComponent, RenderComponent, CustomComponent> {
public:
    void Load(const Transform &trans, const Mesh &mesh);
    void Update(const int dt);
};

class Sprite : public GameObject<TransformComponent, RenderComponent, CustomComponent> {
public:
    void Load(const Transform &trans, const Mesh &mesh);
    void Update(const int dt);
};

class Trigger : public GameObject<TransformComponent, PhysicsComponent, CustomComponent> {
public:
    void Load(const Transform &trans);
    void Update(const int dt);
};

void Scene::Load(const GLKVector2 &screen)
{
    LoadShader();
    LoadMesh();
    
    /* load entities */
    background_.Load(Transform(), meshMap_[0]);
    ninja_.Load(Transform(), meshMap_[1]);
    hiddenBonus_.Load(Transform());
}

Let’s now implement the loop. The trickiest part is that we need to call Update() on every object that has a CustomComponent and Render() on every object that has a RenderComponent. Notice that so far we haven’t implemented any virtual functions, so we cannot depend on the runtime to select the correct function. What we actually need is something like:

void Scene::Update(const int dt)
{
    /* update all custom components */
}

void Scene::Render()
{
    glClear(GL_COLOR_BUFFER_BIT);
    
    /* bind all pieces to the GL pipeline */
    meshMem_.Enable();
    shader_.Enable();
    
    /* render all render components */
}

What we actually need is some sort of container class that holds all types of components. The first thought that hits the brain is to have a std::vector for every kind of component, and from experience we already know that sort of pattern doesn’t scale well. But using the power of C++11 and tuples we can implement it in a much better way. Let’s add a ContainerComponent:

template <typename... Component>
class ContainerComponent {
protected:
    std::tuple<std::vector<Component*>...> children_;
};

As you can see, ContainerComponent is capable of holding any number of component types. Think of it as a class that decides at compile time which kinds of components it can hold, while at runtime we can add and remove instances of those component classes.

Now we can update our plain Scene class to be a GameObject with a ContainerComponent. The ContainerComponent holds two kinds of components: CustomComponent and RenderComponent.

class Scene : public GameObject <
	ContainerComponent<
		CustomComponent,
		RenderComponent
>> {
	/* ... */
};

The only important thing to remember is that order matters: because this is a tuple, we have to remember the index of each component type.

Our new updated Load() now looks like:

void Scene::Load(const GLKVector2 &screen)
{
    LoadShader();
    LoadMesh();
    
    /* load entities */
    background_.Load(Transform(), meshMap_[0]);
    std::get<0>(children_).push_back(&background_);
    std::get<1>(children_).push_back(&background_);
    
    ninja_.Load(Transform(), meshMap_[1]);
    std::get<0>(children_).push_back(&ninja_);
    std::get<1>(children_).push_back(&ninja_);

    hiddenBonus_.Load(Transform());
    std::get<0>(children_).push_back(&hiddenBonus_);
}

The power of C++ compile time checks comes into play when we do something illegal, such as:

std::get<1>(children_).push_back(&hiddenBonus_);

The above expression throws an error at compile time, because hiddenBonus_ does not have a RenderComponent and we’re trying to add it to a vector of RenderComponents.

So far so good. Let’s make the code more readable by removing the hard-coded integers. If we introduce an enum type:

enum ContainerIndex {
    Custom = 0,
    Render = 1
};

We can do things like:

void Scene::Load(const GLKVector2 &screen)
{
    LoadShader();
    LoadMesh();
    
    /* load entities */
    background_.Load(Transform(), meshMap_[0]);
    std::get<ContainerIndex::Custom>(children_).push_back(&background_);
    std::get<ContainerIndex::Render>(children_).push_back(&background_);
    
    ninja_.Load(Transform(), meshMap_[1]);
    std::get<ContainerIndex::Custom>(children_).push_back(&ninja_);
    std::get<ContainerIndex::Render>(children_).push_back(&ninja_);

    hiddenBonus_.Load(Transform());
    std::get<ContainerIndex::Custom>(children_).push_back(&hiddenBonus_);
}

Now getting back to our loop. We can simply do things like:

void Scene::Update(const int dt)
{
    /* update all custom components */
    for (auto customComponent : std::get<ContainerIndex::Custom>(children_)) {
        customComponent->Update(dt);
    }
}

void Scene::Render()
{
    glClear(GL_COLOR_BUFFER_BIT);
    
    /* bind all pieces to the GL pipeline */
    meshMem_.Enable();
    shader_.Enable();
    
    /* render all render components */
    for (const auto renderComponent : std::get<ContainerIndex::Render>(children_)) {
        renderComponent->Render(uniform_, camera_);
    }
}
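As an aside, if you can use C++14, std::get can also look a tuple element up by type, so the integer indices (and the enum) can be avoided altogether. A hypothetical AddChild helper on ContainerComponent:

template <typename... Component>
class ContainerComponent {
public:
    /* pick the vector by the component's static type; the explicit template
       argument names the role, e.g. AddChild<RenderComponent>(&ninja_); */
    template <typename T>
    void AddChild(T *child)
    {
        std::get<std::vector<T*>>(children_).push_back(child);
    }

protected:
    std::tuple<std::vector<Component*>...> children_;
};

Passing a component the GameObject doesn’t inherit from still fails at compile time, just like the std::get<1> example above.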

Here is the result of this prototype:

The yellow color is applied to the render-only background, while the cyan is the ‘render+physics’ ninja.

I haven’t implemented any physics elements, but in a real game with a physics engine the PhysicsComponent should look something like:

class PhysicsComponent {
public:
    PhysicsComponent();
    
protected:
    b2Body *physicsBody_;
};

class Physics {
public:
    bool Load(
              const GLKVector2 &screenSize,
              const GLKVector2 &gravity
              );
    
    b2World *GetWorld() const;
    
private:
    std::unique_ptr<b2World> physicsWorld_;    
};

And any GameObject that listens to physics updates should look something like:

void Ball::Update(const int dt)
{
    transform_.SetPosition(PhysicsConverter::GetPixels2(physicsBody_->GetPosition()));
    renderer_.SetModelMatrix(transform_.GetModelMatrix());
}

And then the typical Scene class’s Update() should look like:


bool GameScene::Load(const GLKVector2 &screen)
{
    /* enable physics */
    physics_.Load(screen, GLKVector2Make(0.0f, 0.0f));

	/* more loading ... */
}

void GameScene::Update(const int dt)
{
    /* update physics */
    physics_.GetWorld()->Step(1.0f/FPS, 10, 10);
    physics_.GetWorld()->ClearForces();
    
    /* update all custom components */
    for (auto customComponent : std::get<ContainerIndex::Custom>(children_)) {
        customComponent->Update(dt);
    }
    
    /* update camera */
    camera_.SetPan(ball_.GetTransform().GetPosition());
}

I hope this gives an idea of how we can use the enormous power of C++ to build a game engine based on a component system. It also shows why C++ is such a great language for this pattern compared to something like Objective-C.

The entire code for this experiment is available at https://github.com/chunkyguy/CppComponentSystem. Please feel free to checkout and start playing around.


Concurrency: Spawning Independent Tasks

The first step in getting started with concurrency is spawning tasks on a background thread.

Let’s say we’re building a chat application where a number of users chat in a common room. In this scenario every user is probably chatting at the same instant. For the sake of simplicity, let’s say each user can only have a single-character username. Obviously such a constraint is going to outrage the users. So, pissed off as they are, they’re just printing their usernames over and over again.

To support such functionality, we might need a function like this:

void PrintUsername(const char uname, const int count)
{
    for (int i = 0; i < count; ++i) {
        std::cout << uname << ' ';
    }
    std::cout << std::endl;
}

In a single-threaded environment, whichever user invokes this method gets control of the CPU, and their username gets printed until the loop expires.

If there are two users in the chat room with usernames ‘X’ and ‘Y’, each wanting to print their username 10 times, the calling code might look something like:


void SingleThreadRoom()
{
    PrintUsername('X', 10);
    PrintUsername('Y', 10);
}

The output is as expected:

X X X X X X X X X X 
Y Y Y Y Y Y Y Y Y Y 

Now, to make the app slightly better, let’s give both users equal weight: each user should be able to perform the task on their own thread. The simplest way is to spawn a std::thread for each task and then call join() on it.


void MultiThreadRoom()
{
    std::thread threadX(PrintUsername, 'X', 10);
    std::thread threadY(PrintUsername, 'Y', 10);
    
    threadX.join();
    threadY.join();
}

On my machine this outputs as:

YX  YX  YX  YX  YX  YX  YX  YX  YX  YX  

Calling join() on a thread blocks the calling thread until that thread finishes. When the app launches we already have the main thread, and since we call join() on the main thread, it waits for both spawned threads to complete.

If we need a thread for a fire-and-forget kind of task, we can call detach() instead. But once we call detach(), we have no control over the thread’s lifetime anymore; it’s as if the main thread doesn’t care.

For a program like:

void MultiThreadRoomDontCare()
{
    std::thread threadX(PrintUsername, 'X', 10);
    std::thread threadY(PrintUsername, 'Y', 10);
    
    threadX.detach();
    threadY.detach();
}

int main()
{
    MultiThreadRoomDontCare();
    std::cout << "MainThread: Goodbye" << std::endl;
    return 0;
}

The output on my machine is:

MainThread: Goodbye
XY  XY

As you can see, the main thread doesn’t care that its child threads are still not finished.

Moving over to Swift, we don’t have to manually track threads; we think in the higher-level terms of tasks and queues.

The first problem we face when trying out async code in a Playground is that the NSRunLoop isn’t running as it is in our native apps. The solution is to import XCPlayground and call XCPSetExecutionShouldContinueIndefinitely(). All thanks to the internet.

Here’s a test code that works:

import UIKit
import XCPlayground

XCPSetExecutionShouldContinueIndefinitely(continueIndefinitely: true)

NSOperationQueue.mainQueue().addOperation(NSBlockOperation(block: { () -> Void in
    println("Hello World!!")
}))

var done = "Done"

Since we’re talking in terms of queues and tasks, we have to change our way of thinking. We don’t care about threads anymore, only tasks. The task is to print the username; it gets submitted to the queue, and the queue internally manages the threads.

A code like:

func printUsername(uname:String)
{
        println(uname)
}

let chatQueue:NSOperationQueue = NSOperationQueue()

func multiThreadRoom()
{
    for (var i = 0; i < 5; ++i) {
        chatQueue.addOperation(NSBlockOperation(block: { () -> Void in
            printUsername("X")
        }))
        
        chatQueue.addOperation(NSBlockOperation(block: { () -> Void in
            printUsername("Y")
        }))
    }
}

multiThreadRoom()

println("Main Thread Quit")

Prints on my machine as:

X
Y
X
Y
X
Y
X
Y
X
Y
Main Thread Quit

These are the very basics of spawning concurrent independent tasks. If all your requirements involve working on independent sets of tasks, this is almost all you need to learn about concurrency to get the job done. I wish the real world were this simple ;)

As always, the entire code is also available online at: https://github.com/chunkyguy/ConcurrencyExperiments


Concurrency: Getting started

The moment everybody has been speculating about for so long has finally arrived: processors aren't getting any faster. They're hitting physical thresholds; pushed any further with current technology, they overheat. Moore's law is breaking down. Quoting Herb Sutter, 'The free lunch is over'.

If you want your code to run faster on new generations of hardware, you have to consider concurrency. All modern computing devices have multiple cores built in. More and more languages, toolkits, libraries and frameworks are providing support for concurrency. Let's talk about concurrency.

Starting from today, I'll be experimenting with multithreading, using the following technologies. First is the thread library that ships with the C++11 standard. Second is libdispatch, or Grand Central Dispatch (GCD), developed by Apple. GCD isn't a pure threading library; it's a concurrency library built over threads, where we normally think in terms of tasks and queues. It's a level above threads. And third is NSOperation, another level up from GCD. NSOperation is a pure object-oriented concurrency API. I've used it quite often with Objective-C and I'm sure it works just as smoothly with Swift.

Let's start with our first experiment: dividing a huge task into smaller chunks and letting them run in parallel.

So, first question: what task should we pick? Let's pick sorting a dataset. Say we have a huge array and we want to sort it. We could do it using the famous Quicksort algorithm:

    template<typename T>
    void QuickSort(std::vector<T> &arr, const size_t left, const size_t right)
    {
        if (left >= right) {
            return;
        }
        
        /* move pivot to left */
        std::swap(arr[left], arr[(left+right)/2]);
        size_t last = left;
        
        /* swap all lesser elements */
        for (size_t i = left+1; i <= right; ++i) {
            if (arr[i] < arr[left]) {
                std::swap(arr[++last], arr[i]);
            }
        }
        
        /* move pivot back */
        std::swap(arr[left], arr[last]);

        /* sort subarrays */
        QuickSort(arr, left, last);
        QuickSort(arr, last+1, right);
    }

The algorithm is pretty straightforward. Given a list of unsorted data, in each iteration we pick a pivot element and move it to its sorted position by moving all smaller elements to its left and all larger elements to its right. By the end of the iteration our pivot element is in its sorted position, and we have two unsorted subarrays.
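For example, a quick sanity check (my own snippet, not from the original post):

    std::vector<int> data {5, 2, 8, 1, 9};
    QuickSort(data, 0, data.size() - 1);
    // data is now {1, 2, 5, 8, 9}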

This algorithm is a perfect fit for concurrency, as each iteration divides our problem into two independent subproblems that we can easily execute in parallel.

Here’s my first approach:

    template<typename T>
    void QuickerSort(std::vector<T> &arr, const size_t left, const size_t right)
    {
        if (left >= right) {
            return;
        }
        
        /* move pivot to left */
        std::swap(arr[left], arr[(left+right)/2]);
        size_t last = left;
        
        /* swap all lesser elements */
        for (size_t i = left+1; i <= right; ++i) {
            if (arr[i] < arr[left]) {
                std::swap(arr[++last], arr[i]);
            }
        }
        
        /* move pivot back */
        std::swap(arr[left], arr[last]);
        
        /* sort subarrays in parallel */
        auto taskLeft = std::async(QuickerSort<T>, std::ref(arr), left, last);
        auto taskRight = std::async(QuickerSort<T>, std::ref(arr), last+1, right);
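        /* note: the futures returned by std::async block in their destructors,
           so both halves are guaranteed to finish before this call returns */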
    }

And here’s my calling code:

template <typename T>
void UseQuickSort(std::vector<T> arr)
{
    clock_t start, end;
    start = clock();
    QuickSort(arr, 0, arr.size() - 1);
    end = clock();
    std::cout << "QuickSort: " << (end - start) * 1000.0/double(CLOCKS_PER_SEC) << "ms" << std::endl;
    assert(IsSorted(arr));
}

template <typename T>
void UseQuickerSort(std::vector<T> arr)
{
    clock_t start, end;
    start = clock();
    auto bgTask = std::async(QuickerSort<T>, std::ref(arr), 0, arr.size()-1);
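    /* skip the timing if the task was deferred; otherwise spin, yielding
       the CPU, until the background sort is done */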
    if (bgTask.wait_for(std::chrono::seconds(0)) != std::future_status::deferred) {
        while (bgTask.wait_for(std::chrono::seconds(0)) != std::future_status::ready) {
            std::this_thread::yield();
        }
        assert(IsSorted(arr));
        end = clock();
        std::cout << "QuickerSort: " << (end - start) * 1000.0/double(CLOCKS_PER_SEC) << "ms" << std::endl;
    }
}

int main()
{
    
    std::vector<int> arr;
    for (int i = 0; i < 100; ++i) {
        arr.push_back(rand()%10000);
    }

    UseQuickSort(arr);
    UseQuickerSort(arr);
}
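The calling code above also relies on an IsSorted helper that lives in the repository; a minimal version of it (my sketch, not the original) could be:

template <typename T>
bool IsSorted(const std::vector<T> &arr)
{
    for (size_t i = 1; i < arr.size(); ++i) {
        if (arr[i] < arr[i-1]) {
            return false;
        }
    }
    return true;
}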

I've added some code to profile how long each of these tasks takes on the data set. A dataset of 100 elements executes on my machine as:

QuickSort: 0.023ms
QuickerSort: 29.12ms

As you can see, the concurrent code takes a lot longer to execute than the single-threaded one. One guess could be that creating a thread for each subproblem, plus the extra load of context switching between the threads, is the reason behind the latency. (It's also worth noting that clock() measures CPU time rather than wall time, which can skew timings for multi-threaded code.) Or you could also suggest that there are huge flaws in my implementation.

I accept there could be huge flaws in my implementation, as this is my first take at core multithreading with the C++ thread library. But we still can't ignore the fact that we're spawning a lot of threads: the number of threads running concurrently doubles with each level of recursion. This implementation definitely needs to be improved.
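One common fix (my own sketch, not from the original post; QuickerSortCapped and the depth parameter are my inventions) is to cap the depth at which new tasks get spawned and fall back to the sequential QuickSort below it:

    template<typename T>
    void QuickerSortCapped(std::vector<T> &arr, const size_t left, const size_t right, const int depth)
    {
        /* below the cap, stop spawning tasks and sort sequentially */
        if (depth <= 0) {
            QuickSort(arr, left, right);
            return;
        }

        if (left >= right) {
            return;
        }

        /* same partitioning as before */
        std::swap(arr[left], arr[(left+right)/2]);
        size_t last = left;
        for (size_t i = left+1; i <= right; ++i) {
            if (arr[i] < arr[left]) {
                std::swap(arr[++last], arr[i]);
            }
        }
        std::swap(arr[left], arr[last]);

        /* spawn subtasks only while we have depth budget left */
        auto taskLeft = std::async(QuickerSortCapped<T>, std::ref(arr), left, last, depth - 1);
        auto taskRight = std::async(QuickerSortCapped<T>, std::ref(arr), last+1, right, depth - 1);
    }

Kicking it off with a depth of 3, for instance, would cap the tree at eight concurrent leaf tasks instead of one task per subarray.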

This is where GCD shines. First of all, it's easier to begin with. Keep in mind I have an equal amount of experience with both libraries, so my code is probably full of bugs and most probably I'm not using them in the best possible way. But this is all part of the test and my experiments.

Anyhow, when I implement the same problem using Swift and GCD, like so:


import UIKit

// MARK: Common

func isSorted(array:[Int]) -> Bool
{
    for (var i = 1; i < array.count; ++i) {
        if (array[i] < array[i-1]) {
            println("Fail case: \(array[i]) > \(array[i-1])")
            return false
        }
    }
    return true
}

//MARK: Quick Sort

func quickSort(inout array:[Int], left:Int, right:Int)
{
    if (left >= right) {
        return;
    }
    
    /* swap pivot to front */
    swap(&array[left], &array[(left+right)/2])
    var last:Int = left;
    
    /* swap lesser to front */
    for (var i = left+1; i <= right; ++i) {
        if (array[i] < array[left]) {
            swap(&array[i], &array[++last])
        }
    }
    
    /* swap pivot back */
    swap(&array[left], &array[last])
    
    /* sort subarrays */
    quickSort(&array, left, last)
    quickSort(&array, last+1, right)
}

func useQuickSort(var array:[Int])
{
    let start:clock_t = clock();
    quickSort(&array, 0, array.count-1)
    let end:clock_t = clock();
    
    println("quickSort: \((end-start)*1000/clock_t(CLOCKS_PER_SEC)) ms")
    assert(isSorted(array), "quickSort: Array not sorted");
}

//MARK: Quicker Sort

let sortQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0)


func quickerSort(inout array:[Int], left:Int, right:Int)
{
    if (left >= right) {
        return;
    }
    
    /* swap pivot to front */
    swap(&array[left], &array[(left+right)/2])
    var last:Int = left;
    
    /* swap lesser to front */
    for (var i = left+1; i <= right; ++i) {
        if (array[i] < array[left]) {
            swap(&array[i], &array[++last])
        }
    }
    
    /* swap pivot back */
    swap(&array[left], &array[last])
    
    /* sort subarrays */
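    /* note: dispatch_sync blocks until its block completes, so these two
       halves still run one after the other rather than in parallel */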
    dispatch_sync(sortQueue, { () -> Void in
        quickSort(&array, left, last)
    })
    dispatch_sync(sortQueue, { () -> Void in
        quickSort(&array, last+1, right)
    })
}

func useQuickerSort(var array:[Int])
{

    let start:clock_t = clock();

    dispatch_sync(sortQueue, { () -> Void in
        quickerSort(&array, 0, array.count-1)

        let end:clock_t = clock();
        println("quickerSort: \((end-start)*1000/clock_t(CLOCKS_PER_SEC)) ms")
        
        assert(isSorted(array), "quickerSort: Array not sorted");

    })
}

// MARK: Main

var arr:[Int] = []
for (var i = 0; i < 20; ++i) {
    arr.append(Int(rand())%100);
}


useQuickSort(arr)
useQuickerSort(arr)

I get the following result:

quickSort: 784 ms
quickerSort: 772 ms

You can obviously tell that the same implementation in Swift is quite a bit slower than in C++, but that's not the point. We're not comparing C++ vs Swift, but single thread vs multiple threads. And using GCD, we see a small improvement, even though our test case is so small. This is most probably because we're not handling the thread management ourselves anymore; we simply let the dispatch queue handle it.

The bright side of today's experiment is that if we make good use of multithreading, we have a lot of potential for large improvements. And this is more than enough to motivate me to dive further down the concurrency rabbit hole.

The source code for this experiment is available at the GitHub repository: https://github.com/chunkyguy/ConcurrencyExperiments


The Art of Designing Software

When I was in college, my vision of software was: we have a problem, and we need a solution, a program, a set of instructions to be executed by the computer in order to solve that particular problem. For example, design a program to print a Fibonacci series using loops or recursion, or implement it with an array, a doubly linked list, an AVL tree.

Then there are the programming challenges like the Facebook Hacker Cup or the Google CodeJam, which just focus on how strong you are at understanding and implementing common algorithms and data structures, or whether you can design your own.

And there is nothing wrong with that, except that writing software isn't just about designing algorithms or data structures, as one eventually discovers.

When I got my first job as an iOS developer and started writing software at the enterprise level, the first thing I realized was that the solution space is no longer a set of instructions in a single file; it is split across hundreds if not thousands of files. I realized that writing software is no longer just about algorithms and data structures. In fact, almost all of the interesting algorithms and data structures are either already solved and ready to use from existing code, or nobody really cares about them as the first priority. There is a high probability that the programming language of your choice already comes bundled with all the common data structures and algorithms you'll ever need.

As a software engineer, most of my time was spent managing the flow of data and control. The actual work seemed closer to plumbing, where instead of fluids one manages the flow of electrons through the hardware. Initially I was somewhat shocked to see that the iOS framework provides just three data structures: NSArray, NSDictionary and NSSet*. That's it!

* (maybe also NSMapTable, NSHashTable, NSCountedSet, NSPointerArray,… but let’s assume we haven’t heard of them)

What is software engineering?

For me, software engineering is more about understanding the problem at a higher level. It is half how good you are at designing the flow yourself, and half how good you are at reusing the existing solutions written by others.

Part 1: The design space
The design space deals with problems like:

  • How do I want the UI to be?
  • How do I manage downloading multiple files in the background, even after my application has quit?
  • How do I keep the memory used by the application under control? What happens when I actually run out of memory?
  • How do I confirm that the code I just wrote is going to pass all edge cases at all times?

Part 2: The integration space
The integration space deals with problems like:

  • How do I use the code I had written some time back to solve a similar problem?
  • How do I design this code, so that it could be used next time I face a similar problem?
  • Has someone already tackled a problem similar to mine? If yes, how can I use their solution?

Of these two subspaces, the design space seems easier, because you're taught it at the college level and you're in total control of the design. The integration space is hard, and by hard I mean really hard. On your first few attempts, you might seriously consider skipping the integration and moving the problem into the comfortable design space, i.e., reinventing the wheel.

But the reality is that you can't always design your own solutions. Maybe you're working with an organization that already has a codebase of solutions that have to be used; or another teammate is solving the problem in parallel; or maybe the solution was written years ago and has been used and maintained for a really long time. Not to mention the time constraints of the design space.

Over my years of development, I've come to the realization that almost all problems are already solved, or there are people already working on them. If you think you're the only one working on a problem, you're most definitely wrong. (If not, please add me to your project.)

A bigger challenge, then, is: how do you use a solution written by others, or by you some time ago? The main issue is that the problem could have been solved in some other context, or maybe it was not thought out enough. There could also be flaws in the basic design, like the existing solution not being thread-safe.

At an even higher level, software can be written in two ways: bottom-up or top-down.

Bottom-up approach
Think of a problem where you have to design a car, in software that is. One way to design a car is to break it down into smaller tasks.

  • Design the tire.
  • Design the body.
  • Design the engine.

Here each step is recursive in itself; designing the engine would again be broken down further, until you reach an atomic task that cannot be broken down any more. You start working on those pieces and then assemble them all back into place, and you have a car.

This approach has many benefits: you can easily assign tasks between team members, who can work on them almost independently. It is also easier to design for the future this way; when you're working on the tires, you're just focused on designing tires that should fit all known use cases.

The main problem with the bottom-up approach is that you need to be a good future-seer. Whatever solutions you design should fall into place perfectly. Say on the final assembly day you realize the sockets on the body are not big enough to hold the tires. Also, as one can tell, bottom-up is more time consuming, because you're designing for an unknown future. Imagine your car has to perform well on all sorts of geographical terrains. Most of the time you might end up thinking too far forward and designing for the year 2056, while your solution becomes obsolete in the next 3 months. Try talking to all those who wrote great solutions with NSThread before Apple rolled out GCD. And don't forget the time constraints: your boss, teammates and consumers are always pressuring you to release it already.

Top-down approach
Think of the same problem of designing the car, but this time you're working top-down. You don't actually care about the year 2056, just about this day. You start with a giant piece of metal and an empty notepad.

Then you start carving the metal block into the car. When you get to carving out the left tire, you might realize you've already done something similar: sometime back you designed the right tire, and this feels almost the same. So you just apply the same procedure again, and your car comes closer to finished. With the top-down approach you often feel the amount of work shrinking with each iteration.

The problem with the top-down approach is that it's really hard to work with a big team, because you're blank at the beginning, so you cannot assign tasks. Most of the time you need to start alone and bring in more teammates as you progress.

The main differences between the two approaches are:

  • Top-down feels more like looking into the past, while bottom-up is more about looking into the future.
  • Top-down solves the same problem in less time than bottom-up.
  • Top-down is not as future-safe as bottom-up. Imagine a request to add a radio to the car. With bottom-up, an engineer might have already thought about something similar and kept a 'future' slot; with top-down you need to carve a hole and move internal stuff around until you get the radio working.
  • At the end of the day, top-down seems to take you closer to the end product, while bottom-up still seems uncertain until the final assembly.

Which approach you pick depends on the task you're solving. If you're designing a framework or a library that has to be used by others, or by you in the future, and you have a lot of time on your hands, go for bottom-up.

If you're designing the end product, where time is limited and your product is eventually going to end up in the garbage in 2 years, go with top-down.

Another option is the hybrid approach: break the original task into smaller chunks bottom-up, assign each subproblem to a teammate, and then work on each subproblem top-down in parallel until the final assembly.


Component System using Objective-C Message Forwarding

Ever since I started playing around with Unity3D, I've wanted to build one of my games with a component system. And since I use Objective-C most of the time, why not use it?

First, let me elaborate on my intention. I want a component system where I have a GameObject that can have one or more components plugged in. To illustrate the main problems with the traditional actor-based model, let's assume we're developing a platformer game. We have these three actor types:

1. Ninja: A ninja is an actor that a user can control with external inputs.
2. Background: Background is something like a static image.
3. HiddenBonus: Hidden bonus is an invisible actor that a ninja can hit.

Source: Swim Ninja

Let’s take a look at all the components inside these actors.

Ninja = RenderComponent + PhysicsComponent
Background = RenderComponent
HiddenBonus = PhysicsComponent.

Here the RenderComponent is responsible for just drawing some content on the screen, while the PhysicsComponent handles physics updates like collision detection.

So, in a nutshell from our game loop we need to call things like

- (void)loopUpdate:(int)dt
{
    [ninja update:dt];
    [bonus update:dt];
}

- (void)loopRender
{
    [background render];
    [ninja render];
}

Now, in the traditional Actor model, if we have an Actor class with both RenderComponent and PhysicsComponent like:

@interface Actor : NSObject

- (void)render;
- (void)update:(int)dt;

@end

Then we would inevitably be adding a PhysicsComponent to the Background actor and a RenderComponent to the HiddenBonus actor.

In Objective-C we can in fact design the component system using message forwarding.
(Don't forget to check out this older post on message forwarding.)

At the core, we can have a GameObject class like

@interface GameObject : NSObject

- (void)render;
- (void)update:(int)dt;

@end

But this class doesn't implement any of these methods; instead it forwards them to the relevant component classes. We can write our component classes as:

@interface PhysicsComponent : NSObject

- (void)update:(int)dt;

@end

@interface RenderComponent : NSObject

- (void)render;

@end

Our GameObject class gets reduced to

@interface GameObject : NSObject

+ (GameObject *)emptyObject;

- (void)enablePhysicsComponent;
- (void)enableRenderComponent;

@end

Our game loop initializes our GameObjects as

- (void)loadGameObjects
{
    ninja = [GameObject emptyObject];
    [ninja enablePhysicsComponent];
    [ninja enableRenderComponent];

    bonus = [GameObject emptyObject];
    [bonus enablePhysicsComponent];

    background = [GameObject emptyObject];
    [background enableRenderComponent];
}

And updates and renders them as:

- (void)loopUpdate:(int)dt
{
    [ninja update:dt];
    [bonus update:dt];

}

- (void)loopRender
{
    [background render];
    [ninja render];
}

If you compile at this stage, you will get errors like:

‘No visible @interface for ‘GameObject’ declares the selector ‘update’.
‘No visible @interface for ‘GameObject’ declares the selector ‘render’.

In earlier days, code like this used to compile just fine. Later it became a warning, which was good for most of the developers who migrated to Objective-C from compile-time-bound languages like C++. Objective-C is mainly a runtime-bound language, but very few developers appreciate this fact, and they complained a lot about why the compiler doesn't catch their errors early on, or more precisely, why it doesn't work somewhat like C++. And I understand that feeling, because I too love C++. But when I'm coding in Objective-C, I use it like Objective-C. They are two completely different languages based on completely different ideologies.

Moving forward in time, ARC came along, and it made this kind of reliance on the Objective-C runtime much harder: the 'missing' method declarations became errors. For this experiment, let's disable ARC entirely (for example, via the -fno-objc-arc compiler flag) and see how it goes.

OK, good enough. The errors are now warnings. But we can at least move forward with our experiment.

Let's add the message forwarding machinery to our GameObject. Whenever a message is passed to an object in Objective-C, if the message is not implemented by the receiving object, then before throwing an exception the Objective-C runtime offers us an opportunity to forward the message to some other delegate object. The way this is done is quite simple; we just need to implement the following methods.

The first thing we need to handle is the selector availability test.

- (BOOL)respondsToSelector:(SEL)aSelector
{
    if ( [super respondsToSelector:aSelector] ) {
        return YES;
    } else if(renderComponent && [renderComponent respondsToSelector:aSelector]) {
        return YES;
    }  else if(physicsComponent && [physicsComponent respondsToSelector:aSelector]) {
        return YES;
    }
    return NO;
}

Next, we need to handle the method signature availability test.

- (NSMethodSignature*)methodSignatureForSelector:(SEL)selector
{
    NSMethodSignature* signature = [super methodSignatureForSelector:selector];
    if (signature) {
        return signature;
    }

    if (renderComponent) {
        signature = [renderComponent methodSignatureForSelector:selector];
        if (signature) {
            return signature;
        }
    }

    if (physicsComponent) {
        signature = [physicsComponent methodSignatureForSelector:selector];
        if (signature) {
            return signature;
        }
    }

    return nil;
}

After all the tests have been cleared, it's time to implement the actual message forwarding.

- (void)forwardInvocation:(NSInvocation *)anInvocation
{
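    /* hand the message to the first component that responds to it;
       otherwise let super raise doesNotRecognizeSelector: as usual */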
    if ([renderComponent respondsToSelector: [anInvocation selector]]) {
        [anInvocation invokeWithTarget:renderComponent];
    } else if ([physicsComponent respondsToSelector: [anInvocation selector]]) {
        [anInvocation invokeWithTarget:physicsComponent];
    } else {
        [super forwardInvocation:anInvocation];
    }
}

If you run the code at this point, you should see update and render getting invoked for the corresponding component objects. So yay, our component system prototype is working!

Now let's try to make it more concrete. Let's work on the RenderComponent first. For testing, in each render call we just draw a cube on the screen.

The best part about the component system is that every component focuses on one thing. The RenderComponent, for example, focuses only on rendering, so we only need to add the methods required for rendering.

Let's pass the matrix information to the RenderComponent in the form of an NSValue.

@interface RenderComponent : NSObject

- (void)setNormalMatrix:(NSValue *)nMat;
- (void)setModelViewProjectionMatrix:(NSValue *)mvpMat;

- (void)render;

@end
@implementation RenderComponent

- (void)setNormalMatrix:(NSValue *)nMat;
{
    [nMat getValue:&_normalMatrix];
}

- (void)setModelViewProjectionMatrix:(NSValue *)mvpMat;
{
    [mvpMat getValue:&_modelViewProjectionMatrix];
}

- (void)render;
{
    glUniformMatrix4fv(uniforms[UNIFORM_MODELVIEWPROJECTION_MATRIX], 1, 0, _modelViewProjectionMatrix.m);
    glUniformMatrix3fv(uniforms[UNIFORM_NORMAL_MATRIX], 1, 0, _normalMatrix.m);

    glDrawArrays(GL_TRIANGLES, 0, 36);
}

@end

The conversion from a GLKMatrix to an NSValue can be done easily:

    GLKMatrix3 nMat = GLKMatrix3InvertAndTranspose(GLKMatrix4GetMatrix3(modelViewMatrix), NULL);
    GLKMatrix4 mvpMat = GLKMatrix4Multiply(projectionMatrix, modelViewMatrix);
    [ninja setNormalMatrix:[NSValue value:&nMat withObjCType:@encode(GLKMatrix3)]];
    [ninja setModelViewProjectionMatrix: [NSValue value:&mvpMat withObjCType:@encode(GLKMatrix4)]];

Now, using our message forwarding magic, in our main loop we just update the GameObjects that have rendering enabled, like so:

- (void)loopUpdate:(int)dt
{

    float aspect = fabsf(self.view.bounds.size.width / self.view.bounds.size.height);
    GLKMatrix4 projectionMatrix = GLKMatrix4MakePerspective(GLKMathDegreesToRadians(65.0f), aspect, 0.1f, 100.0f);

    GLKMatrix4 baseModelViewMatrix = GLKMatrix4MakeTranslation(0.0f, 0.0f, -4.0f);
    baseModelViewMatrix = GLKMatrix4Rotate(baseModelViewMatrix, _rotation, 0.0f, 1.0f, 0.0f);

    // Compute the model view matrix for the object rendered with GLKit
    GLKMatrix4 modelViewMatrix = GLKMatrix4MakeTranslation(0.0f, 0.0f, -1.5f);
    modelViewMatrix = GLKMatrix4Rotate(modelViewMatrix, _rotation, 1.0f, 1.0f, 1.0f);
    modelViewMatrix = GLKMatrix4Multiply(baseModelViewMatrix, modelViewMatrix);

    /* wrap the matrices in NSValue, matching the setters above */
    GLKMatrix3 ninjaNormalMat = GLKMatrix3InvertAndTranspose(GLKMatrix4GetMatrix3(modelViewMatrix), NULL);
    GLKMatrix4 ninjaMVPMat = GLKMatrix4Multiply(projectionMatrix, modelViewMatrix);
    [ninja setNormalMatrix:[NSValue value:&ninjaNormalMat withObjCType:@encode(GLKMatrix3)]];
    [ninja setModelViewProjectionMatrix:[NSValue value:&ninjaMVPMat withObjCType:@encode(GLKMatrix4)]];

    modelViewMatrix = GLKMatrix4MakeTranslation(0.0f, 0.0f, 1.5f);
    modelViewMatrix = GLKMatrix4Rotate(modelViewMatrix, _rotation, 1.0f, 1.0f, 1.0f);
    modelViewMatrix = GLKMatrix4Multiply(baseModelViewMatrix, modelViewMatrix);

    GLKMatrix3 bgNormalMat = GLKMatrix3InvertAndTranspose(GLKMatrix4GetMatrix3(modelViewMatrix), NULL);
    GLKMatrix4 bgMVPMat = GLKMatrix4Multiply(projectionMatrix, modelViewMatrix);
    [background setNormalMatrix:[NSValue value:&bgNormalMat withObjCType:@encode(GLKMatrix3)]];
    [background setModelViewProjectionMatrix:[NSValue value:&bgMVPMat withObjCType:@encode(GLKMatrix4)]];

    [ninja update:dt];
    [bonus update:dt];
}

As you can tell, message forwarding has made it convenient for us to add custom code in components and call it directly, even though the GameObject class doesn't directly implement it.

I believe you can extend this idea to add more components and rapidly build up functionality, while keeping the code logically separated by component.

The accompanying code is available at https://github.com/chunkyguy/OjbCComponentSystem. Since this was just a rapid prototype and we have disabled ARC, the code could have potential memory leaks. Don't use it directly; use it for reference purposes only.


C++: Typesafe programming

Let's say you have a function that needs an angle in degrees as a parameter.

    void ApplyRotation(const float angle)
    {
        std::cout << "ApplyRotation: " << angle << std::endl;
    }

And another function that returns an angle in radians:

    float GetRotation()
    {
        return 0.45f;
    }

To fill out the missing piece, you write a radians-to-degrees function:

    float RadiansToDeg(const float angle)
    {
        return angle * 180.0f / M_PI;
    }

Then, someplace later, you use the functions like so:

    float angleRadians = GetRotation();
    float angleDegrees = RadiansToDeg(angleRadians);
    ApplyRotation(angleDegrees);

This is bad. The user of this code, who might be the person next to you, or yourself 10 weeks later, doesn't know what angle means in the functions ApplyRotation or GetRotation. Is it an angle in radians or in degrees?

Yes, you can add a comment on top of each function about where the angle is in degrees and where it is in radians. But that doesn't actually stop the user from passing a value in whatever format they like.

The main problem with this piece of code is that it uses a float as a parameter, which is an implementation detail and doesn't convey any other information. In C++ we can do better.

Let's create new types.

    struct Degrees {
        explicit Degrees(float v) :
        value(v)
        {}
        
        float value;
    };
    
    struct Radians {
        explicit Radians(float v) :
        value(v)
        {}
        
        float value;
    };

And update the functions as

    Radians GetRotation()
    {
        return Radians(0.45f);
    }
    
    void ApplyRotation(const Degrees angle)
    {
        std::cout << "ApplyRotation: " << angle.value << std::endl;
    }
    
    Degrees RadiansToDeg(const Radians angle)
    {
        return Degrees(angle.value * 180.0f / M_PI);
    }

Now if we call it as follows, it just works.

    Radians angleRadians = GetRotation();
    Degrees angleDegrees = RadiansToDeg(angleRadians);
    ApplyRotation(angleDegrees);

Notice that if we don't mark the constructors as explicit, the compiler will implicitly convert values from float to the corresponding types, which isn't what we want.

This is already starting to look good. The code is self-documenting, and the user will have a hard time using it differently than intended, as most of the error checking is done by the compiler.
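For instance, with the explicit constructors in place, both of these hypothetical misuses fail to compile:

    ApplyRotation(0.45f);           // error: no implicit conversion from float to Degrees
    ApplyRotation(Radians(0.45f));  // error: no conversion from Radians to Degrees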

Another benefit is that we can now have a member function that does the conversion:

    struct Radians {
        explicit Radians(float v) :
        value(v)
        {}

        Degrees ToDegrees() const
        {
            return Degrees(value * 180.0f / M_PI);
        }

        float value;
    };

So, your calling code reduces to

    Radians angle = GetRotation();
    ApplyRotation(angle.ToDegrees());

As you can see, we no longer need variable names to carry the type information; the type tells about itself.
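Symmetrically, we could give Degrees a ToRadians() member. Since the two types would then refer to each other, the definition has to move out of line (a sketch of mine, this helper isn't part of the original code):

    struct Radians; // forward declaration, as the two types refer to each other

    struct Degrees {
        explicit Degrees(float v) :
        value(v)
        {}

        Radians ToRadians() const; // defined once Radians is complete

        float value;
    };

    /* ... after the definition of Radians: */
    Radians Degrees::ToRadians() const
    {
        return Radians(value * float(M_PI) / 180.0f);
    }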

On a final note, we can switch from passing by value to passing by const reference:

    void ApplyRotation(const Degrees &angle)
    {
        std::cout << "ApplyRotation: " << angle.value << std::endl;
    }

This does two things. First, no more copies when data gets passed around; second, we can pass a temporary value, as guaranteed by the C++ standard, like so:

    ApplyRotation(GetRotation().ToDegrees());
    ApplyRotation(Degrees(45.0f));