Add Some Perspective to Your UIViews

Safari for iOS has a very interesting perspective effect built in. You can see it when you have multiple tabs open. How about building something like that? Those views are clearly UIViews, right? You can see the web content rendered, and they have close buttons. The content even updates periodically without launching the app, probably using background services.


Let’s start by searching Apple’s documentation. As far as I was able to search, I only got as far as ‘Adding Perspective to Your Animations’ in Apple’s Core Animation Programming Guide.

If you scroll down to the bottom of the ‘Advanced Animation Tricks’ page, you’ll find around 10 lines and a small code snippet explaining how to add perspective to your CALayer object.

This is a good start, but it’s not great, as it skips a lot of details, and for a reason: to add perspective you have to be familiar with the linear algebra behind it, which we shall get to in the next session. But for now, let’s get started.

So, create a new single view project and add a new Swift file to it. Let’s call it PerspectiveView.swift.

class PerspectiveView: UIView {
}

In the storyboard, add a UIView and set its class to PerspectiveView. Next, we need a container view to hold the subviews. We’ll apply the perspective to this container view, and hopefully the perspective gets applied to all the contained subviews. Let’s call this container view contentView.

let contentView:UIView?

required init(coder aDecoder: NSCoder)
{
	super.init(coder: aDecoder)
	
	contentView = UIView()

	contentView?.backgroundColor = UIColor.yellowColor()
	backgroundColor = UIColor.purpleColor()

	setUp()
}

I’ve set the background color of the main view to purple and the contentView to yellow, just for debugging. Next, let’s implement setUp(). This is where we configure the contentView.

func setUp()
{
	var viewDict:[String: UIView] = [String: UIView]()

	viewDict["contentView"] = contentView!
	addSubview(contentView!)
	
	if let imagePath:String = NSBundle.mainBundle().pathForResource("photo4", ofType: "JPG") {
		let image = UIImage(contentsOfFile: imagePath)
		let imageView:UIImageView = UIImageView(image: image)
		imageView.setTranslatesAutoresizingMaskIntoConstraints(false)
		viewDict["imageView"] = imageView
		contentView?.addSubview(imageView)
	}

	applyConstraints(viewDict: viewDict)
	applyPerspective()
}

We’re just creating the required views here and adding whatever we wish to have the perspective applied to. For now, I’m just adding a single UIImageView.

Next up, applying constraints.

func applyConstraints(#viewDict:[String: UIView])
{        
	contentView?.setTranslatesAutoresizingMaskIntoConstraints(false)
	contentView?.addConstraints(NSLayoutConstraint.constraintsWithVisualFormat("H:[contentView(>=100)]",
		options: NSLayoutFormatOptions(0),
		metrics: nil,
		views: viewDict))
	contentView?.addConstraints(NSLayoutConstraint.constraintsWithVisualFormat("V:[contentView(>=100)]",
		options: NSLayoutFormatOptions(0),
		metrics: nil,
		views: viewDict))
	

	contentView?.addConstraints(NSLayoutConstraint.constraintsWithVisualFormat("H:|-[imageView]-|",
		options: NSLayoutFormatOptions(0),
		metrics: nil,
		views: viewDict))
	contentView?.addConstraints(NSLayoutConstraint.constraintsWithVisualFormat("V:|-[imageView]-|",
		options: NSLayoutFormatOptions(0),
		metrics: nil,
		views: viewDict))

	addConstraints(NSLayoutConstraint.constraintsWithVisualFormat("H:|-[contentView]-|",
		options: NSLayoutFormatOptions(0),
		metrics: nil,
		views: viewDict))
	addConstraints(NSLayoutConstraint.constraintsWithVisualFormat("V:|-[contentView]-|",
		options: NSLayoutFormatOptions(0),
		metrics: nil,
		views: viewDict))
}

I’m just centering all the subviews. This is just me being lazy. You should probably add proper constraints to keep the image view’s aspect ratio correct.

Now, coming to the interesting part: applying perspective.

We call calculatePerspectiveTransform() to do all the calculation and return a CATransform3D object that we can simply apply to the CALayer of our contentView. As for the calculation, let’s simply copy-paste the code from the Core Animation Programming Guide.

func calculatePerspectiveTransform() -> CATransform3D
{
	let eyePosition:Float = 10.0;
	var contentTransform:CATransform3D = CATransform3DIdentity
	contentTransform.m34 = CGFloat(-1/eyePosition)
	
	return contentTransform
}

func applyPerspective()
{
	contentView?.layer.sublayerTransform = calculatePerspectiveTransform()
}

That’s it. Give it a run. That is not my dog BTW, just photobombing my photo out of nowhere.

 

If, for now, we just ignore the distortion of the image (which is a constraints issue), there are some other questions to be answered first.

  1. What is sublayerTransform?
  2. Why is eyePosition 10?
  3. What’s up with m34?
  4. And most importantly, where’s the perspective?

Turns out, the snippet provided by the Core Animation Programming Guide just adds the perspective. But in order to see it in action, you still have to modify the transform a bit more. Let’s add a little translation to the transform.

func calculatePerspectiveTransform() -> CATransform3D
{
	let eyePosition:Float = 10.0;
	var contentTransform:CATransform3D = CATransform3DIdentity
	contentTransform.m34 = CGFloat(-1/eyePosition)
	
	contentTransform = CATransform3DTranslate(contentTransform, 0, 0, -20)

	return contentTransform
}


Good. But from this angle it looks as if the image has just been scaled down. Why not rotate it about the x axis?

func calculatePerspectiveTransform() -> CATransform3D
{
	let eyePosition:Float = 10.0;
	var contentTransform:CATransform3D = CATransform3DIdentity
	contentTransform.m34 = CGFloat(-1/eyePosition)
	
	contentTransform = CATransform3DRotate(contentTransform, CGFloat(GLKMathDegreesToRadians(45)), 1, 0, 0)
	contentTransform = CATransform3DTranslate(contentTransform, 0, 0, -20)

	return contentTransform
}


Wow! Now we have some perspective. We can now either just tinker with the magic numbers until we get the desired effect, or we can develop a deeper understanding and gain better control.

I prefer the latter. So let’s begin by answering some of the questions from above.

What is sublayerTransform?

If you look at the interface of CALayer, you’ll see there are two CATransform3D properties.

class CALayer : NSObject, NSCoding, CAMediaTiming {
	/* other stuff */
    var transform: CATransform3D
    var sublayerTransform: CATransform3D
}

When you modify transform, the layer’s own content gets transformed, whereas when you modify sublayerTransform, only the sublayers get transformed and the receiving layer itself remains untouched.

If you replace sublayerTransform with transform

contentView?.layer.transform = calculatePerspectiveTransform()

You would get something like


See what I mean? Our contentView, which had a yellow background color, also got modified. Let’s undo that change; we need the contentView unmodified for things like handling touches and gestures.

What is eyePosition?

For the code above, it’s enough to understand that eyePosition here just means the degree of perspective you want. The larger the value, the weaker the effect; the smaller the value, the stronger the effect. We shall look at the maths behind this in the next session on linear algebra. But for now, you can try experimenting with values like 5, 50, 500, 5000, 50000 and see the changes yourself.


What’s up with m34?

There are mostly two kinds of projections we see in computer graphics: the orthogonal projection and the perspective projection. Orthogonal is the default projection used by any 2D system like UIKit, whereas the perspective projection is used by all 3D systems, like any 3D game. The main difference is that in an orthogonal projection the distance from the viewer is not accounted for, or in other words, the z axis is totally ignored.

A CATransform3D is a 4×4 matrix. It works in a homogeneous coordinate system, which we will dive deeper into in a later session. For now, it’s enough to understand that the purpose of this matrix is to convert your points from 3D space to screen space, which is 2D.

The m34 value, at the 3rd row, 4th column of the matrix, is the biggest hint as to whether a projection matrix is orthogonal or perspective. An orthogonal projection matrix typically has m34 as 0, while a perspective matrix has some negative value there, typically -1. If, in the above code, you simply set the value of m34 to 0, you’ll notice that all the perspective effect is gone!

func calculatePerspectiveTransform() -> CATransform3D
{
	let eyePosition:Float = 50000.0;
	var contentTransform:CATransform3D = CATransform3DIdentity
	contentTransform.m34 = 0
	
	contentTransform = CATransform3DRotate(contentTransform, CGFloat(GLKMathDegreesToRadians(45)), 1, 0, 0)
	contentTransform = CATransform3DTranslate(contentTransform, 0, 0, -20)

	return contentTransform
}


Of course, the view is still scaled; this isn’t a perfect orthogonal matrix, as we’re still applying the rotation and the translation. But the main perspective effect is gone.

With the basic questions answered, let’s now build a proper projection system. I’ll be using GLKit, because it has everything we need to build the system we want. Since, as of this writing, GLKit isn’t available in Swift, we’ll have to do this work in Objective-C and return the CATransform3D object back to our Swift class, where we can simply apply it to our contentView.

// Transform.h
@interface Transform : NSObject
@property (nonatomic, readonly) CATransform3D transform;
@end

// PerspectiveView.swift
func applyPerspective()
{
	let contentTransform:Transform = Transform()
	contentView?.layer.sublayerTransform = contentTransform.transform
}

First we need a Camera class. Think of a camera. With a camera you control two things. First are the lens settings, like the field of view, focus, and aperture: the kind of things you usually set once before you begin filming. Second is the camera’s motion: the direction you point it, how you rotate it, and so forth.

/** Camera object */
@interface Camera : NSObject

/* lens */
// field of view - in radians
@property (nonatomic, readwrite) float fov;
@property (nonatomic, readwrite) float aspectRatio;
// near and far planes
@property (nonatomic, readwrite) float nearZ, farZ;

/* motion  */
@property (nonatomic, readwrite) float eyeX, eyeY, eyeZ;
@property (nonatomic, readwrite) float centerX, centerY, centerZ;
@property (nonatomic, readwrite) float upX, upY, upZ;

/* Read by Transform object */
@property (nonatomic, readonly) GLKMatrix4 projectionMatrix;
@property (nonatomic, readonly) GLKMatrix4 viewMatrix;

@end

Let’s look at each of these items one by one:

1. Field of view: Controls the area you want to capture. The wider the angle, the more area you capture, but at the cost of greater distortion. A fish-eye lens has a very wide fov.

2. Aspect ratio: Controls the aspect ratio of the image captured. A value of 1 means the captured image will be distorted to fit within a square. You typically want this to be the actual ratio you wish to capture.

3. nearZ, farZ: The clip planes. Anything farther than farZ or nearer than nearZ doesn’t get included in the final image. You don’t want farZ to be too far, as that brings in more floating-point error. So if you set farZ to 1,000,000 and you have two objects placed at z = 10 and z = 12, the one at 12 might overlap the one at 10, even though it is farther down the z axis. Typically, values of 0.1 and 100 are good enough for nearZ and farZ respectively.

4. center: This is the point the camera looks at. Default value is the origin.

5. up: Tells what side is considered up. Default value is (0, 1, 0), that is, up is along the y axis.

6. eye: This is where the camera itself is placed. The final viewing direction is calculated from the eye, center, and up values together.

Since the Camera class controls two independent things, we can have separate matrices for each one of those. The role of a matrix is just to transform points from one coordinate system to another. A matrix simply defines a coordinate system. So, if you have a 4×4 matrix and you multiply it with a 4D vector you get a 4D vector in the transformed coordinate space.

For our Camera class, we just keep track of two coordinate systems

- (void) updateProjection;
{
    self.projectionMatrix = GLKMatrix4MakePerspective(GLKMathDegreesToRadians(_fov),
                                                      _aspectRatio,
                                                      _nearZ, _farZ);
}

- (void) updateView;
{
    self.viewMatrix = GLKMatrix4MakeLookAt(_eyeX, _eyeY, _eyeZ,
                                           _centerX, _centerY, _centerZ,
                                           _upX, _upY, _upZ);
}

Whenever any of these values get updated, we simply update the related matrix.

Now let’s focus on the Transform class.

@interface Transform : NSObject

- (id)initWithCamera:(Camera *)camera;

@property (nonatomic, readwrite) float positionX, positionY, positionZ;
@property (nonatomic, readwrite) float rotationX, rotationY, rotationZ;
@property (nonatomic, readwrite) float scaleX, scaleY, scaleZ;
@property (nonatomic, readwrite) float angle;

@property (nonatomic, readonly) CATransform3D transform;

@end

This class is pretty straightforward. It just describes the transformations to be applied to the object. The only thing that could be misunderstood is rotation. Rotation doesn’t describe the angle; it describes the axis. For the angle, we have another property. The final rotation is calculated from both rotation and angle.

Why do we need the Camera object? Because the final image is calculated after taking the camera into account. Think of yourself filming a leaping frog: the final captured motion depends on both the frog’s leap and the motion of your camera.

With that setup, let’s see what we can build now, with a little experimenting on the fov, the camera’s eyeZ, and the transform angle.

func applyPerspective()
{
	// config camera
	let contentCam:Camera = Camera()
	contentCam.fov = 10
	contentCam.aspectRatio = Float(CGRectGetWidth(UIScreen.mainScreen().bounds))/Float(CGRectGetHeight(UIScreen.mainScreen().bounds))
	contentCam.eyeZ = 25
	
	// config content transform
	let contentTransform:Transform = Transform(camera: contentCam)
	contentTransform.rotationX = 1.0
	contentTransform.rotationY = 0.0
	contentTransform.rotationZ = 0.0
	contentTransform.angle = -1.0
	
	contentView?.layer.sublayerTransform = contentTransform.transform
}

I was able to get this


Now, for your own experimentation, you can test how distorted the image gets when you update the fov. Also, updating the camera’s eyeZ zooms the content in and out. Next, you can experiment with different transform angles and axes.

There’s a whole lot I’ve skipped in the implementation of the Camera and Transform classes, mostly about why things work the way they do. In particular, the insides of

  • GLKMatrix4MakeLookAt()
  • GLKMatrix4MakePerspective()
  • GLKMatrix4MakeTranslation()
  • GLKMatrix4Rotate()
  • GLKMatrix4Multiply()
  • and much more

These are some of the most important functions, where the real linear algebra magic is. But I thought this is more low-level stuff that deserves a rant of its own. In the follow-up, I’ll dive deeper into these functions and much more of the linear algebra.

The code for this article is available at https://github.com/chunkyguy/DemoPerspectiveView


Concurrency: Deadlocks

We’ve explored so much with our concurrency experiments, and yet there’s one fundamental topic we haven’t touched so far: deadlocks. Whenever we think of concurrency, we can’t help thinking of locks, for protecting shared resources, managing synchronization and what not. But, as Uncle Ben said, with great power comes great responsibility; similarly, with great locks come great deadlocks.

Deadlocks with multiple locks

In classical concurrency theory, deadlocks happen when more than one thread needs mutual exclusivity over more than one shared resource. The classic Dining Philosophers problem demonstrates this deadlock situation.

Here’s the description from the Wikipedia page:

Five silent philosophers sit at a round table with bowls of spaghetti. Forks are placed between each pair of adjacent philosophers.

Each philosopher must alternately think and eat. However, a philosopher can only eat spaghetti when he has both left and right forks. Each fork can be held by only one philosopher and so a philosopher can use the fork only if it’s not being used by another philosopher. After he finishes eating, he needs to put down both forks so they become available to others. A philosopher can grab the fork on his right or the one on his left as they become available, but can’t start eating before getting both of them.

There are many problems hidden inside this one problem, but let’s focus on the situation where every philosopher gets super hungry, acquires the fork on their left side, and waits for someone to release a fork before they can begin eating. But since no philosopher is willing to release their fork, we have a deadlock.

Let’s reduce this problem to just two philosophers dining. Here’s a simple implementation

#include <chrono>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

#define MAX_FORKS 2

std::mutex fork[MAX_FORKS];

class Philosopher {
private:
    std::string name_;
    int holdForkIndex_;

    void Think()
    {
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
    
public:
    Philosopher(const std::string &name, const int startForkIndex) :
    name_(name),
    holdForkIndex_(startForkIndex)
    {}

    void Eat()
    {
        std::cout << name_ << ": Begin eating" << std::endl;

        Think();
        
        std::lock_guard<std::mutex> locka(fork[holdForkIndex_]);
        std::cout << name_ << ": Hold fork: " << holdForkIndex_ << std::endl;
        holdForkIndex_ = (holdForkIndex_ + 1) % MAX_FORKS;

        Think();

        std::lock_guard<std::mutex> lockb(fork[holdForkIndex_]);
        std::cout << name_ << ": Hold fork: " << holdForkIndex_ << std::endl;
        holdForkIndex_ = (holdForkIndex_ + 1) % MAX_FORKS;

        // eating
        std::this_thread::sleep_for(std::chrono::microseconds(10));
        
        std::cout << name_ << " End eating" << std::endl;
    }
};

void DiningPhilosophers()
{
    Philosopher s("Socrates", 0);
    Philosopher n("Nietzsche", 1);
    
    std::thread sEat(&Philosopher::Eat, &s);
    std::thread nEat(&Philosopher::Eat, &n);
    
    sEat.join();
    nEat.join();
}

/* this is the main() */
void Deadlocks_main()
{
    DiningPhilosophers();
}

We have two philosophers and two forks. In each Eat() function we follow the order:

  1. Think
  2. Grab a fork
  3. Think
  4. Grab another fork
  5. Eat
  6. Release both forks

Here’s the output on my machine, before the program just hangs:

Socrates: Begin eating
Nietzsche: Begin eating
Socrates: Hold fork: 0
Nietzsche: Hold fork: 1

As you can see, each philosopher acquires a fork and then just waits indefinitely. One way to break this deadlock is to use the std::lock() function. It locks all of the given mutexes at once, using a deadlock-avoidance algorithm, so a thread never ends up holding one mutex while waiting forever on another.

void Eat()
{
	std::cout << name_ << ": Begin eating" << std::endl;

	int lockAIndex = holdForkIndex_;
	int lockBIndex = (holdForkIndex_ + 1) % MAX_FORKS;
	
	std::lock(fork[lockAIndex], fork[lockBIndex]);
	
	Think();
	
	std::lock_guard<std::mutex> locka(fork[lockAIndex], std::adopt_lock);
	std::cout << name_ << ": Hold fork: " << lockAIndex << std::endl;

	Think();

	std::lock_guard<std::mutex> lockb(fork[lockBIndex], std::adopt_lock);
	std::cout << name_ << ": Hold fork: " << lockBIndex << std::endl;

	// eating
	std::this_thread::sleep_for(std::chrono::microseconds(10));
	
	std::cout << name_ << " End eating" << std::endl;
}

First, the philosopher tries to lock both the forks. If successful, then they proceed further to eat, otherwise they just wait.

Notice two things here. First, we’re not using any explicit loop to lock-test-release the mutexes; all of that is handled by std::lock(). Second, we’re passing the extra parameter std::adopt_lock as the second argument to std::lock_guard. This tells std::lock_guard not to lock the mutex at construction, as std::lock() has already locked it. The only role of std::lock_guard here is to guarantee an unlock when Eat() goes out of scope.

Here’s the output for the above code:

Socrates: Begin eating
Nietzsche: Begin eating
Socrates: Hold fork: 0
Socrates: Hold fork: 1
Socrates End eating
Nietzsche: Hold fork: 1
Nietzsche: Hold fork: 0
Nietzsche End eating

Deadlocks with a single lock

Deadlocks happen more often when more than one mutex is involved, but that doesn’t mean deadlocks can’t happen with a single mutex. You must be thinking: how careless would one have to be to deadlock code that uses just a single mutex? Picture this scenario:

Homer Simpson works at Sector 7-G of a high-security nuclear power plant. To avoid unproductive water-cooler chit-chat, his boss Mr. Burns has placed the water cooler inside a highly protected chamber guarded by a passcode, allowing only a single employee in at a time. Also, to keep track of how much water each employee consumes, Mr. Burns has equipped the water cooler with a mechanism requiring every employee to swipe their card in order to use it.

Homer is thirsty, so he decides to get some water. He enters the passcode and is inside the protected chamber. Once there, he realizes he’s left his card at his desk. Being lazy, he decides to call his friend Lenny to fetch his card. Lenny goes to Homer’s office, grabs his card and goes straight to the water cooler chamber, only to find it locked from the inside.

Now, inside, Homer is waiting for Lenny to bring his card, while outside, Lenny is waiting for the door to be unlocked. And we have a deadlock situation.

Here’s a representation of the above problem in code:

#include <iostream>
#include <mutex>

std::mutex m;

void SwipeCard()
{
    std::cout << "Card swiping ... " << std::endl;
    std::lock_guard<std::mutex> waterCoolerLock(m);
    std::cout << "Card swiped" << std::endl;
}

void GetWater()
{
    std::cout << "Chamber unlocking ... " << std::endl;
    std::lock_guard<std::mutex> waterCoolerLock(m);
    std::cout << "Chamber unlocked" << std::endl;

    SwipeCard();

    std::cout << "Water pouring" << std::endl;
}

void Sector7GSituation()
{
    GetWater();
}

And here’s the output before the program hangs:

Chamber unlocking ... 
Chamber unlocked
Card swiping ... 

The moral of the story is that calling user code after acquiring a lock should be handled with the utmost care. For example, in the above code, calling SwipeCard() while holding the lock is a big clue that there is a design error in this code.

The solution is to restructure the code to something like:

std::mutex m;

void SwipeCard()
{
    std::cout << "Card swiping ... " << std::endl;
    std::lock_guard<std::mutex> waterCoolerLock(m);
    std::cout << "Card swiped" << std::endl;
}

void EnterChamber()
{
    std::cout << "Chamber unlocking ... " << std::endl;
    std::lock_guard<std::mutex> waterCoolerLock(m);
    std::cout << "Chamber unlocked" << std::endl;
}

void GetWater()
{
    EnterChamber();
    SwipeCard();

    std::cout << "Water pouring" << std::endl;
}

void Sector7GSituation()
{
    GetWater();
}

Output:

Chamber unlocking ... 
Chamber unlocked
Card swiping ... 
Card swiped
Water pouring

Deadlocks and libdispatch

Moving over to GCD, we don’t have to worry about such nitty-gritty details, as we don’t deal with mutexes and locks directly. But deadlocks can still happen, because deadlocks are design errors more than anything else.

Here’s one example of a deadlock using GCD

let queue:dispatch_queue_t = dispatch_queue_create("deadlockQueue", nil)

func SimpleDeadlock()
{
    dispatch_sync(queue) {
        println("Enter task: 0")
        dispatch_sync(queue) {
            println("Enter task: 1")
            println("Exit task: 1")
        }
        println("Exit task: 0")
    }
}

SimpleDeadlock()

The job sounds simple enough: we have a serial queue and two tasks that we need executed sequentially. Each task has a task id associated with it, just so that we can keep track of what’s getting executed. We use dispatch_sync because we want our tasks to be executed one after the other.

Here’s the output, before it gets deadlocked:

Enter task: 0

So why isn’t task 1 getting executed? Well, let’s understand how dispatch_sync works. dispatch_sync blocks the caller until the submitted task has been executed fully, and a serial queue runs only one task at a time. Consider the following situation:

Say there is a hair salon with a very strict first-come-first-served policy. A mother and her daughter visit the salon. The mother steps in first, so she gets client id 0, while the daughter gets client id 1. According to the rules, the salon has to serve the mother first, because she has the lowest-numbered client id. But the mother insists that they serve her daughter first. If they start serving the daughter, they’re breaking the rule, as they would be serving client 1 before client 0. Hence, the deadlock.

And this is exactly why task 1 is not getting initiated: the serial dispatch queue is waiting for task 0 to finish, but task 0 is insisting that the queue finish task 1 first.

There are two ways to resolve this deadlock. The first is to use a concurrent queue instead of a serial queue.

let queue:dispatch_queue_t = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)

func SimpleDeadlock()
{
    dispatch_sync(queue) {
        println("Enter task: 0")
        dispatch_sync(queue) {
            println("Enter task: 1")
            println("Exit task: 1")
        }
        println("Exit task: 0")
    }
}

SimpleDeadlock()

Output

Enter task: 0
Enter task: 1
Exit task: 1
Exit task: 0

The reason this works is that a concurrent queue is allowed to work on more than one submitted task at a time. So first task 0 is submitted. Task 0 then submits task 1 and demands that it be executed synchronously, or in other words, finished before task 0 itself finishes. Since this is a concurrent queue, it distributes its time over all the queued tasks.

In terms of our mother-daughter-salon example above: let’s say the salon has a big clock that ticks every minute. For that minute they focus on a single client, and at the next tick they switch to another client, irrespective of what state that client is in.

So at the first tick they focus on the mother, who just sits there, not allowing them to even touch her hair before her daughter is done. At the next tick, they switch to the daughter and start serving her. This repeats until the daughter is fully done, and then the mother allows them to work on her hair.

This works, but it has a problem. Remember, we wanted task 0 to be fully completed before task 1; that’s why we thought of using a serial queue and synchronous tasks in the first place. This solution actually breaks that guarantee. What we actually need is a serial queue with tasks submitted asynchronously.

let queue:dispatch_queue_t = dispatch_queue_create("deadlockQueue", nil)

func SimpleDeadlock()
{
    dispatch_async(queue) {
        println("Enter task: 0")
        dispatch_async(queue) {
            println("Enter task: 1")
            println("Exit task: 1")
        }
        println("Exit task: 0")
    }
}

XCPSetExecutionShouldContinueIndefinitely(continueIndefinitely: true)

SimpleDeadlock()

Output:

Enter task: 0
Exit task: 0
Enter task: 1
Exit task: 1

This solution submits task 1 after task 0 to a serial queue. Since this is a serial queue, the task ordering is preserved. And since the tasks no longer wait on each other, we never see any deadlocks.

The lesson here is to use dispatch_sync very carefully. Remember deadlocks are always design errors.

The code for today’s experiment is available at github.com/chunkyguy/ConcurrencyExperiments. Check it out and have fun experimenting on your own.


Concurrency: Introducing Inter-Thread Communication

Picture the scenario: you’re on the phone with your friend, when suddenly he says ‘Hey! Hold on. I think I’m getting another call’, and before you can say anything, he puts you on hold. Now you have no idea whether to hang up or wait; your friend could be coming back in 3 seconds or 3 hours. Wouldn’t it be great if your phone service provided a mechanism where, whenever you’re put on hold, you could freely disconnect, and whenever your friend disconnects the other call, you’d immediately get a ring? Let’s talk about managing synchronization between threads.

On a broader level, the above is the classic producer-consumer problem. The consumer and producer are synchronized by some shared data. What we need is a way for them to communicate: the producer needs to signal the consumer that the data is ready for consumption.

So far in all our concurrency experiments, whenever we have some shared data, we explicitly block one or more threads while one thread gets to play with the resource. The shared data here would seem to be a boolean flag of some sort, but we can do better with std::condition_variable:

#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex mutex;
std::condition_variable condition;
bool done = false;

void StartConsumer()
{
    std::cout << std::this_thread::get_id() << " Consumer sleeping" << std::endl;
    std::unique_lock<std::mutex> consumerLock(mutex);
    // wait on a predicate: this handles spurious wakeups, and the case
    // where the producer signals before we even start waiting
    condition.wait(consumerLock, [] { return done; });
    consumerLock.unlock();

    std::cout << std::this_thread::get_id() << " Consumer working" << std::endl;
}

void StartProducer()
{
    std::cout << std::this_thread::get_id() << " Producer working" << std::endl;
    std::this_thread::sleep_for(std::chrono::seconds(1));
    {
        std::lock_guard<std::mutex> producerLock(mutex);
        done = true;
    }
    condition.notify_one();
    std::cout << std::this_thread::get_id() << " Producer sleeping" << std::endl;
}

void StartProcess()
{
    std::thread consumerThread(StartConsumer);
    std::thread producerThread(StartProducer);

    consumerThread.join();
    producerThread.join();
}

Output

0x100ced000 Consumer sleeping
0x100d73000 Producer working
0x100d73000 Producer sleeping
0x100ced000 Consumer working

In the above code, both the consumer and the producer execute simultaneously on concurrent threads. The consumer waits for the producer to finish working on whatever it was doing. But instead of continuously checking whether the producer is done, as in

while (!done) {
	std::unique_lock<std::mutex> consumerLock(mutex);
	consumerLock.unlock();
	std::this_thread::sleep_for(std::chrono::microseconds(1));
}

We make use of a condition variable to inform us whenever the producer is done. The producer then uses notify_one() (or maybe notify_all()) to send a notification to the waiting thread(s) to continue.

Moving to GCD, we have two ways. The first is to use a dispatch_group.

let concurrentQueue:dispatch_queue_t = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)
let group:dispatch_group_t = dispatch_group_create()

func StartConsumer()
{
    println("Consumer sleeping")
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    println("Consumer working")
}

func StartProducer()
{
    dispatch_group_async(group, concurrentQueue) { () -> Void in
        println("Producer working")
        sleep(1)
        println("Producer sleeping")
    }
}

func StartProcess()
{
    StartProducer()
    StartConsumer();
}

Output:

Producer working
Consumer sleeping
Producer sleeping
Consumer working

The first thing that you’ll notice in this code is that we start the producer code first and then the consumer waits. This is because if we start the consumer first, the queue has nothing in it to wait for.

dispatch_group is actually meant to synchronize a set of tasks in a queue. For example, say you’re loading the next level in your game, and you need to download a lot of images and other data files from a remote server before the loading can finish. What you’re really doing is launching a lot of async tasks and waiting for all of them to finish before doing something else.

For this experiment, a dispatch_semaphore is a better fit. dispatch_semaphore is based on the classic semaphore concept you probably learned in college.

let sema:dispatch_semaphore_t = dispatch_semaphore_create(1);

func StartConsumer()
{
    println("Consumer sleeping")
    dispatch_semaphore_wait(sema, DISPATCH_TIME_FOREVER)
    println("Consumer working")
}

func StartProducer()
{
    println("Producer working")
    sleep(1)
    println("Producer sleeping")
    dispatch_semaphore_signal(sema)
}

func StartProcess()
{
    StartProducer()
    StartConsumer();
}

If you look closely, we haven’t even used any dispatch_queue here. This is because dispatch_semaphore is more about synchronization than about threading.

dispatch_semaphore is basically just a count-based blocking system. We create a dispatch_semaphore object with an initial count: each wait decrements the count and each signal increments it. If a wait would take the count below zero, the calling thread blocks until somebody issues a signal.

Here’s another example that illustrates using a semaphore between two queues:

let sema:dispatch_semaphore_t = dispatch_semaphore_create(0);
let concurrentQueue:dispatch_queue_t = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)

func StartConsumer()
{
    println("Consumer sleeping")
    dispatch_semaphore_wait(sema, DISPATCH_TIME_FOREVER)
    println("Consumer working")
}

func StartProducer()
{
    dispatch_async(concurrentQueue) {
        println("Producer working")
        sleep(1)
        println("Producer sleeping")
        dispatch_semaphore_signal(sema)
    }
}

func StartProcess()
{
    StartProducer()
    StartConsumer();
}

Again note that we start the producer first. Because we initialize the dispatch_semaphore with 0, the consumer blocks the main thread as soon as it is invoked; if it ran first, the producer code would never get a chance to be dispatched.

Here’s the output of above code:

Producer working
Consumer sleeping
Producer sleeping
Consumer working

This just scratches the surface of inter-thread communication. Most problems with concurrency actually deal, in some way, with inter-thread communication. In later sessions we shall experiment more on this. The idea for this article came from this discussion on the problem of loading game data in a parallel thread, and the solution I worked on.

The code for today’s experiment is available at github.com/chunkyguy/ConcurrencyExperiments. Check it out and have fun experimenting on your own.

Posted in concurrency, _dev | Tagged , , | Comments Off

Concurrency: Thread-safe Singletons

Love them or hate them, you just cannot ignore singletons. Let’s try experimenting with singletons and concurrency.

A singleton is an object of which you want only a single instance over your program’s entire lifecycle. Usually, you want to create singleton objects lazily. Here’s a typical way to create a singleton in a single-threaded application.

class Library {
public:

    static Library *SharedInstance()
    {
        if (!shared_) {
            shared_ = new Library();
        }
        return shared_;
    }

    static void Dealloc()
    {
        if (shared_) {
            delete shared_;
            shared_ = nullptr; // so a later SharedInstance() call can recreate it
        }
    }
    
    Library() :
    books_({
        {"Ernest Hemingway", "Old man and the sea"},
        {"Aldous Huxley", "Brave new world"},
        {"Aldous Huxley", "The Genius and the Goddess"},
        {"Aldous Huxley", "Antic Hay"},
        {"Salman Rushdie", "Midnight's children"},
        {"Fyodor Dostoevsky", "Crime and punishment"},
        {"Boris Pasternak", "Doctor Zhivago"},
        {"Bernard Shaw", "Arms and the Man"},
        {"Bernard Shaw", "Man and Superman"}
    })
    {
        std::cout << "Library " << this << std::endl;
    }
    
    ~Library()
    {
        std::cout << "~Library " << this << std::endl;
    }
    
    std::vector<std::string> GetBooks(const std::string &author) const
    {
        typedef std::multimap<std::string, std::string>::const_iterator BookIt;
        std::pair<BookIt, BookIt> range =  books_.equal_range(author);
        std::vector<std::string> bookList;
        for (BookIt it = range.first; it != range.second; ++it) {
            bookList.push_back(it->second);
        }
        return bookList;
    }
    
    std::size_t GetSize() const
    {
        return books_.size();
    }
    
private:
    std::multimap<std::string, std::string> books_;
    static Library *shared_;
};

Library *Library::shared_ = nullptr;

void PrintBooks(const std::string &author)
{
    std::cout << "Searching for author: " << author << std::endl;
    std::vector<std::string> list = Library::SharedInstance()->GetBooks(author);
    std::copy(list.begin(), list.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
}

void PrintSize()
{
    std::cout << "Library Size: " << Library::SharedInstance()->GetSize() << " books" << std::endl;
}

When invoked with:

void SerialImpl()
{
    std::string searchAuthor("Aldous Huxley");
    PrintBooks(searchAuthor);
    PrintSize();
}

void ThreadsafeSingleton_main()
{
    SerialImpl();
    Library::Dealloc();
}

We get

Searching for author: Aldous Huxley
Library 0x100a6bfe0
Brave new world
The Genius and the Goddess
Antic Hay
Library Size: 9 books
~Library 0x100a6bfe0
main() exit

We see that our singleton gets created lazily, only when first required, and also gets deallocated when the program finally quits, thanks to the Dealloc() function.

Next, let’s try calling it from multiple threads.

void ConcurrentImpl()
{
    std::string searchAuthor("Aldous Huxley");
    std::thread bookSearchThread(PrintBooks, std::ref(searchAuthor));
    std::thread libSizeThread(PrintSize);
    
    bookSearchThread.join();
    libSizeThread.join();
}

This may or may not work, depending on how the threads are scheduled on your machine. To see the problem with this code, let’s deliberately put the thread to sleep while creating the singleton instance.

    static Library *SharedInstance()
    {
        if (!shared_) {
            std::this_thread::sleep_for(std::chrono::seconds(1));
            shared_ = new Library();
        }
        return shared_;
    }

Now, if you run it you might see two instances of Library objects getting created.

Searching for author: Aldous Huxley
Library Size: Library 0x100b84fe0
Brave new world
The Genius and the Goddess
Antic Hay
Library 0x100ba4fe0
9 books
~Library 0x100ba4fe0
main() exit

There are two problems here. First, we have two instances of the Library object. Second, because our Dealloc routine depends on the guarantee that there is only a single instance at any given moment, the other Library object never gets released. This is a classic memory leak.

We can solve this issue with two techniques. The first uses C++11’s thread-safe static initialization.

Let’s modify our Library class so that instead of creating the singleton on the heap, we create a static instance of the Library object.

static Library &SharedInstance()
{
    static Library lib;
    return lib;
}

And modifying our calls

void PrintBooks(const std::string &author)
{
    std::cout << "Searching for author: " << author << std::endl;
    std::vector<std::string> list = Library::SharedInstance().GetBooks(author);
    std::copy(list.begin(), list.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
}

void PrintSize()
{
    std::cout << "Library Size: " << Library::SharedInstance().GetSize() << " books" << std::endl;
}

If we now run our code, we get

Searching for author: Aldous Huxley
Library 0x10006d3c8
Brave new world
The Genius and the Goddess
Antic Hay
Library Size: 9 books
main() exit
~Library 0x10006d3c8

This works because C++11 guarantees that function-local static variables are initialized thread-safely. Another benefit of this approach is that we don’t have to implement any Library::Dealloc() function, as static data gets destroyed automatically by the runtime. If this satisfies your requirements, it’s the best way out.

But this won’t fit all cases. Say you want the Library singleton to live only in heap memory. Also, if you’re using Objective-C, this won’t work, as you’re not allowed to allocate custom objects on the stack. This brings us to our second method: using the C++ thread library.

Let’s switch back to our original implementation, with the Library object in heap memory. The first thing we can try is a mutex.

    static Library *SharedInstance()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        
        if (!shared_) {
            std::this_thread::sleep_for(std::chrono::seconds(1));
            shared_ = new Library();
        }
        return shared_;
    }

And this works as expected:

Searching for author: Aldous Huxley
Library Size: Library 0x100b92fe0
Brave new world
The Genius and the Goddess
Antic Hay
9 books
~Library 0x100b92fe0
main() exit

The only problem with this code is that Library::SharedInstance() is going to be called from plenty of places, and it unnecessarily locks and unlocks the mutex on every call. What we actually need is the mutex only for the case when the single instance has not yet been created; every other time we can simply pass the pointer back. This brings us to the infamous double-checked locking pattern.

    static Library *SharedInstance()
    {
        if (!shared_) {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!shared_) {
                std::this_thread::sleep_for(std::chrono::seconds(1));
                shared_ = new Library();
            }
        }
        return shared_;
    }

This code looks great, right? It takes the mutex only when creating the shared instance; every other time it just returns the pointer. But it has a serious bug. The code will work almost all the time, then fail exactly when you’ve forgotten about it. It’s so hard to debug that you probably can’t even reproduce the issue in a debug environment, and you may not see the failure again for millions of clock cycles. A programmer’s nightmare, right?

Suppose you’re running this code with two threads, bookSearchThread and libSizeThread. bookSearchThread comes in, sees that shared_ is NULL, acquires the lock, and starts allocating and initializing the instance. Half-way through, it has allocated space on the heap, so shared_ isn’t NULL anymore, but the instance isn’t initialized yet, only allocated. In Objective-C the two steps are explicit:

Library *shared = [[Library alloc] init];

Suddenly a thread switch happens and libSizeThread comes in. It sees that shared_ isn’t NULL and says: OK, somebody must have already created this instance, let’s use it straight away. This results in undefined behavior, as libSizeThread is now using an object that isn’t fully initialized.

This is such a well-known problem that almost every thread library provides a way to make it work. In C++11, we can use std::call_once:

class Library {
private:
    static std::once_flag onceFlag_;
    /* other data members */
    
public:
    static Library *SharedInstance()
    {
        std::call_once(onceFlag_, []{
            std::this_thread::sleep_for(std::chrono::seconds(1));
            shared_ = new Library();
        });
        return shared_;
    }
    
    /* other member functions */
};

Even GCD provides something similar:

+ (instancetype)sharedInstance
{
    static dispatch_once_t onceFlag;
    static id shared_;
    dispatch_once(&onceFlag, ^{
        shared_ = [[[self class] alloc] init];
    });
    return shared_;
}

These helpers from the standard libraries guarantee that the object’s allocation and initialization are complete before the flag is set; every subsequent call just returns immediately. So, no more race conditions for us.

So there it is, singletons the thread-safe way.

The code for today’s experiment is available at github.com/chunkyguy/ConcurrencyExperiments. Check it out and have fun experimenting on your own.


Concurrency: Working with shared data using threads

In continuation of last week’s experiment on protecting shared data in a concurrent system, let’s look at how to make it work using C++ and threads.

Here’s a brief recap: we’re building an airport scenario with a single status board screen. The status board is simultaneously edited by multiple airliners, and any passenger can read it at any moment to check the status of a flight by its flight number.

Let’s build the concept within a single-threaded environment:

class StatusBoard {
    std::forward_list<int> statusBoard;
    
public:
    void AddFlight()
    {
        int newFlight = rand()%MAX_FLIGHTS;
        statusBoard.push_front(newFlight);
    }
    
    bool FindFlight(const int flightNum) const
    {
        return (std::find(statusBoard.begin(), 
                          statusBoard.end(), 
                          flightNum) != statusBoard.end());
    }
};

StatusBoard statusBoard;

void StatusWriter()
{
    debugCount.write++;
    statusBoard.AddFlight();
}

bool StatusReader(const int searchFlightNum)
{
    debugCount.read++;
    return statusBoard.FindFlight(searchFlightNum);
}

void SerialLoop()
{
    while (!StatusReader(SEARCH_FLIGHT)) {
        StatusWriter();
    }
}

On my machine, it outputs:

Flight found after 26 writes and 27 reads.

Now let’s begin with making this implementation multithreaded.

void StatusWriter()
{
    std::this_thread::sleep_for(std::chrono::seconds(1));
    
    debugCount.write++;
    statusBoard.AddFlight();
}

void ParallelLoop()
{
    while (!std::async(std::launch::async, StatusReader, SEARCH_FLIGHT).get()) {
        /* write operation */
        std::thread writeOp(StatusWriter);
        writeOp.detach();
    }
}

The good thing about C++ is that much of the code looks almost identical to the single-threaded code, thanks to the way the thread library is designed. At first glance you can hardly tell where the threading code is. Here we invoke two operations on two threads at each iteration of the loop.

First we have a read operation

while (!std::async(std::launch::async, StatusReader, SEARCH_FLIGHT).get()) {

This makes use of std::future to invoke the StatusReader function on an async thread. With std::launch::async the task starts right away; the get() call at the end blocks until the result is ready. We shall talk about futures and promises in some future experiment. For now it suffices to understand that a call to std::async() starts the StatusReader() function in a concurrent thread, and get() delivers the result back to the calling thread.

The other is simply spawning a std::thread and detaching it, which we’re already familiar with from our previous experiment.

Here we just start the StatusWriter function on a new thread every time the loop iterates.

/* write operation */
std::thread writeOp(StatusWriter);
writeOp.detach();

On my machine, when the code doesn’t crash, it prints:

Flight found after 26 writes and 1614 reads.

Obviously, this code isn’t thread-safe. We’re reading and writing the same forward list from more than one thread, and this code is bound to crash a lot. So, as a next step, let’s make it thread-safe using a mutex.

A mutex is the basic object used to provide mutual exclusion over a portion of code. In C++, a mutex is used to lock a shared resource.

We start with a std::mutex object, which guarantees that the locked code is used by a single thread at a time. We can use its lock() and unlock() member functions like so:

std::mutex mutex; // must outlive and be shared by all callers, not local to foo()

void foo()
{
    mutex.lock();
    
    /* 
     * some resource locking code here
     */
    
    mutex.unlock();
}

If you’ve used C++ long enough, you’ll know that this code is prone to all sorts of errors. In many cases foo() can exit before unlock() gets executed. This could be due to an early return added to the function, or, more likely, because the code does something illegal and foo() throws an exception. Even if foo() itself is very robust, it could call some other function that isn’t.

If you’re an old-time C++ user, you’ll know that we solve this kind of problem with the RAII idiom. Luckily, the standard thread library already provides this in the form of std::lock_guard, which is designed to work with any lock-like object.

Using std::mutex and std::lock_guard our code is now:

class StatusBoard {
    std::forward_list<int> statusBoard;
    mutable std::mutex mutex;
    
public:
    void AddFlight()
    {
        std::lock_guard<std::mutex> lock(mutex);
        
        int newFlight = rand()%MAX_FLIGHTS;
        statusBoard.push_front(newFlight);
    }
    
    bool FindFlight(const int flightNum) const
    {
        std::lock_guard<std::mutex> lock(mutex);
        
        return (std::find(statusBoard.begin(), statusBoard.end(), flightNum) != statusBoard.end());
    }
};

The mutable keyword is needed because we lock the mutex inside the const member function FindFlight(). On my machine it executes as:

Flight found after 10 writes and 1591 reads.

Sometimes, the code also throws the following exception after exiting from main:

libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument

This is because we’ve explicitly added a thread delay for debugging, and some of our detached threads might not be finished before main exits. I don’t think this is going to be a problem in any real code.

So there we have it: basic thread-safe code. This is not the best implementation possible, and there are many things to look out for. For example, we must take care not to pass out any references or pointers to the shared data, otherwise the code becomes unsafe again. Always remember that at the low level all threads share the same address space, and whenever we have a shared resource without a lock, we have a potential race condition. That is a design error on the programmer’s part.

As always this code is available online at github.com/chunkyguy/ConcurrencyExperiments. Check it out and have fun experimenting on your own.


Concurrency: Working with shared data using queues

All problems with concurrency basically boil down to two categories: Race conditions and deadlocks. Today let’s look into one specific problem under race conditions: Working with shared data.

If every thread dealt only with data provided exclusively to it, we would never get into any sort of data race. In fact, this is one of the main selling points of functional programming languages like Haskell, which advertise heavily around concurrency and parallelism.

But there’s nothing to worry about: functional programming is more about adopting a design pattern, just as object-oriented programming is. Yes, some languages lean more towards a particular pattern, but that doesn’t mean we cannot adopt that pattern in another language; we just have to restrict ourselves to a particular convention. For example, plenty of large software is written in C following the object-oriented pattern. And for us, both C++ and Swift are very much capable of adopting the functional programming paradigm.

Getting back to the problem of shared data: when is shared data a problem with multiple threads? Suppose we have an application where two or more threads are working on some shared data. If all each thread ever does is read, we won’t have any data race. For example, say you’re at the airport checking your flight status on the giant screen. You’re not the only one looking at it, but does it really matter how many others are reading the information from the same screen? Now suppose the scenario is a little more realistic: every airliner has a control with which it can update the screen with new information while passengers are reading it. At any given instant, one or more airliners can be updating the screen, which could result in garbled text for the passengers reading it. The airliners are in a race for a single resource: the screen.

If you’re using Cocoa, you must be familiar with NSMutableArray. You probably also know that NSMutableArray isn’t thread-safe as of this writing. So the question is: how do we make use of NSMutableArray in a situation like this, with concurrent read and write operations going on?

Let’s start with a simple single threaded model of what we’re doing:

/**
 * Concurreny Experiments: Shared data
 * Compile: clang SharedData.m -framework Foundation && ./a.out
 */

#import <Foundation/Foundation.h>

@interface Airport : NSObject {
    NSMutableArray *airportStatusBoard;
    NSUInteger writeCount;
    NSUInteger readCount;
}
@end

@implementation Airport

- (id)init
{
    self = [super init];
    if (!self) {
        return nil;
    }
    
    airportStatusBoard = [[NSMutableArray alloc] init];
    writeCount = 0;
    readCount = 0;
    
    return self;
}

- (void)dealloc
{
    [airportStatusBoard release];
    [super dealloc];
}

- (void)addFlightEntry
{
    NSNumber *newEntry = @(random() % 10);
    [airportStatusBoard addObject:newEntry];
}

- (void) statusWriter
{
    NSLog(@"writing begin: %@", @(writeCount));
    [self addFlightEntry];
    NSLog(@"writing ends: %@", @(writeCount));
    writeCount++;
}

- (BOOL) statusReaderWithFlightNumber:(NSNumber *)searchFlightNum
{
    NSLog(@"reading begin: %@", @(readCount));
    
    BOOL found = NO;
    for (NSNumber *flightNum in airportStatusBoard) {
        if ([flightNum isEqualToNumber:searchFlightNum]) {
            found = YES;
            break;
        }
    }
    
    NSLog(@"reading ends: %@", @(readCount));
    
    readCount++;
    
    return found;
}

- (void) serialLoop
{
    srandom(time(0));
    while (![self statusReaderWithFlightNumber:@(7)]) {
        [self statusWriter];
    }
    
    NSLog(@"Flight found after %@ writes and %@ reads", @(writeCount), @(readCount));
}

@end

int main()
{
    @autoreleasepool {
        Airport *airport = [[Airport alloc] init];
        [airport serialLoop];
    }
    
    return 0;
}

Here’s the output for the above program on my machine:


2014-09-27 14:32:12.636 a.out[9601:507] reading begin: 0
2014-09-27 14:32:12.638 a.out[9601:507] reading ends: 0
2014-09-27 14:32:12.639 a.out[9601:507] writing begin: 0
2014-09-27 14:32:12.639 a.out[9601:507] writing ends: 0
2014-09-27 14:32:12.640 a.out[9601:507] reading begin: 1
2014-09-27 14:32:12.640 a.out[9601:507] reading ends: 1
2014-09-27 14:32:12.641 a.out[9601:507] writing begin: 1
2014-09-27 14:32:12.641 a.out[9601:507] writing ends: 1
2014-09-27 14:32:12.642 a.out[9601:507] reading begin: 2
2014-09-27 14:32:12.642 a.out[9601:507] reading ends: 2
2014-09-27 14:32:12.643 a.out[9601:507] Flight found after 2 writes and 3 reads

As you can observe, we perform sequential writes and reads until we find the required flight number in the array, and everything works out great. But now let’s try to run the writes and reads concurrently. To do that, we create a single concurrent queue and perform all the reading and writing operations on it. We’re implementing the scenario where one passenger (you) reads the status board top to bottom, checking whether their flight is listed; when they reach the end of the list without finding the flight number, they start again from the top. Meanwhile, many airliners are trying to update the single status board concurrently.

readWriteQueue = dispatch_queue_create("com.whackylabs.concurrency.readWriteQ", DISPATCH_QUEUE_CONCURRENT);

- (void)parallelLoop
{
    srandom(time(0));

    __block BOOL done = NO;
    while(!done) {
        
        /* perform a write op */
        dispatch_async(readWriteQueue, ^{
            [self statusWriter];
        });
        
        /* perform a read op */
        dispatch_async(readWriteQueue, ^{
            if ([self statusReaderWithFlightNumber:@(SEARCH_FLIGHT)]) {
                done = YES;
            }
        });
    }
    
    NSLog(@"Flight found after %@ writes and %@ reads", @(writeCount), @(readCount));
}

If you now try to execute this program, once in a while you might get the following error:


Terminating app due to uncaught exception 'NSGenericException', reason: '*** Collection <__NSArrayM: 0x7f8f88c0a5d0> was mutated while being enumerated.'

This is the tricky part about debugging race conditions: most of the time your application appears to work perfectly, and unless you deploy it on different hardware or get unlucky, you won’t see the issue. One thing you can play with is deliberately sleeping a thread to provoke the race, sprinkling thread-sleep code around for debugging:

- (void) statusWriter
{
    [NSThread sleepForTimeInterval:0.1];
    writeCount++;
    NSLog(@"writing begin: %@", @(writeCount));        
    [self addFlightEntry];
    NSLog(@"writing ends: %@", @(writeCount));
}

If the above code doesn’t crash on your machine on the first go, edit the sleep interval and/or give it a few more runs.

Now, coming to the solution of such a race condition. It has two parts. First is the write part: when multiple threads are trying to update the shared data, we need a way to lock the section where the actual update happens. It’s as if, with multiple airliners trying to update the single screen, only one airliner at a time holds the controls required to update it, and when it’s done, it releases the controls to be acquired by the next airliner in line.

Second is the read part: we need to make sure that while we’re performing a read operation no other thread is mutating the shared data, as NSEnumerator isn’t thread-safe either.

A way to acquire such a lock with GCD is to use dispatch_barrier_async():

- (void) statusWriter
{
    dispatch_barrier_async(readWriteQueue, ^{
        
        [NSThread sleepForTimeInterval:0.1];
        
        NSLog(@"writing begin: %@", @(writeCount));
        
        [self addFlightEntry];
        
        NSLog(@"writing ends: %@", @(writeCount));
        writeCount++;
        
    });
}

- (BOOL) statusReaderWithFlightNumber:(NSNumber *)searchFlightNum
{
   __block BOOL found = NO;
    
    dispatch_barrier_async(readWriteQueue, ^{
        
        NSLog(@"reading begin: %@", @(readCount));
        
        for (NSNumber *flightNum in airportStatusBoard) {
            if ([flightNum isEqualToNumber:searchFlightNum]) {
                found = YES;
                break;
            }
        }
        
        NSLog(@"reading ends: %@", @(readCount));
        
        readCount++;
        
    });
    
    return found;
}

If you try to run this code now, you’ll observe that the app doesn’t crash anymore, but it has become significantly slower.

The first performance improvement that we can make is to not return from the read code immediately. Instead, we can let the queue handle the completion:

- (void) statusReaderWithFlightNumber:(NSNumber *)searchFlightNum completionHandler:(void(^)(BOOL success))completion
{
    
    dispatch_barrier_async(readWriteQueue, ^{
        
        NSLog(@"reading begin: %@", @(readCount));
        
        BOOL found = NO;
        
        for (NSNumber *flightNum in airportStatusBoard) {
            if ([flightNum isEqualToNumber:searchFlightNum]) {
                found = YES;
                break;
            }
        }
        
        NSLog(@"reading ends: %@", @(readCount));
        
        readCount++;
       
        completion(found);
    });
}

- (void)parallelLoop
{
    srandom(time(0));
    
    __block BOOL done = NO;
    while(!done) {
        
        /* perform a write op */
        dispatch_async(readWriteQueue, ^{
            [self statusWriter];
        });
        
        /* perform a read op */
        dispatch_async(readWriteQueue, ^{
            [self statusReaderWithFlightNumber:@(SEARCH_FLIGHT) completionHandler:^(BOOL success) {
                done = success;
            }];
        });
    }
    
    NSLog(@"Flight found after %@ writes and %@ reads", @(writeCount), @(readCount));
}

Another thing to note is that our code is no longer executing in parallel, as each barrier block holds the queue until all the pending tasks in it are complete. This is evident from the final log:

Flight found after 154 writes and 154 reads

In fact, the processing could be even worse than just running it sequentially, as the queue has to take care of the locking and unlocking overhead.

We can make the read operation non-blocking again, as blocking the reads gains us nothing. One way to achieve that is to copy the latest data within a barrier and then read from the copy while the other threads update the original.

- (BOOL) statusReaderWithFlightNumber:(NSNumber *)searchFlightNum
{
    
    NSLog(@"reading begin: %@", @(readCount));
    
    __block NSArray *airportStatusBoardCopy = nil;
    /* The copy must be synchronous; with dispatch_barrier_async the loop
     * below could run before the copy exists. */
    dispatch_barrier_sync(readWriteQueue, ^{
        
        airportStatusBoardCopy = [airportStatusBoard copy];
        
    });
    
    
    BOOL found = NO;
    
    for (NSNumber *flightNum in airportStatusBoardCopy) {
        if ([flightNum isEqualToNumber:searchFlightNum]) {
            found = YES;
            break;
        }
    }

    if (airportStatusBoardCopy) {
        [airportStatusBoardCopy release];
    }
    
    NSLog(@"reading ends: %@", @(readCount));
    
    readCount++;
    
    return found;
}

But this won’t solve the whole problem: most of the reads are probably a waste of time. What we actually need is a way to communicate between the writes and the reads; an actual read should only happen after the data has been modified by some new write.

We can start by separating the read and write tasks. The read operations need to be serial, as we’re implementing the case where a passenger reads the list top-down and starts again from the beginning when they reach the end. The airliners, by contrast, can edit the screen simultaneously, so the writes stay on a concurrent queue.

readQueue = dispatch_queue_create("com.whackylabs.concurrency.readQ", DISPATCH_QUEUE_SERIAL);
writeQueue = dispatch_queue_create("com.whackylabs.concurrency.writeQ", DISPATCH_QUEUE_CONCURRENT);

- (void)parallelLoop
{
    srandom(time(0));
    
    __block BOOL done = NO;
    while(!done) {
        
        /* perform a write op */
        dispatch_async(writeQueue, ^{
            [self statusWriter];
        });
        
        /* perform a read op */
        dispatch_async(readQueue, ^{
            [self statusReaderWithFlightNumber:@(SEARCH_FLIGHT) completionHandler:^(BOOL success) {
                done = success;
            }];
        });
    }
    
    NSLog(@"Flight found after %@ writes and %@ reads", @(writeCount), @(readCount));
}

Now, in the read operation we take a copy of the shared data. One way to do that is with an atomic property. Atomicity guarantees that an access either completes fully or not at all; there are no in-betweens, so we always see valid data.

@property (atomic, copy) NSArray *airportStatusBoardCopy;

In each read operation we simply copy the shared data, so we don’t have to care about mutability anymore.

self.airportStatusBoardCopy = airportStatusBoard;

From then on we can use the copy to execute the read operations.

2014-09-27 16:20:33.145 a.out[12855:507] Flight found after 11 writes and 457571 reads

This is the queue-level solution to the problem. In the next update we will look at solving it at the thread level using the C++ standard thread library.
As always, the code for today’s experiment is available online at github.com/chunkyguy/ConcurrencyExperiments.

Posted in concurrency, _dev

Component System using C++ Multiple Inheritance

A few days back, I talked about designing a Component System using Objective-C and message forwarding. Today, to carry the conversation forward, we shall see how the same Component System can be designed with C++, and why C++ is a better language choice for such a design.

Before I begin, let’s copy-paste the abstract from our earlier post, on why and when we actually need a Component System:

I wanted a component system where I have a GameObject that can have one or more components plugged in. To illustrate the main problems with the traditional Actor-based model, let’s assume we’re developing a platformer game. We have these three Actor types:

1. Ninja: A ninja is an actor that a user can control with external inputs.
2. Background: Background is something like a static image.
3. HiddenBonus: Hidden bonus is an invisible actor that a ninja can hit.

Source: Swim Ninja

Let’s take a look at all the components inside these actors.

Ninja = RenderComponent + PhysicsComponent
Background = RenderComponent
HiddenBonus = PhysicsComponent.

Here RenderComponent is responsible for just drawing some content on the screen, while PhysicsComponent is responsible for just doing physics updates, like collision detection.

So, in a nutshell from our game loop we need to call things like

void Scene::Update(const int dt)
{
    ninja.Update(dt);
    hiddenBonus.Update(dt);
}
 
void Scene::Render()
{
    background.Render();
    ninja.Render();
}

Now, in the traditional Actor model, if we have an Actor class with both RenderComponent and PhysicsComponent like:

class Actor {
public:
    void Update(const int dt);
    void Render();
};

    Actor ninja;
    Actor hiddenBonus;
    Actor background;

Then we would inevitably add a PhysicsComponent to the background actor and a RenderComponent to the hiddenBonus actor. The user of the code then has to mentally track which functions shouldn’t be invoked on which objects, which is as awful in practice as it sounds in theory.

We could use the same approach we used when designing the Component System with Objective-C: a fat GameObject with a function to enable a particular component. Think of each GameObject as a collection of many components, where each component knows how to deal with one thing: rendering, physics, AI, networking, and so on. When creating a new GameObject we need a way to specify which components should be enabled for it. We’ve already discussed the fat-interface technique in the last experiment, so let’s try something new; we’re using C++ after all. Let’s start by breaking the components into separate classes:

class RenderComponent {
public:
    void Render();
};

class PhysicsComponent {
public:
    void Update(const int dt);
};

Then we can easily create composite components using such core components, such as:

class RenderAndPhysicsComponent : public RenderComponent, public PhysicsComponent
{};

Now, we can update our game scene’s interface as:

    RenderAndPhysicsComponent ninja;
    PhysicsComponent hiddenBonus;
    RenderComponent background;

This definitely serves our immediate needs, but there is a problem: this solution doesn’t scale well. Whenever we come up with a new component, we have to update the related base classes accordingly and create a mix for every possible combination of core component interfaces. For five core components, we would get C(5,1) + C(5,2) + C(5,3) + C(5,4) + C(5,5) = 5 + 10 + 10 + 5 + 1 = 31 possible component interface combinations.

In C++, we can actually do better. This is because C++ has one thing that other languages just don’t want. It’s called multiple inheritance.

Instead of making a composite class for each combination of components, we can have a single GameObject class that inherits from as many components as desired. Further, to avoid the fat-abstraction issue, the GameObject inherits at compile time from only the components we actually need for that particular object.

How do we design such a GameObject? One answer is to use variadic templates:

template <typename... Component>
class GameObject : public Component... {
public:
    GameObject() :
    Component()...
    {}
    
    GameObject(Component... component) :
    Component(component)...
    {}
};

This GameObject has nothing of its own. A host class that wishes to use an instance of this object has to supply concrete component classes as template parameters. This gives our GameObject enormous flexibility.

We can now update our interface to easily use the core components to construct our GameObject instances that fit our needs:

    GameObject<RenderComponent, PhysicsComponent> ninja;
    GameObject<PhysicsComponent> hiddenBonus;
    GameObject<RenderComponent> background;

If that seems too verbose to write every time, we can use typedefs for the most common configurations.

    typedef GameObject<RenderComponent> Sprite;
    typedef GameObject<PhysicsComponent> Trigger;
    typedef GameObject<RenderComponent, PhysicsComponent> Actor;
    
    Actor ninja;
    Trigger hiddenBonus;
    Sprite background;

Now, I know this already feels awesome, but we still have a problem: in practice, components often need to communicate with each other. Say we have a TransformComponent that provides a model matrix based on the GameObject’s current transform.

class TransformComponent {
public:
    /** Get the model matrix of the transform to take things from model space to
     * the world space
     */
    GLKMatrix4 GetModelMatrix() const;
    
    GLKVector2 position;
    float rotation;
};

And, that model matrix then could be used by the RenderComponent for rendering the mesh.

class RenderComponent {
public:

    void Render(const Uniform &uniform,
                const Camera &camera) const;

    GLKMatrix4 modelMatrix;
    Mesh mesh;
};

void RenderComponent::Render(const Uniform &uniform,
                             const Camera &camera) const
{
    GLKMatrix4 modelViewProjectionMatrix = GLKMatrix4Multiply(camera.GetProjectionMatrix(),
                                                              GLKMatrix4Multiply(camera.GetViewMatrix(),
                                                                                 modelMatrix));
    glUniformMatrix4fv(uniform.um4k_Modelviewprojection, 1, 0, modelViewProjectionMatrix.m);
    mesh.Draw();
}

So, how do we go about that design?

To solve that, we actually need another component. Let’s call it CustomComponent. The entire purpose of CustomComponent is to have an Update function:

class CustomComponent {
public:
    void Update(const int dt);
    
protected:
    /* each game object binds its own per-frame logic here */
    std::function<void(const int dt)> customUpdate_;
};

Now we can use our CustomComponent to bind many other components with something like:

void Ninja::Load()
{
	/* ... */
    customUpdate_ = std::bind(&Ninja::Update, this, std::placeholders::_1);
}
void Ninja::Update(const int dt)
{
	/* from transform component */
	GLKMatrix4 modelMatrix = GetModelMatrix();
	/* to render component */
    SetModelMatrix(modelMatrix);
}

No matter how absurd this code looks, it works. We can improve it, and while doing so we shall also address another important issue with multiple inheritance: name clashing.

Since our GameObject is now split across multiple classes, we can easily get name clashes. For example, both RenderComponent and TransformComponent could have a member named

GLKMatrix4 modelMatrix;

Or, both PhysicsComponent and CustomComponent can have a member function named:

void Update(const int dt);

How do we resolve this issue? One way is to adopt a naming convention, say every PhysicsComponent gets a ‘phy’ prefix prepended to every member function and data member. That would reduce our code to:

void Ninja::Update(const int dt)
{
	/* from transform component */
	GLKMatrix4 modelMatrix = transGetModelMatrix();
	/* to render component */
    rendrSetModelMatrix(modelMatrix);
}

We can all agree that consistently following a naming convention is hard, so this solution might not work in the long term. A better way out is to move the core logic out of the component into a separate core class. The TransformComponent could be reduced to:

class TransformComponent {
public:
    /** Create default Transform component */
    TransformComponent();
    
protected:
    Transform transform_;
};

class Transform {
public:
    /** Create default Transform */
    Transform(const GLKVector2 &position = GLKVector2Make(0.0f, 0.0f),
              const Degree rotation = 0.0f);

    GLKVector2 GetPosition() const;
    
    void SetPosition(const GLKVector2 &position);
    
    /** Get angle in degrees */
    float GetRotation() const;
    
    /** Set rotation in degrees
     * @param rotation Angle in degrees.
     */
    void SetRotation(const Degree rotation);
    
    /** Get the model matrix of the transform to take things from model space to
     * the world space
     */
    GLKMatrix4 GetModelMatrix() const;
    
private:
    GLKVector2 position_;
    Degree rotation_;
};

With this pattern in place, the same code would look like:

void Ninja::Update(const int dt)
{
    renderer_.SetModelMatrix(transform_.GetModelMatrix());
}

OK, with that taken care of, let’s now focus on fitting the pieces together. Let’s begin by updating the commonly used GameObject configurations into proper C++ types:

class Actor : public GameObject <TransformComponent, PhysicsComponent, RenderComponent, CustomComponent> {
public:
    void Load(const Transform &trans, const Mesh &mesh);
    void Update(const int dt);
};

class Sprite : public GameObject<TransformComponent, RenderComponent, CustomComponent> {
public:
    void Load(const Transform &trans, const Mesh &mesh);
    void Update(const int dt);
};

class Trigger : public GameObject<TransformComponent, PhysicsComponent, CustomComponent> {
public:
    void Load(const Transform &trans);
    void Update(const int dt);
};

void Scene::Load(const GLKVector2 &screen)
{
    LoadShader();
    LoadMesh();
    
    /* load entities */
    background_.Load(Transform(), meshMap_[0]);
    ninja_.Load(Transform(), meshMap_[1]);
    hiddenBonus_.Load(Transform());
}

Let’s now implement the loop. The trickiest part is that we need to call Update() on every object that has a CustomComponent and Render() on every object that has a RenderComponent. Notice that so far we haven’t used any virtual functions, so we cannot depend on the runtime to select the correct function. What we actually need is something like:

void Scene::Update(const int dt)
{
    /* update all custom components */
}

void Scene::Render()
{
    glClear(GL_COLOR_BUFFER_BIT);
    
    /* bind all pieces to the GL pipeline */
    meshMem_.Enable();
    shader_.Enable();
    
    /* render all render components */
}

What we actually need is some sort of container class that holds all types of components. The first thought that comes to mind is a std::vector for every kind of component, but from experience we already know that pattern doesn’t scale well. Using the power of C++11 and tuples, we can implement it in a much better way. Let’s add a ContainerComponent:

template <typename... Component>
class ContainerComponent {
protected:
    std::tuple<std::vector<Component*>...> children_;
};

As you can see, ContainerComponent can hold any number of component types. Think of it as a class with compile-time flexibility over which kinds of components it holds, while at runtime we add and remove instances of those component classes.

Now we can update our plain Scene class to be a GameObject with a ContainerComponent. The ContainerComponent holds two kinds of components: CustomComponent and RenderComponent:

class Scene : public GameObject <
	ContainerComponent<
		CustomComponent,
		RenderComponent
>> {
	/* ... */
};

The only important thing to remember is that the order matters: because this is a tuple, we have to remember the index of each component type.

Our new updated Load() now looks like:

void Scene::Load(const GLKVector2 &screen)
{
    LoadShader();
    LoadMesh();
    
    /* load entities */
    background_.Load(Transform(), meshMap_[0]);
    std::get<0>(children_).push_back(&background_);
    std::get<1>(children_).push_back(&background_);
    
    ninja_.Load(Transform(), meshMap_[1]);
    std::get<0>(children_).push_back(&ninja_);
    std::get<1>(children_).push_back(&ninja_);

    hiddenBonus_.Load(Transform());
    std::get<0>(children_).push_back(&hiddenBonus_);
}

The power of C++ compile time checks comes into play when we do something illegal, such as:

std::get<1>(children_).push_back(&hiddenBonus_);

The above expression fails at compile time, because hiddenBonus_ does not have a RenderComponent and we’re trying to add it to a vector of RenderComponent pointers.

So far so good. Let’s make the code more readable by removing the hard-coded integers. If we introduce an enum type as:

enum ContainerIndex {
    Custom = 0,
    Render = 1
};

We can do things like:

void Scene::Load(const GLKVector2 &screen)
{
    LoadShader();
    LoadMesh();
    
    /* load entities */
    background_.Load(Transform(), meshMap_[0]);
    std::get<ContainerIndex::Custom>(children_).push_back(&background_);
    std::get<ContainerIndex::Render>(children_).push_back(&background_);
    
    ninja_.Load(Transform(), meshMap_[1]);
    std::get<ContainerIndex::Custom>(children_).push_back(&ninja_);
    std::get<ContainerIndex::Render>(children_).push_back(&ninja_);

    hiddenBonus_.Load(Transform());
    std::get<ContainerIndex::Custom>(children_).push_back(&hiddenBonus_);
}

Now getting back to our loop. We can simply do things like:

void Scene::Update(const int dt)
{
    /* update all custom components */
    for (auto customComponent : std::get<ContainerIndex::Custom>(children_)) {
        customComponent->Update(dt);
    }
}

void Scene::Render()
{
    glClear(GL_COLOR_BUFFER_BIT);
    
    /* bind all pieces to the GL pipeline */
    meshMem_.Enable();
    shader_.Enable();
    
    /* render all render components */
    for (const auto renderComponent : std::get<ContainerIndex::Render>(children_)) {
        renderComponent->Render(uniform_, camera_);
    }
}

Here is the result of this prototype:

The yellow color is applied to the render-only background, while the cyan is the ‘render+physics’ ninja.

I haven’t implemented any physics elements, but in a real game with a physics engine, the PhysicsComponent should look something like:

class PhysicsComponent {
public:
    PhysicsComponent();
    
protected:
    b2Body *physicsBody_;
};

class Physics {
public:
    bool Load(
              const GLKVector2 &screenSize,
              const GLKVector2 &gravity
              );
    
    b2World *GetWorld() const;
    
private:
    std::unique_ptr<b2World> physicsWorld_;    
};

And any GameObject that listens to physics updates should look something like:

void Ball::Update(const int dt)
{
    transform_.SetPosition(PhysicsConverter::GetPixels2(physicsBody_->GetPosition()));
    renderer_.SetModelMatrix(transform_.GetModelMatrix());
}

And then a typical Scene’s Load() and Update() should look something like:

bool GameScene::Load(const GLKVector2 &screen)
{
    /* enable physics */
    physics_.Load(screen, GLKVector2Make(0.0f, 0.0f));

	/* more loading ... */
}

void GameScene::Update(const int dt)
{
    /* update physics */
    physics_.GetWorld()->Step(1.0f/FPS, 10, 10);
    physics_.GetWorld()->ClearForces();
    
    /* update all custom components */
    for (auto customComponent : std::get<ContainerIndex::Custom>(children_)) {
        customComponent->Update(dt);
    }
    
    /* update camera */
    camera_.SetPan(ball_.GetTransform().GetPosition());
}

I hope this gives an idea of how we can use the enormous power of C++ to build a game engine based on a component system. It also shows why C++ is such a great fit for this pattern compared to something like Objective-C.

The entire code for this experiment is available at https://github.com/chunkyguy/CppComponentSystem. Please feel free to checkout and start playing around.

Posted in OpenGL, _dev, _games

Concurrency: Spawning Independent Tasks

The first step in getting started with concurrency is spawning tasks on a background thread.

Let’s say we’re building a chat application where a number of users chat in a common room. In this scenario, every user is probably chatting at the same instant. For the sake of simplicity, let’s say each user can only have a single-character username. Obviously such a constraint is going to outrage the users, so, pissed off as they are, they just print their usernames over and over again.

To support such functionality, we might need a function like this:

void PrintUsername(const char uname, const int count)
{
    for (int i = 0; i < count; ++i) {
        std::cout << uname << ' ';
    }
    std::cout << std::endl;
}

In a single-threaded environment, whichever user invokes this method gets control of the CPU, and their username gets printed until the loop expires.

If there are two users in the chat room with usernames ‘X’ and ‘Y’, each wants to print their username 10 times. The calling code might look something like:


void SingleThreadRoom()
{
    PrintUsername('X', 10);
    PrintUsername('Y', 10);
}

The output is as expected:

X X X X X X X X X X 
Y Y Y Y Y Y Y Y Y Y 

Now, to make the app slightly better, let’s give both users equal weight: each user should perform the task on their own thread. The simplest way to spawn a thread is to construct a std::thread; calling join() then waits for it to finish:


void MultiThreadRoom()
{
    std::thread threadX(PrintUsername, 'X', 10);
    std::thread threadY(PrintUsername, 'Y', 10);
    
    threadX.join();
    threadY.join();
}

On my machine this outputs as:

YX  YX  YX  YX  YX  YX  YX  YX  YX  YX  

Calling join() on a thread blocks the calling thread until that thread finishes. When the app launches we already have the main thread, and since we’re calling join() from it, the main thread waits for both spawned threads to complete.

If we need a thread for a fire-and-forget task, we can call detach(). But once we call detach() we have no control over the thread’s lifetime anymore; effectively, the main thread no longer cares about it.

For a program like:

void MultiThreadRoomDontCare()
{
    std::thread threadX(PrintUsername, 'X', 10);
    std::thread threadY(PrintUsername, 'Y', 10);
    
    threadX.detach();
    threadY.detach();
}

int main()
{
    MultiThreadRoomDontCare();
    std::cout << "MainThread: Goodbye" << std::endl;
    return 0;
}

The output on my machine is:

MainThread: Goodbye
XY  XY

As you can see, the main thread doesn’t care that its child threads haven’t finished.

Moving over to Swift, we don’t have to track threads manually; instead we think in higher-level terms of tasks and queues.

The first problem we face when trying out async code in a Playground is that the NSRunLoop isn’t running there as it is in our native apps. The solution is to import XCPlayground and call XCPSetExecutionShouldContinueIndefinitely(). All thanks to the internet.

Here’s a test code that works:

import UIKit
import XCPlayground

XCPSetExecutionShouldContinueIndefinitely(continueIndefinitely: true)

NSOperationQueue.mainQueue().addOperation(NSBlockOperation(block: { () -> Void in
    println("Hello World!!")
}))

var done = "Done"

Since we’re talking in terms of queues and tasks, we have to change our way of thinking: we don’t care about threads anymore, only tasks. The task here is to print the username; it gets submitted to a queue, and the queue manages the threads internally.

A code like:

func printUsername(uname:String)
{
        println(uname)
}

let chatQueue:NSOperationQueue = NSOperationQueue()

func multiThreadRoom()
{
    for (var i = 0; i < 5; ++i) {
        chatQueue.addOperation(NSBlockOperation(block: { () -> Void in
            printUsername("X")
        }))
        
        chatQueue.addOperation(NSBlockOperation(block: { () -> Void in
            printUsername("Y")
        }))
    }
}

multiThreadRoom()

println("Main Thread Quit")

Prints on my machine as:

X
Y
X
Y
X
Y
X
Y
X
Y
Main Thread Quit

These are the very basics of spawning concurrent independent tasks. If all your requirements involve independent sets of tasks, this is almost all you need to learn about concurrency to get the job done. If only the real world were this simple ;)

As always, the entire code is also available online at: https://github.com/chunkyguy/ConcurrencyExperiments

Posted in concurrency, _dev

Concurrency: Getting started

The time everybody was speculating about for so long has finally arrived: processors aren’t getting any faster. They’re hitting physical thresholds; pushed any harder with current technology, they overheat. Moore’s law is breaking down. Quoting Herb Sutter, ‘The free lunch is over‘.

If you want your code to run faster on the new generation of hardware, you have to consider concurrency. All modern computing devices have multiple cores built in, and more and more languages, toolkits, libraries and frameworks are providing support for concurrency. So, let’s talk about concurrency.

Starting from today, I’ll be experimenting with multithreading, using the following technologies. First is the thread library that ships with the C++11 standard. Second is libdispatch, or Grand Central Dispatch (GCD), developed by Apple. GCD isn’t a pure threading library; it’s a concurrency library built on top of threads, where we normally think in terms of tasks and queues, a level above threads. Third is NSOperation, another level up from GCD: a purely object-oriented concurrency API. I’ve used it quite often with Objective-C and I’m sure it works just as smoothly with Swift.

Let’s start with our first experiment: dividing a huge task into smaller chunks and letting them run in parallel.

So, first question: what task should we pick? Let’s pick sorting a dataset. Say we have a huge array and we want to sort it. We could do it using the famous Quicksort algorithm:

    template<typename T>
    void QuickSort(std::vector<T> &arr, const size_t left, const size_t right)
    {
        if (left >= right) {
            return;
        }
        
        /* move pivot to left */
        std::swap(arr[left], arr[(left+right)/2]);
        size_t last = left;
        
        /* swap all lesser elements */
        for (size_t i = left+1; i <= right; ++i) {
            if (arr[i] < arr[left]) {
                std::swap(arr[++last], arr[i]);
            }
        }
        
        /* move pivot back */
        std::swap(arr[left], arr[last]);

        /* sort subarrays */
        QuickSort(arr, left, last);
        QuickSort(arr, last+1, right);
    }

The algorithm is pretty straightforward. Given a list of unsorted data, in each iteration we pick a pivot element and sort it into place by moving all smaller elements to its left and all larger elements to its right. By the end of the iteration, the pivot is in its sorted position and we are left with two unsorted subarrays.

This algorithm is a perfect pick for concurrency, as each iteration divides the problem into two independent subproblems that we can execute in parallel.

Here’s my first approach:

    template<typename T>
    void QuickerSort(std::vector<T> &arr, const size_t left, const size_t right)
    {
        if (left >= right) {
            return;
        }
        
        /* move pivot to left */
        std::swap(arr[left], arr[(left+right)/2]);
        size_t last = left;
        
        /* swap all lesser elements */
        for (size_t i = left+1; i <= right; ++i) {
            if (arr[i] < arr[left]) {
                std::swap(arr[++last], arr[i]);
            }
        }
        
        /* move pivot back */
        std::swap(arr[left], arr[last]);
        
        /* sort subarrays in parallel */
        auto taskLeft = std::async(QuickerSort<T>, std::ref(arr), left, last);
        auto taskRight = std::async(QuickerSort<T>, std::ref(arr), last+1, right);
    }

And here’s my calling code:

template <typename T>
void UseQuickSort(std::vector<T> arr)
{
    clock_t start, end;
    start = clock();
    QuickSort(arr, 0, arr.size() - 1);
    end = clock();
    std::cout << "QuickSort: " << (end - start) * 1000.0/double(CLOCKS_PER_SEC) << "ms" << std::endl;
    assert(IsSorted(arr));
}

template <typename T>
void UseQuickerSort(std::vector<T> arr)
{
    clock_t start, end;
    start = clock();
    auto bgTask = std::async(QuickerSort<T>, std::ref(arr), 0, arr.size()-1);
    if (bgTask.wait_for(std::chrono::seconds(0)) != std::future_status::deferred) {
        while (bgTask.wait_for(std::chrono::seconds(0)) != std::future_status::ready) {
            std::this_thread::yield();
        }
        assert(IsSorted(arr));
        end = clock();
        std::cout << "QuickerSort: " << (end - start) * 1000.0/double(CLOCKS_PER_SEC) << "ms" << std::endl;
    }
}

int main()
{
    
    std::vector<int> arr;
    for (int i = 0; i < 100; ++i) {
        arr.push_back(rand()%10000);
    }

    UseQuickSort(arr);
    UseQuickerSort(arr);
    return 0;
}

I’ve added some code to profile how long each of these tasks takes. A dataset of 100 elements executes on my machine as:

QuickSort: 0.023ms
QuickerSort: 29.12ms

As you can see, the concurrent code takes a lot longer than the single-threaded one. One guess is that creating a thread for each subproblem, plus the extra load of context switching between threads, is the reason behind the latency. Or you could suggest there are huge flaws in my implementation.

I accept there could be huge flaws in my implementation, as this is my first take at multithreading with the C++ thread library. Still, we can’t ignore the fact that we’re spawning a lot of threads: the number of threads running concurrently doubles at each level of recursion. This implementation definitely needs to be improved.

This is where GCD shines. First of all, it’s easier to get started with. Keep in mind I have an equal amount of experience with both libraries, so my code could be full of bugs and I’m probably not using either in the best possible way. But that’s all part of the experiment.

Anyhow, when I implement the same problem using Swift and GCD, like so:


import UIKit

// MARK: Common

func isSorted(array:[Int]) -> Bool
{
    for (var i = 1; i < array.count; ++i) {
        if (array[i] < array[i-1]) {
            println("Fail case: \(array[i]) > \(array[i-1])")
            return false
        }
    }
    return true
}

//MARK: Quick Sort

func quickSort(inout array:[Int], left:Int, right:Int)
{
    if (left >= right) {
        return;
    }
    
    /* swap pivot to front */
    swap(&array[left], &array[(left+right)/2])
    var last:Int = left;
    
    /* swap lesser to front */
    for (var i = left+1; i <= right; ++i) {
        if (array[i] < array[left]) {
            swap(&array[i], &array[++last])
        }
    }
    
    /* swap pivot back */
    swap(&array[left], &array[last])
    
    /* sort subarrays */
    quickSort(&array, left, last)
    quickSort(&array, last+1, right)
}

func useQuickSort(var array:[Int])
{
    let start:clock_t = clock();
    quickSort(&array, 0, array.count-1)
    let end:clock_t = clock();
    
    println("quickSort: \((end-start)*1000/clock_t(CLOCKS_PER_SEC)) ms")
    assert(isSorted(array), "quickSort: Array not sorted");
}

//MARK: Quicker Sort

let sortQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_BACKGROUND, 0)


func quickerSort(inout array:[Int], left:Int, right:Int)
{
    if (left >= right) {
        return;
    }
    
    /* swap pivot to front */
    swap(&array[left], &array[(left+right)/2])
    var last:Int = left;
    
    /* swap lesser to front */
    for (var i = left+1; i <= right; ++i) {
        if (array[i] < array[left]) {
            swap(&array[i], &array[++last])
        }
    }
    
    /* swap pivot back */
    swap(&array[left], &array[last])
    
    /* sort subarrays */
    dispatch_sync(sortQueue, { () -> Void in
        quickSort(&array, left, last)
    })
    dispatch_sync(sortQueue, { () -> Void in
        quickSort(&array, last+1, right)
    })
}

func useQuickerSort(var array:[Int])
{

    let start:clock_t = clock();

    dispatch_sync(sortQueue, { () -> Void in
        quickerSort(&array, 0, array.count-1)

        let end:clock_t = clock();
        println("quickerSort: \((end-start)*1000/clock_t(CLOCKS_PER_SEC)) ms")
        
        assert(isSorted(array), "quickerSort: Array not sorted");

    })
}

// MARK: Main

var arr:[Int] = []
for (var i = 0; i < 20; ++i) {
    arr.append(Int(rand())%100);
}


useQuickSort(arr)
useQuickerSort(arr)

I get the following result:

quickSort: 784 ms
quickerSort: 772 ms

You can obviously tell that the same implementation in Swift is quite a bit slower than in C++, but that’s not the point; we’re not comparing C++ vs Swift, but single thread vs multiple threads. Using GCD we do see a small improvement, even with such a small test case. This is most probably because we’re no longer handling the thread management ourselves; we simply let the dispatch queue handle it.

The bright side of today’s experiment is that if we make good use of multithreading, there is a lot of potential for large improvements. And that is more than enough to motivate me to dive further down the concurrency rabbit hole.

The source code for this experiment is available at the GitHub repository: https://github.com/chunkyguy/ConcurrencyExperiments

Posted in concurrency, _dev

The Art of Designing Softwares

When I was in college, my vision of software was: we have a problem and we need a solution, a program, a set of instructions that the computer executes to solve that particular problem. For example, design a program to print a Fibonacci series using loops or recursion, or implement it with an array, a doubly linked list, an AVL tree.

Then there are these programming challenges, like the Facebook Hacker Cup or the Google Code Jam, that just focus on how strong you are at understanding and implementing common algorithms and data structures, or whether you can design your own.

And there is nothing wrong with that, except that writing software isn’t just about designing algorithms or data structures, as one eventually discovers.

When I got my first job as an iOS developer and started writing software at the enterprise level, the first thing I realized was that the solution space is no longer just a set of instructions in a single file; it is split across hundreds, if not thousands, of files. I realized that writing software is no longer just about algorithms and data structures. In fact, almost all of the interesting algorithms and data structures are either already solved and ready to use from existing code, or nobody really cares about them as the first priority. There is a high probability that the programming language of your choice already comes bundled with all the common data structures and algorithms you’ll ever need.

As a software engineer, most of the time I was working on managing the flow of data and control. The actual work seemed closer to plumbing, where instead of fluids one manages the flow of electrons through the hardware. Initially I was somewhat shocked to see that the iOS frameworks provide just three data structures: NSArray, NSDictionary and NSSet*. That’s it!

* (maybe also NSMapTable, NSHashTable, NSCountedSet, NSPointerArray,… but let’s assume we haven’t heard of them)

What is software engineering?

For me, software engineering is more about understanding the problem at a higher level. It is half about how good you are at designing the flow yourself, and half about how good you are at reusing existing solutions written by others.

Part 1: The design space
The design space deals with problems like:

  • How do I want the UI to be?
  • How do I manage downloading multiple files in the background, even after my application has quit?
  • How do I keep the memory used by the application in control? What happens when I actually run out of memory?
  • How do I confirm that the code I just wrote is going to pass all edge cases at all times?

Part 2: The integration space
The integration space deals with problems like:

  • How do I use the code I had written some time back to solve a similar problem?
  • How do I design this code, so that it could be used next time I face a similar problem?
  • Has someone already tackled a problem similar to mine? If yes, how can I use their solution?

Of these two subspaces, the design space seems easier, because you’re taught it at the college level and you’re in total control of the design. The integration space is hard, and by hard I mean really hard. On your first few attempts, you might seriously consider skipping the integration and moving the problem into the comfortable design space, i.e., reinventing the wheel.

But the reality is that you can’t always design your own solutions. Maybe you’re working with an organization that already has a codebase of solutions that have to be used; or another teammate is solving the problem in parallel; or maybe the solution was written years ago and has been used and maintained for a really long time. Not to mention the time constraints that come with the design space.

Over my years of development, I’ve come to the realization that almost all problems are either already solved or being worked on by someone. If you think you are the only one working on a problem, you’re most definitely wrong. (If not, please add me to your project.)

The bigger challenge then is: how do you use a solution written by others, or by you some time ago? The main problem is that it could have been solved in some other context, or maybe it was not thought out well enough. There could also be flaws in the basic design, like the existing solution not being thread safe.
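To make that last flaw concrete: a non-thread-safe piece of code can often be retrofitted by funnelling every access through a private queue. This is a hedged sketch in modern Swift (the class, queue label and counter are illustrative assumptions, not an existing API):

```swift
import Dispatch

// A counter that would be unsafe under concurrent mutation,
// made safe by serializing every access through one queue.
final class SafeCounter {
    private var value = 0
    // Hypothetical label, used only for debugging.
    private let queue = DispatchQueue(label: "com.example.safe-counter")

    func increment() {
        queue.sync { value += 1 }
    }

    var current: Int {
        return queue.sync { value }
    }
}

let counter = SafeCounter()
// 1000 concurrent increments, yet no race on `value`.
DispatchQueue.concurrentPerform(iterations: 1000) { _ in
    counter.increment()
}
print(counter.current) // 1000
```

The point is that bolting this on after the fact is exactly the kind of integration work the original author never planned for.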

At an even higher level, software can be written in two ways: bottom-up or top-down.

Bottom-up approach
Think of a problem where you have to design a car, in software that is. One way you can design a car is to break it down into smaller tasks:

  • Design the tire.
  • Design the body.
  • Design the engine.

Here each step is recursive in itself, as designing the engine would again be broken down further until you reach an atomic task that cannot be broken down any more. You start working on those pieces and at the end assemble them all back into place, and you have a car.
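In code, the decomposition above roughly corresponds to building the leaf types first and only discovering at assembly time whether they fit. A sketch in Swift (all the type names and the socket check are illustrative, not from any real project):

```swift
// Each part is designed in isolation, by a different teammate perhaps...
struct Tire { let radius: Double }
struct Body { let socketRadius: Double }
struct Engine { let horsepower: Int }

// ...and only the final assembly step discovers whether they fit.
struct Car {
    let tires: [Tire]
    let body: Body
    let engine: Engine

    init?(tires: [Tire], body: Body, engine: Engine) {
        // The "assembly day" check: the sockets must hold the tires.
        guard tires.allSatisfy({ $0.radius <= body.socketRadius }) else {
            return nil
        }
        self.tires = tires
        self.body = body
        self.engine = engine
    }
}

let tires = [Tire](repeating: Tire(radius: 0.4), count: 4)
let car = Car(tires: tires,
              body: Body(socketRadius: 0.5),
              engine: Engine(horsepower: 120))
print(car != nil) // true: this time the parts happened to fit
```

The failable initializer is the code-level version of the assembly-day surprise: nothing in `Tire` or `Body` alone tells you the car won’t come together.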

This approach has many benefits: you can easily assign tasks between team members, and they can work on them almost independently. Also, it is easier to design for the future this way; when you’re working on the tires, you’re just focused on designing tires that should fit all known use cases.

The main problem with the bottom-up approach is that you need to be a good seer of the future. Whatever solutions you’re designing should fall into place perfectly. Say on the final assembly day you realize the sockets on the body are not big enough to hold the tires. Also, as one can tell, the bottom-up approach is more time consuming, because you’re designing for the unknown future. Imagine your car has to perform well on all sorts of geographical terrains. Most of the time you might even end up thinking too far ahead and designing for the year 2056, while your solution becomes obsolete in the next 3 months. Try talking to all those who wrote great solutions with NSThread before Apple rolled out GCD. And don’t forget the time constraints: your boss, teammates and consumers are always pressuring you to release it already.

Top-down approach
Think of the same problem of designing the car, but this time you’re working top-down. You don’t actually care about the year 2056, just this day. You start with a giant piece of metal and an empty notepad.

Then you start carving the metal block into the car. When you get to carving out the left tire, you might realize you’ve already done something similar: yes, sometime back you designed the right tire, and this feels almost the same. So you just apply the same procedure again as your car comes to a finish. With the top-down approach, you might often feel the amount of work getting reduced with each iteration.

The problem with the top-down approach is that it’s really hard to work with a big team, because you’re blank at the beginning, so you cannot assign tasks. Most of the time you need to start alone and bring in more teammates as you progress.

The main differences between the two approaches are:

  • Top-down feels more like looking into the past, while bottom-up is more about looking into the future.
  • Top-down solves the same problem in less time than bottom-up.
  • Top-down is not as future-safe as bottom-up. Imagine a request to add a radio to the car. With bottom-up, an engineer might have already thought about something similar and kept a ‘future’ slot, but with top-down you need to carve a hole and move the internals around until you just get the radio working.
  • At the end of the day, top-down seems to take you closer to the end product, while bottom-up still seems uncertain until the final assembly.

Which approach you pick actually depends on the task you’re solving. If you’re designing a framework or a library that has to be used by others or by you in the future, and you have a lot of time on your hands, go for bottom-up.

If you’re designing the end product, where time is limited, and maybe your product is eventually going to end up in the garbage in 2 years, go with top-down.

Another approach that can be tried is the hybrid approach, where you break the original task into smaller chunks bottom-up, assign each sub-problem to a teammate, and then work on each sub-problem top-down in parallel until the final assembly.
