Generic Programming: The Concept

Say we have a type Animal that is, at this point, just an idea, so we model it as a protocol.

protocol Animal {
    func say()
}

Next, say we want a generic function that works with Animal.

func greet<T: Animal>(animal: T) {
    animal.say()
}

Looks good. Now let’s create our concrete types.

class Mammal: Animal {
    func say() {
        println("Hello! I'm a mammal!")
    }
}

class Reptile: Animal {
    func say() {
        println("Hello! I'm a replite")
    }
}

And put them to use:

let m = Mammal()
let r = Reptile()

let animals: [Animal] = [m, r]

for animal in animals {
    greet(animal)
}

Uh oh! Unfortunately, this code won't compile. This is the error the compiler might throw on your machine:

error: generic parameter 'T' cannot be bound to non-@objc protocol type 'Animal'

Let’s rewrite our implementation in C++ and see what sort of error we get.

class Animal {
public:
    virtual void Say() const = 0;
};

template <typename T>
void Greet(const T &amp;t) {
    t.Say();
}

class Mammal: public Animal {
public:
    void Say() const {
        std::cout << "Hello! I'm a mammal." << std::endl;
    }
};

class Reptile: public Animal {
public:
    void Say() const {
        std::cout << "Hello! I'm a reptile." << std::endl;
    }
};

int main() {
    Mammal m;
    Reptile r;

    std::vector<Animal> animals = {m, r};

    std::for_each(animals.begin(), animals.end(), [](const Animal &animal) {
        animal.Say();
    });

    return 0;
}

If you compile this with a C++11 compiler, clang++ -std=c++11 -stdlib=libc++ code.cpp, you might get an error such as:

error: allocating an object of abstract class type 'Animal'

But we never allocate an instance of type Animal ourselves, right? (The std::vector&lt;Animal&gt; does, though, since it stores its elements by value.) This is a good time to deepen our understanding of generic programming.

Generic programming revolves around these main terms: the concept, the model, refinement, and the type constraint.

Concept

A concept is the core design. A model is what conforms to the concept, an implementation of the concept. A refinement is like inheriting a concept and adding more requirements to it. And a type constraint specifies what a type is, and is not, expected to do.
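To make refinement a bit more concrete, here is a minimal C++ sketch built on the Animal class from above (Pet and Dog are hypothetical types, not part of the original example):

#include <iostream>
#include <string>

// Refinement: a Pet is everything an Animal is required to be,
// plus one more requirement.
class Pet: public Animal {
public:
    virtual std::string Name() const = 0;
};

// Dog models the refined concept by satisfying both requirements.
class Dog: public Pet {
public:
    void Say() const {
        std::cout << "Hello! I'm a dog." << std::endl;
    }

    std::string Name() const {
        return "Dog";
    }
};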

In C++, we can have a pure virtual base class, where the base class provides no implementation for any member function. This idea maps to a protocol in Swift.
Along with that, Swift has something C++ programmers are not used to: the use of a protocol as a type constraint. In C++, the compiler adds the type constraints for us implicitly.

For example, in the following code,

protocol Equatable {
    func ==(lhs: Self, rhs: Self) -> Bool
}

struct Student {
    let studentId: Int
}

func ==(lhs: Student, rhs: Student) -> Bool {
    return lhs.studentId == rhs.studentId
}

extension Student: Equatable {}

the Equatable protocol is an explicit type constraint. But since we use the term protocol for Equatable as well, it can be a source of confusion.
The point to remember when working with Swift is that a protocol is used both for a concept and for a type constraint, whereas C++ gives us no direct language feature for expressing a type constraint; the constraints exist only implicitly, in how the template body uses the type.
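To see what "implicitly" means, here is a minimal sketch using the Greet template from the top of this post; Rock is a hypothetical type and the failing call is intentional:

// Rock provides no Say(), so it does not model the Animal concept.
struct Rock {};

void TryGreet() {
    Rock rock;
    Greet(rock); // error: no member named 'Say' in 'Rock'
}

The constraint "T must provide a const Say()" is written nowhere; the compiler discovers it only when it tries to instantiate Greet with Rock.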

Let’s explore this theory with an example of a generic stack implementation:

class Stack<T> {
    private var store: [T] = []

    func push(item: T) {
        store.append(item)
    }

    func pop() -> T? {
        if store.isEmpty {
            return nil
        }
        return store.removeLast()
    }
}

What could be the core concept of a stack? If we take a look at it, we might say a stack is a data structure where you push items and later pop them back. So the concept of a stack can be written as:

protocol Stack {
    typealias Item
    func push(item: Item)
    func pop() -> Item?
}

Now, we can model an Array based stack on this concept as:

class ArrayStack<T>: Stack {
    private var store: [T] = []

    func push(item: T) {
        store.append(item)
    }

    func pop() -> T? {
        if store.isEmpty {
            return nil
        }
        return store.removeLast()
    }
}

Or, we can go crazy and write a weird stack, where you pop data filtered by some epoch timestamp that we provide at runtime.

class TimeStack<T>: Stack {

    private var store: [CFTimeInterval: T] = [:]
    var epoch = CACurrentMediaTime()

    func push(item: T) {
        store[CACurrentMediaTime()] = item
    }

    func pop() -> T? {
        for (time, value) in store {
            if time < epoch {
                return value
            }
        }
        return nil
    }
}

You get the idea. A concept is just the bare minimum abstraction of the idea, which has to be modeled by some other implementation for use.

Now, coming back to the original problem. What was wrong with our Animal example at the top?

The problem lies with both implementations. In the Swift version we used Animal as a type constraint, and in the C++ version, as our dear clang compiler was trying to tell us, Animal is actually a concept. Hence both usages are invalid.
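If you do want the runtime-polymorphic behavior the original code was reaching for, a minimal sketch of one possible fix for the C++ version (assuming &lt;vector&gt; and &lt;algorithm&gt; are included) is to keep Animal as a pure interface but store pointers to it, so no Animal object is ever allocated. In Swift the analogous change would be to let greet take an Animal value directly instead of a generic parameter constrained to Animal.

int main() {
    Mammal m;
    Reptile r;

    // Store pointers to the abstract base; the vector never has to
    // construct an Animal object of its own.
    std::vector<const Animal *> animals = {&m, &r};

    std::for_each(animals.begin(), animals.end(), [](const Animal *animal) {
        animal->Say();
    });

    return 0;
}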

That’s all for today. We shall explore the other Generic Programming terminologies later someday.
Have a nice day!

Changing the gears to Generic Programming

Practical usage of generic programming can be narrowed down to two forms:

1. Writing top level generic functions
2. Writing generic custom types.

Generic Functions

Writing top-level generic functions is the easier of the two. It usually goes like this: you write a normal function, later realize that it isn't restricted to any one type but is really just an abstract algorithm usable by many types, and so you upgrade it to a generic form. For example,

bool isEqual(int x, int y) {
    return x == y;
}

can be easily abstracted out to a generic form such as

template <typename T>
bool isEqual(T x, T y) {
    return x == y;
}

This generic form works as long as T supports an == operation, which brings us to a term you will often find in generic programming: type constraints.

So the above example works fine with all types that support the == operation, such as:

    std::cout << (isEqual(10, 10) ? "Y" : "N") << std::endl;
    std::cout << (isEqual(10.5, 10.5) ? "Y" : "N") << std::endl;
    std::cout << (isEqual("hello", "hello") ? "Y" : "N") << std::endl;

But would easily break down when used like:

struct Vector2 {
    float x, y;
    Vector2(float xx, float yy) : x(xx), y(yy) {}
};

std::cout << (isEqual(Vector2(10, 10), Vector2(10, 10)) ? "Y" : "N") << std::endl; // output compilation failure

And the reason is that the type constraint has been violated: the type Vector2 does not provide an == operation. The fix is simple: satisfy the constraint.

bool operator == (const Vector2 &lhs, const Vector2 &rhs) {
    return (lhs.x == rhs.x) && (lhs.y == rhs.y);
}

Another type of error that can arise with generic functions is where the constraints are not violated syntactically but are logically broken. For example:

    const char w1[] = "world";
    const char w2[] = "world";
    std::cout << (isEqual(w1, w2) ? "Y" : "N") << std::endl; // output N

Here, our isEqual() sees const char * arguments, and the compiler knows how to compare two pointers. But the result might or might not be what we were expecting. If you were expecting a string comparison, you have to tell the compiler how to do that.

For our specific case, that can be achieved by providing an overloaded implementation that takes const char * directly:

bool isEqual(const char *x, const char *y) {
    return strcmp(x, y) == 0; // strcmp comes from <cstring>
}

 

Generic Types

The real power of generic programming shows when working with generic types. Just as generic functions abstract an algorithm to a type-free level, generic types abstract our design to a type-free level.

Think of generic types as us providing the type information to the compiler and passing the responsibility of creating the actual types on to the compiler.

To get a quick overview, the Vector2 type that we wrote above for float can easily be rewritten as a generic type:

template <typename T>
struct Vector2 {
    T x, y;
    Vector2(T xx, T yy) : x(xx), y(yy) {}
};

And our == implementation just needs to follow along with the change:

template <typename T>
bool operator == (const Vector2<T> &lhs, const Vector2<T> &rhs) {
    return (lhs.x == rhs.x) && (lhs.y == rhs.y);
}

Now, where should we actually use generic types in the real world? To address this question, you have to switch your mindset from run-time polymorphism to compile-time polymorphism. Let's consider an example where you have designed your types in the classical object-oriented fashion, that is, using inheritance:

class Shape {
public:
    virtual ~Shape() {}

    void PrintArea() {
        std::cout << "area = " << Area() << std::endl;
    }

private:
    virtual float Area() const = 0;
};

class Rectangle: public Shape {
public:
    Rectangle(float w, float h) : width(w), height(h) {}

private:
    float Area() const {
        return width * height;
    }

    float width, height;
};


class Circle: public Shape {
public:
    Circle(float r) : radius(r) {}

private:
    float Area() const {
        return 3.14 * (radius * radius);
    }

    float radius;
};

int main() {
    Rectangle *r = new Rectangle(10, 20);
    Circle *c = new Circle(5);

    std::vector<Shape *> shapes;
    shapes.push_back(r);
    shapes.push_back(c);

    std::for_each(shapes.begin(), shapes.end(), [](Shape *s) {
        s->PrintArea();
    });

    delete r;
    delete c;

    return 0;
}

Here we have a base Shape class that provides the public interface and leaves the actual implementation details, in this case the calculation of the area, to the subclasses. Internally, we call the Area() function on Shape and let the runtime look up the implementation by inspecting the type information.

In other words, what we are actually doing is delegating part of the implementation of our Shape class to subclasses. Another way of achieving the same result is to not depend on subclasses for overrides, but to delegate the implementation to some external classes.

struct ShapeImpl {
    virtual float Area() const = 0;
};

class RectangleImpl: public ShapeImpl {
public:
    RectangleImpl(float w, float h) : width(w), height(h) {}

private:
    float Area() const {
        return width * height;
    }

    float width, height;
};

class CircleImpl: public ShapeImpl {
public:
    CircleImpl(float r) : radius(r) {}

private:
    float Area() const {
        return 3.14 * (radius * radius);
    }

    float radius;
};

class Shape {
public:
    Shape(const ShapeImpl *impl) : implementation(impl) {}

    void PrintArea() const {
        std::cout << "area = " << implementation->Area() << std::endl;
    }

private:
    const ShapeImpl *implementation;
};

int main() {
    RectangleImpl *r = new RectangleImpl(10, 20);
    CircleImpl *c = new CircleImpl(5);

    std::vector<Shape> shapes;
    shapes.push_back(Shape(r));
    shapes.push_back(Shape(c));

    std::for_each(shapes.begin(), shapes.end(), [](const Shape &s) {
        s.PrintArea();
    });

    delete r;
    delete c;

    return 0;
}

I don't know whether we've achieved any real improvement over the earlier implementation with this. But it shows that the implementation can be pulled out of the core class while keeping the visible interface of Shape intact.

If we can achieve this, we can go a step further and provide the implementation at compile time. In other words, we can simply make the Shape class generic and let the compiler plug in the implementation details at compile time.

template <typename Impl>
class Shape {
public:
    Shape(const Impl impl) : implementation(impl) {}

    void PrintArea() const {
        std::cout << "area = " << implementation.Area() << std::endl;
    }

private:
    const Impl implementation;
};

Now, we can create and use our generic shape instances as:

    Shape<RectangleImpl> rect(RectangleImpl(10, 20));
    Shape<CircleImpl> circle(CircleImpl(5));

    rect.PrintArea();
    circle.PrintArea();

The only type constraint on RectangleImpl and CircleImpl is that they need to have an Area() function.

class RectangleImpl {
public:
    RectangleImpl(float w, float h) : width(w), height(h) {}

    float Area() const {
        return width * height;
    }

private:
    float width, height;
};

class CircleImpl {
public:
    CircleImpl(float r) : radius(r) {}

    float Area() const {
        return 3.14 * (radius * radius);
    }

private:
    float radius;
};

The C++ compiler is smart enough to check that the constraints are not violated by any new delegate class. For example, if we introduce a TriangleImpl as:

class TriangleImpl {
public:
    TriangleImpl(float b, float h) : base(b), height(h) {}

private:
    float base, height;
};

   Shape<TriangleImpl> triangle(TriangleImpl(10, 5));
   triangle.PrintArea();

The compiler will immediately throw error messages unless you fix your implementation by providing the Area() function:

class TriangleImpl {
public:
    TriangleImpl(float b, float h) : base(b), height(h) {}

    float Area() const {
        return 0.5 * base * height;
    }

private:
    float base, height;
};

Now, if you notice, we are not using a std::vector anymore; that is because we no longer have a single Shape type. If you find yourself missing it, you need to step out of the run-time polymorphic mindset. When using compile-time polymorphism, keep in mind that you're no longer in control of creating the concrete types; you've passed that authority on to the compiler.

So, Shape<TriangleImpl> and Shape<CircleImpl> are entirely different types with no common ancestor.

When designing with compile-time polymorphism, keep in mind that your generic type should be a leaf node, a final class in the entire type system. You should not have to subclass your generic type; rather, you plug in a delegate class that provides the implementation you would otherwise have provided by subclassing.
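If you still want one piece of code that works with every Shape<Impl>, the compile-time answer is another template rather than a common base class. A minimal sketch (DescribeShape is a hypothetical helper, not part of the original code):

template <typename Impl>
void DescribeShape(const Shape<Impl> &shape) {
    // Instantiated separately for each Impl; no shared ancestor needed.
    shape.PrintArea();
}

// Usage:
// DescribeShape(rect);     // works with Shape<RectangleImpl>
// DescribeShape(triangle); // works with Shape<TriangleImpl>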

To summarize, here's a quick list of the good and bad of a generic type system:

Pros
1. Robust: constraint violations surface at compile time, not at run time.
2. Efficient: no run-time lookups.

Cons
1. Not a good candidate for a base class.
2. Harder to read and debug.

Now, moving on to generics with Swift. Swift provides the same good old C++ way of writing generic code, with an extra type constraint system on top. Where the C++ compiler does the type constraint checking only when it actually compiles (instantiates) the generic code, Swift makes the constraint system explicit: you have to provide the constraint information with a protocol.

protocol ShapeImplementable {
    var area: Double { get }
}

struct Shape<Impl: ShapeImplementable> {
    private let implementation: Impl

    init(impl: Impl) {
        implementation = impl
    }

    func printArea() {
        println("area = \(implementation.area)")
    }
}

And then you can implement your delegate class as

struct RectangleImpl: ShapeImplementable {
    var area: Double {
        return width * height
    }

    init(w: Double, h: Double) {
        width = w
        height = h
    }

    private let width: Double
    private let height: Double
}

And finally, use them the same way you would with C++.

let rect = Shape<RectangleImpl>(impl: RectangleImpl(w: 10, h: 20))

The entire code for this article is also available at https://github.com/chunkyguy/GenericsDemo

Goodbye and have a nice day!

Subclassing UILabel in 2015

UILabel is one of the smartest views in the UIKit family. It knows a lot about itself: if you constrain it to a certain width, it can calculate its own height. Another smart thing about UILabel is that it saves you machine cycles by not redrawing its content unless something is actually modified. And if you use NSAttributedString, you can draw even more sophisticated text content.

Given so many good things about UILabel, it is very tempting to subclass it to render our custom text content. Say you're working on a game or an app, say a chat messenger; you may want a view where you render your text with some decorations around it, like a speech bubble with an optional image. And if you've ever worked on such a task, you know how painful it actually is!

A UILabel is the perfect fit whenever you need to render some text on screen. I recently had to take on the job of subclassing UILabel, and it is surprisingly hard to find a good source of information on how to do it properly. A naive subclass might work for the text data you have right now, but there are so many screen sizes and edge cases possible. I don't want you to discover this knowledge the hard way like I did, so I'll share what I've learned from my adventure.

Even if you're not directly using UILabel but rendering your content on an OpenGL canvas with the FreeType library or something similar, knowing how UILabel works should be really helpful when designing your own view. But you can take my word for it that rendering text is one of the most challenging things in the UI problem domain: you have to render all sorts of glyphs, in all sorts of writing directions, with a lot of kerning, ligatures, and other text attributes. If you're one of those adventure seekers, you should definitely try building your own text rendering system, as I once did.

Someday I should probably also share my experiences building a UILabel-like view from scratch, on top of FreeType. But today we just extend the work UILabel already does for us. Let's begin by creating a new project.

We simply create a new project with a single view and add a RoundedRectLabel class. Next, we write our basic code to render the label on screen.

class RoundedRectLabel: UILabel {
}

class ViewController: UIViewController {
    override func viewDidLoad() {
        super.viewDidLoad()
        
        view.backgroundColor = .lightGrayColor()
        
        let chatBubbleLabel = RoundedRectLabel()
        chatBubbleLabel.text = "No one expects the Spanish Inquisition! Our chief weapon is surprise. Fear and surprise. Two chief weapons, fear, surprise, and ruthless efficiency! Er, among our chief weapons are: fear, surprise, ruthless efficiency, and near fanatical devotion to the Pope! Um, I'll come in again."
        chatBubbleLabel.textColor = .blackColor()
        chatBubbleLabel.backgroundColor = .cyanColor()
        chatBubbleLabel.numberOfLines = 0
        chatBubbleLabel.textAlignment = .Justified
        view.addSubview(chatBubbleLabel)
        
        chatBubbleLabel.setTranslatesAutoresizingMaskIntoConstraints(false)
        view.addConstraint(NSLayoutConstraint(item: chatBubbleLabel, attribute: .CenterY, relatedBy: .Equal, toItem: view, attribute: .CenterY, multiplier: 1.0, constant: 0.0))
        view.addConstraints(NSLayoutConstraint.constraintsWithVisualFormat("H:|-[chatBubbleLabel]-|", options: NSLayoutFormatOptions(0), metrics: nil, views: ["chatBubbleLabel": chatBubbleLabel]))
    }
    
    override func prefersStatusBarHidden() -> Bool {
        return true
    }
}

Now the two important methods to override when subclassing a UILabel are:

    func textRectForBounds(bounds: CGRect, limitedToNumberOfLines numberOfLines: Int) -> CGRect
    func drawTextInRect(rect: CGRect)

The textRectForBounds(_:limitedToNumberOfLines:) gives us an opportunity to adjust the drawing area of the text within the UILabel, and drawTextInRect() is where you do your custom drawing within the view.

If you notice, this is a little different from your usual UIView subclassing, where you typically draw your content in drawRect(). Another thing to keep in mind is that you typically also don't want to modify intrinsicContentSize(). Overriding these two methods is enough to do whatever you want with your custom UILabel.

For the sake of a deeper understanding, let's print out the order of execution.

class RoundedRectLabel: UILabel {
    
    override func textRectForBounds(bounds: CGRect, limitedToNumberOfLines numberOfLines: Int) -> CGRect {
        let textRect = super.textRectForBounds(bounds, limitedToNumberOfLines: numberOfLines)
        println("\(__FUNCTION__): \(textRect)")
        return textRect
    }
    
    override func drawTextInRect(rect: CGRect) {
        super.drawTextInRect(rect)
        println("\(__FUNCTION__): \(rect)")
    }

    override func intrinsicContentSize() -> CGSize {
        let size = super.intrinsicContentSize()
        println("\(__FUNCTION__): \(size)")
        return size
    }
}

Here’s the output:

textRectForBounds(_:limitedToNumberOfLines:): (0.0, 0.0, 2122.5, 20.5)
intrinsicContentSize(): (2122.5, 20.5)
textRectForBounds(_:limitedToNumberOfLines:): (0.0, 0.0, 342.5, 142.0)
intrinsicContentSize(): (342.5, 142.0)
drawTextInRect: (0.0, 0.0, 343.0, 142.0)

The interesting thing to notice here is that whatever is calculated by super.textRectForBounds(_:limitedToNumberOfLines:) is also passed on to intrinsicContentSize(). Another interesting thing is that the methods are called multiple times: the first time with some estimated values from UIKit, and the second time after the values have been evaluated accurately enough.

Not overriding intrinsicContentSize() also means that you don't need to call invalidateIntrinsicContentSize() yourself. Since the default drawing mode of UILabel is UIViewContentMode.Redraw, the content is redrawn automatically whenever it updates. To confirm this, update the text after a while and you can see the drawing methods being invoked again.

class ViewController: UIViewController {
    override func viewDidLoad() {
        super.viewDidLoad()
        
        view.backgroundColor = .lightGrayColor()
        
        let chatBubbleLabel = RoundedRectLabel()
        chatBubbleLabel.text = "No one expects the Spanish Inquisition! Our chief weapon is surprise. Fear and surprise. Two chief weapons, fear, surprise, and ruthless efficiency! Er, among our chief weapons are: fear, surprise, ruthless efficiency, and near fanatical devotion to the Pope! Um, I'll come in again."
        chatBubbleLabel.textColor = .blackColor()
        chatBubbleLabel.backgroundColor = .cyanColor()
        chatBubbleLabel.numberOfLines = 0
        chatBubbleLabel.textAlignment = .Justified
        view.addSubview(chatBubbleLabel)
        
        chatBubbleLabel.setTranslatesAutoresizingMaskIntoConstraints(false)
        view.addConstraint(NSLayoutConstraint(item: chatBubbleLabel, attribute: .CenterY, relatedBy: .Equal, toItem: view, attribute: .CenterY, multiplier: 1.0, constant: 0.0))
        view.addConstraints(NSLayoutConstraint.constraintsWithVisualFormat("H:|-[chatBubbleLabel]-|", options: NSLayoutFormatOptions(0), metrics: nil, views: ["chatBubbleLabel": chatBubbleLabel]))
        
        dispatch_after(dispatch_time(DISPATCH_TIME_NOW, Int64(NSEC_PER_SEC * 3)), dispatch_get_main_queue()) { () -> Void in
            println("---------")
            chatBubbleLabel.text = "All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, the fresh-water system, and public health, what have the Romans ever done for us?"
        }
    }
    
    override func prefersStatusBarHidden() -> Bool {
        return true
    }
}

Output:

textRectForBounds(_:limitedToNumberOfLines:): (0.0, 0.0, 2122.5, 20.5)
intrinsicContentSize(): (2122.5, 20.5)
textRectForBounds(_:limitedToNumberOfLines:): (0.0, 0.0, 342.5, 142.0)
intrinsicContentSize(): (342.5, 142.0)
drawTextInRect: (0.0, 0.0, 343.0, 142.0)
---------
textRectForBounds(_:limitedToNumberOfLines:): (0.0, 0.0, 1403.0, 20.5)
intrinsicContentSize(): (1403.0, 20.5)
textRectForBounds(_:limitedToNumberOfLines:): (0.0, 0.0, 323.5, 101.5)
intrinsicContentSize(): (323.5, 101.5)
drawTextInRect: (0.0, 0.0, 343.0, 101.5)

This is good, because it means that if you're subclassing a UILabel you need to focus on just these two methods, and the UILabel will take care of the rest. So let's focus first on drawTextInRect().

The first thing is to calculate the edge insets, or padding, you wish to give your label.

    let edgeInsets = UIEdgeInsets(top: 10, left: 20, bottom: 10, right: 20)

    override func drawTextInRect(rect: CGRect) {
        let textRect = UIEdgeInsetsInsetRect(rect, edgeInsets)
        super.drawTextInRect(textRect)
    }

As you can see, we have the required padding from all edges as we wanted, but the text is truncated at the end. This is because the UILabel is still trying to draw the text within the original frame it had calculated. So the next task is to provide the new drawing frame to the UILabel, which is done within textRectForBounds(_:limitedToNumberOfLines:). The important question is: how do we calculate the frame size needed to draw the entire text content?

NSString has a handy extension that does these calculations. The method we're particularly interested in is:

func boundingRectWithSize(size: CGSize, options: NSStringDrawingOptions, attributes: [NSObject : AnyObject]!, context: NSStringDrawingContext!) -> CGRect

We just need to provide this method our estimated size of the drawing area and the other rendering attributes, and it will return the actual frame we need to render this text. This brings us to part two of the problem: how do we calculate the estimated size of the drawing area?

Let’s see, we know the width that we wish to draw in.

let estimatedWidth = CGRectGetWidth(textRect) - (edgeInsets.left + edgeInsets.right)

But UILabel passes the same original width to both textRectForBounds(_:limitedToNumberOfLines:) and drawTextInRect(). Say the original width of the UILabel is 100 pts and we want a padding of 10 pts on each edge. If the NSString API calculates the required height for width = 80 as, let's say, 200, then the size passed down to drawTextInRect() is 100×200, which we again shrink down to 80×180. To compensate for this second inset, we must calculate the height for a width reduced by twice the horizontal insets.

let estimatedWidth = CGRectGetWidth(rect) - (2 * (edgeInsets.left + edgeInsets.right))

But what about the height? Don't worry, this is just an estimated height; we can provide a very large value and hope for the best.

let estimatedWidth = CGRectGetWidth(rect) - (2 * (edgeInsets.left + edgeInsets.right))
let estimatedHeight = CGFloat.max
let calculatedFrame = NSString(string: text).boundingRectWithSize(CGSize(width: estimatedWidth, height: estimatedHeight), options: .UsesLineFragmentOrigin, attributes: [NSFontAttributeName: font], context: nil)

Next, remember that boundingRectWithSize(...) will try to wrap our text in as little space as possible, because that is the default behavior. So we need to explicitly add the extra top and bottom padding to the calculated size.

let calculatedWidth = ceil(CGRectGetWidth(calculatedFrame))
let calculatedHeight = ceil(CGRectGetHeight(calculatedFrame))
let finalHeight = (calculatedHeight + edgeInsets.top + edgeInsets.bottom)
rect.size = CGSize(width: calculatedWidth, height: finalHeight)

The ceil() rounds the fractional mathematical values up to renderable screen values. The entire code is below.

class RoundedRectLabel: UILabel {
    
    let edgeInsets = UIEdgeInsets(top: 10, left: 20, bottom: 10, right: 20)
    
    override func textRectForBounds(bounds: CGRect, limitedToNumberOfLines numberOfLines: Int) -> CGRect {
        var rect = super.textRectForBounds(bounds, limitedToNumberOfLines: numberOfLines)

        if let text = text {
            let estimatedWidth = CGRectGetWidth(rect) - (2 * (edgeInsets.left + edgeInsets.right))
            let estimatedHeight = CGFloat.max
            let calculatedFrame = NSString(string: text).boundingRectWithSize(CGSize(width: estimatedWidth, height: estimatedHeight), options: .UsesLineFragmentOrigin, attributes: [NSFontAttributeName: font], context: nil)
            let calculatedWidth = ceil(CGRectGetWidth(calculatedFrame))
            let calculatedHeight = ceil(CGRectGetHeight(calculatedFrame))
            let finalHeight = (calculatedHeight + edgeInsets.top + edgeInsets.bottom)
            rect.size = CGSize(width: calculatedWidth, height: finalHeight)
        }
        
        return rect
    }
    
    override func drawTextInRect(rect: CGRect) {
        let textRect = UIEdgeInsetsInsetRect(rect, edgeInsets)
        super.drawTextInRect(textRect)
    }
}

To finish it off, we just need some custom drawing code. Feel free to draw whatever you like.

    override func drawTextInRect(rect: CGRect) {
        
        UIColor.cyanColor().setFill()
        UIColor.blackColor().setStroke()
        
        let edgePath = UIBezierPath(roundedRect: rect, cornerRadius: 50)
        edgePath.lineWidth = 5.0
        edgePath.lineJoinStyle = kCGLineJoinRound
        edgePath.fill()
        edgePath.stroke()
        
        let textRect = UIEdgeInsetsInsetRect(rect, edgeInsets)
        super.drawTextInRect(textRect)
    }

The entire code is also available on GitHub: https://github.com/chunkyguy/RoundedRectLabel.

Curious case of Generics Specialization with Swift

Today, I want to talk about a curious case I discovered while playing with generic programming with Swift.

To illustrate, let’s start by writing a simple function.

func Min(x:Int, y:Int) -> Int {
    println("Using Min<Int>")
    return (x < y) ? x : y
}

let a = Min(-1, 1) // Using Min<Int>

And now let’s make it generic.

func Min<T:Comparable>(x:T, y:T) -> T {
    println("Using Min<T>")
    return (x < y) ? x : y
}

let a = Min(-1, 1) // Using Min<T>

What if we have both implementations? The compiler automatically picks the specialized version.

func Min<T:Comparable>(x:T, y:T) -> T {
    println("Using Min<T>")
    return (x < y) ? x : y
}

func Min(x:Int, y:Int) -> Int {
    println("Using Min<Int>")
    return (x < y) ? x : y
}

let a = Min(-1, 1) // Using Min<Int>

The Swift standard library already provides min and max functions.

func min<T : Comparable>(x: T, y: T) -> T

Suppose we use that as the generic version and provide our own specialized one for Int. The compiler still picks the specialized one.

func min(x:Int, y:Int) -> Int {
    println("Using Min<Int>")
    return (x < y) ? x : y
}

let a = min(-1, 1) // Using min<Int>

This is really convenient, isn’t it? Let’s expand our example to something you might face in real life. Let’s work with a Vector2 type.

struct Vector2 {
    
    var x : Float = 0.0
    var y : Float = 0.0
    
    init(x:Float = 0,y:Float = 0) {
        
        self.x = x
        self.y = y
    }
}

How about making use of standard min and max functions with Vector2?

let lowerBound = Vector2(x: -1, y: -1)
let upperBound = Vector2(x: 1, y: 1)
let a = min(lowerBound, upperBound)

This is going to throw an error, as the standard min and max functions need the type to conform to the Comparable protocol. So let's do that first.

func ==(lhs: Vector2, rhs: Vector2) -> Bool {
    
    return (lhs.x == rhs.x) && (lhs.y == rhs.y)
}

func < (lhs: Vector2, rhs: Vector2) -> Bool {
    
    return (lhs.x < rhs.x) && (lhs.y < rhs.y)
}

func <= (lhs: Vector2, rhs: Vector2) -> Bool {
    
    return (lhs.x <= rhs.x) && (lhs.y <= rhs.y)
}

func >= (lhs: Vector2, rhs: Vector2) -> Bool {
    
    return (lhs.x >= rhs.x) && (lhs.y >= rhs.y)
}

func > (lhs: Vector2, rhs: Vector2) -> Bool {
    
    return (lhs.x > rhs.x) && (lhs.y > rhs.y)
}

extension Vector2 : Comparable {}

let lowerBound = Vector2(x: -1, y: -1)
let upperBound = Vector2(x: 1, y: 1)
let a = min(lowerBound, upperBound)

This works as expected. This is a great feature, in my opinion one of the best reasons to switch from Objective-C to Swift. Moving forward, let's write a generic clamp function.

func clamp<T:Comparable>(value:T, lowerBound:T, upperBound:T) -> T {
    return min(max(lowerBound, value), upperBound)
}

let b = clamp(10, -1, 1) // prints 1

This is awesome! Let’s try our clamp function with Vector2.

let lowerBound = Vector2(x: -100, y: -100)
let upperBound = Vector2(x: 100, y: 100)
let test1 = clamp(Vector2(x: 200, y: 200), lowerBound, upperBound) // {x:100, y:100}
let test2 = clamp(Vector2(x: -200, y: -200), lowerBound, upperBound) // {x:-100, y:-100}
let test3 = clamp(Vector2(x: -10, y: -134), lowerBound, upperBound) // {x:-10, y:-134}
let test4 = clamp(Vector2(x: 10, y: 134), lowerBound, upperBound) // {x:10, y:134}

At first this might look bad, because test3 and test4 are not what we want. But this is not the compiler's fault. The standard min and max use our overloaded comparison operators, and those only define a partial order on Vector2: when one component is smaller and the other is larger, neither vector compares as less than the other, so min and max just hand back one of the inputs unchanged. We can test this with:

let test5 = min(Vector2(x: -10, y: -134), lowerBound) // {x:-10, y:-134}
let test6 = max(Vector2(x: 10, y: 134), upperBound) // {x:10, y:134}

Let’s fix them by providing our own specialized versions.

func min(lhs: Vector2, rhs: Vector2) -> Vector2 {
    return Vector2(x: min(lhs.x, rhs.x), y: min(lhs.y, rhs.y))
}

func max(lhs: Vector2, rhs: Vector2) -> Vector2 {
    return Vector2(x: max(lhs.x, rhs.x), y: max(lhs.y, rhs.y))
}

let test5 = min(Vector2(x: -10, y: -134), lowerBound) // {x:-100, y:-134}
let test6 = max(Vector2(x: 10, y: 134), upperBound) // {x:100, y:134}

This looks better in the sense that min and max are now calculated per component. But something is still wrong with our clamp tests, as they still print the same old values. It turns out the min and max inside the clamp function still use the standard generic versions rather than our specialized ones: Swift resolves those calls when the generic clamp is compiled, against the Comparable constraint, instead of re-resolving them for each concrete type the way C++ does at template instantiation.

The only way to make this work is to provide a specialized clamp function for Vector2.

func clamp(value:Vector2, lowerBound:Vector2, upperBound:Vector2) -> Vector2 {
    return min(max(lowerBound, value), upperBound)
}

let test1 = clamp(Vector2(x: 200, y: 200), lowerBound, upperBound) // {x:100, y:100}
let test2 = clamp(Vector2(x: -200, y: -200), lowerBound, upperBound) // {x:-100, y:-100}
let test3 = clamp(Vector2(x: -10, y: -134), lowerBound, upperBound) // {x:-10, y:-100}
let test4 = clamp(Vector2(x: 10, y: 134), lowerBound, upperBound) // {x:10, y:100}

Note that the specialized clamp implementation for Vector2 is exactly the same as the generic one. And this is not good: if we want the compiler to automatically pick the right version, we have to implement the entire chain, down to every function. So it comes down to either using the generic functions all the way or implementing the entire chain.

To compare, here’s a C++ version of the same functionality that works great with a single clamp generic implementation.

#include <iostream>

struct Vector2 {
    
    float x;
    float y;
    
    Vector2(float x = 0, float y = 0) {
        
        this->x = x;
        this->y = y;
    }
};

template <typename T>
T min(T a, T b) {
    return a < b ? a : b;
}

template <>
Vector2 min(Vector2 a, Vector2 b) {
    return Vector2(min(a.x, b.x), min(a.y, b.y));
}

template <typename T>
T max(T a, T b) {
    return a > b ? a : b;
}

template <>
Vector2 max(Vector2 a, Vector2 b) {
    return Vector2(max(a.x, b.x), max(a.y, b.y));
}

template <typename T>
T clamp(T value, T lowerBound, T upperBound) {
    return min(max(lowerBound, value), upperBound);
}

std::ostream &operator<<(std::ostream &os, const Vector2 &v) {
    os << v.x << ", " << v.y;
    return os;
}


int main() {
    Vector2 lowerBound(-100, -100);
    Vector2 upperBound(100, 100);
    
    std::cout << clamp(Vector2(200, 200), lowerBound, upperBound) << std::endl;
    std::cout <<  clamp(Vector2(-200, -200), lowerBound, upperBound) << std::endl;
    std::cout << clamp(Vector2(-10, -134), lowerBound, upperBound) << std::endl;
    std::cout <<  clamp(Vector2(10, 134), lowerBound, upperBound) << std::endl;
    
    std::cout <<  min(Vector2(-10, -134), lowerBound) << std::endl;
    std::cout <<  max(Vector2(10, 134), upperBound) << std::endl;
    
}

In conclusion, generic programming with Swift is great and definitely a step forward from Objective-C, but somebody coming from C++ would be a little disappointed. The bright side is that the Swift language is rapidly evolving, and maybe this will improve in a future version.

The entire code for this rant is available at C++ and Swift.

Concurrency: Atomics

I think we have covered most of the core concurrency concepts. With our current knowledge we are well equipped to tackle real-world concurrency problems. But this doesn't mean we've covered everything the thread libraries have to offer, and by thread libraries I mean just the C++ standard thread library and libdispatch. For example, we're yet to see std::future, std::promise, std::packaged_task, dispatch_barrier, and dispatch_source in action.

Today let's focus on the lowest level of these libraries: atomics. Atomics are the lowest level at which we can work with a thread library. Atomic operations provide the lowest-level guarantee about the ordering of operations. An operation is atomic if it is guaranteed never to be observed by any thread in an indeterminate, half-done state.

To make an operation atomic, the implementation could avoid switching threads while the operation is underway, or it could simply use a lock, or whatever other trick the hardware has to offer. As a user of the library, you just get the guarantee that atomic operations are indivisible.

To facilitate atomicity, the C++ standard library offers a number of atomic types. All of them offer an is_lock_free() member function to test whether operations are really implemented without locks. The only exception is std::atomic_flag, which is always lock-free.
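Here is a minimal sketch of checking that at runtime; the result is platform dependent, though on common desktop targets an atomic int is typically lock-free:

#include <atomic>
#include <iostream>

int main() {
    std::atomic<int> counter(0);
    // true if the platform implements this atomic type without a lock
    std::cout << std::boolalpha << counter.is_lock_free() << std::endl;
    return 0;
}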

Atomic Types

C++ provides a lot of atomic types; almost all of the fundamental types have an atomic equivalent. (And by fundamental types I mean bool and the integer types, not floating point; you'll see why in a moment.) You can use them directly or as std::atomic<> template specializations. Here's an example:

std::atomic_bool b1;
std::atomic<bool> b2;

The same pattern applies to all the other fundamental types. Although you can mostly use them like the usual fundamental types, a few operations are not allowed. The first striking constraint is that copy construction and copy assignment are disallowed. This code won't compile:

void TryCopy()
{
    std::atomic<bool> b1(false);
    std::atomic<bool> b2 = b1;
}

Then how do we use these atomic types? This brings us to the load() and store() operations. All atomic types except atomic_flag offer load(), store(), exchange(), compare_exchange_weak(), and compare_exchange_strong() operations. Let's take a look at what they do.

load and store

If you’ve ever done a bit of assembly, you must be familiar with load and store operations. Load retrieves the data while store saves the data.

void DoLoad()
{
    std::atomic<int> i(10);
    std::atomic<int> j(i.load());
    std::cout << j << std::endl; // prints 10
}

void DoStore(const int n)
{
    std::atomic<int> i(0);
    i.store(n);
    std::cout << i << std::endl; // prints whatever n is
}

exchange

Apart from the basic load and store, you also get a bunch of exchange operations. An exchange operation does exactly what you'd expect: it stores a new value and returns the old one.

void DoExchange(const int n)
{
    std::atomic<int> i(50);
    int j = i.exchange(n);
    std::cout << i << " " << j << std::endl; // j = i; i = n;
}

An exchange operation is basically a three-step operation: first it loads the current value, then it computes or compares the new value, and finally it stores the new value back. This should also explain why floating-point types are left out of the fundamental atomic types: the compare-and-exchange operations compare values bitwise, as if by memcmp, and floating-point comparison doesn't map cleanly onto that (values that compare equal can have different bit patterns, and NaN never compares equal to itself).

And since we're dealing with the lowest level of concurrency operations, sometimes you want greater control over the execution. Maybe you need a stronger guarantee that the three-step exchange was indeed completed before the running thread ran out of its time slice.

For such fine-grained control the C++ standard library offers two more exchange operations: compare_exchange_weak() and compare_exchange_strong().

compare_exchange_weak() returns false if the exchange wasn't successful. This can happen even when the values match, for example because the running thread's time slice ran out and it was kicked out by the scheduler before it could finish the steps. This is called a spurious failure.

void DoExchangeWeak(const int desired)
{
    int expected = 50;
    std::atomic<int> i(expected);

    bool success = i.compare_exchange_weak(expected, desired);
    
    std::cout << std::boolalpha << success << " " << desired << " "
                << i << " " << expected << std::endl; // true 100 100 50
}

So if you want the exchange to run successfully every time, you probably need to put the operation in a loop, so that whenever it fails spuriously you keep retrying until it succeeds.

while (i.compare_exchange_weak(expected, desired) != true) {
}

Or you can simply use compare_exchange_strong(), which is guaranteed to eliminate all spurious failures.

bool success = i.compare_exchange_strong(expected, desired);

Both of these functions also return false whenever the expected value is not the same as the stored value, that is, whenever the comparison fails; in that case expected is updated to whatever the actual value was. For example:

void DoExchangeWeak(const int desired)
{
    int expected = 0;
    std::atomic<int> i(5);

    bool success = i.compare_exchange_weak(expected, desired);
    
    std::cout << std::boolalpha << success << " "
    << desired << " " << i << " " << expected << std::endl; 
    // false 1 5 5
}

For all fundamental types that support +=, -=, |=, &=, and ^=, the atomic types have equivalent fetch_add, fetch_sub, fetch_or, fetch_and, and fetch_xor operations available.

void DoFetchAdd(const int n)
{
    std::atomic<int> i(100);
    int j = i.fetch_add(n);
    std::cout << i << " " << j << std::endl; 
    //for n = 50; output: 150, 100 => j = i; i += n;
}

Finally, if you want your custom type to work as an atomic type, you can do that, provided your custom type doesn't do anything fancy. What that means in practical terms is that your type must work with memcpy() and memcmp(): plain C-style data, no virtual table lookups. Here's a trivial example:

struct MyType {
    int a;
    int b;
};

std::ostream &operator<<(std::ostream &os, const MyType &t)
{
    os << "{" << t.a << ", " << t.b << "}";
    return os;
}

void DoCustomExchange()
{
    std::atomic<MyType> a;
    a.store({10, 20});
    MyType b = {11, 21};
    MyType c = a.exchange(b);

    std::cout << "a: " << a << std::endl; // a: {11, 21}
    std::cout << "b: " << b << std::endl; // b: {11, 21}
    std::cout << "c: " << c << std::endl; // c: {10, 20}
}

There's a big part dealing with memory ordering that we've simply skipped for now, but we shall come back to it shortly. Let's now focus on the most important atomic type: atomic_flag.

atomic_flag

Forget whatever has been said about atomic types so far; none of it applies to std::atomic_flag. std::atomic_flag is different. You could say that std::atomic_flag is the core of the threading library, the atom of its universe. Let's start exploring it.

Let's consider a scenario. On your social network you get a lot of LOL-case text that you just can't read. So you decide to write a program to convert that text to either all uppercase or all lowercase.

class UnLOLText {
public:
    UnLOLText(const std::string &name) :
    username_(name),
    modify_(0)
    {
        srand((unsigned int)time(0));
    }
    
    void ToUpper()
    {
        while (modify_ < username_.size()) {
            username_[modify_] = toupper(username_[modify_]);
            modify_++;
        }
    }
    
    void ToLower()
    {
        while (modify_ < username_.size()) {
            username_[modify_] = tolower(username_[modify_]);
            modify_++;
        }
    }
    
    void Reset()
    {
        modify_ = 0;
    }
    
    friend std::ostream &operator<<(std::ostream &os, const UnLOLText &txt);
    
private:
    
    std::size_t modify_;
    std::string username_;
};

std::ostream &operator<<(std::ostream &os, const UnLOLText &txt)
{
    os << txt.username_;
    return os;
}

void Scene1()
{
    UnLOLText txt("hEy How aRe yoU dOinG!");

    txt.ToUpper();
    std::cout << "Serial upper: " << txt << std::endl;

    txt.Reset();
    
    txt.ToLower();
    std::cout << "Serial lower: " << txt << std::endl;

    txt.Reset();
}

Output:

Serial upper: HEY HOW ARE YOU DOING!
Serial lower: hey how are you doing!

The amount of such LOL text you receive is huge, so obviously you want to unLOL the text concurrently.

class UnLOLText {
public:
    UnLOLText(const std::string &name) :
    username_(name),
    modify_(0),
    flag(ATOMIC_FLAG_INIT)
    {
        srand((unsigned int)time(0));
    }
    
    void ToUpper()
    {
        while(flag.test_and_set(std::memory_order_seq_cst)) {
        }

        while (modify_ < username_.size()) {
            username_[modify_] = toupper(username_[modify_]);
            modify_++;
            thread_sleep();
        }

        flag.clear(std::memory_order_release);
    }

    void ToLower()
    {
        while(flag.test_and_set(std::memory_order_seq_cst)) {
        }

        while (modify_ < username_.size()) {
            username_[modify_] = tolower(username_[modify_]);
            modify_++;
            thread_sleep();
        }

        flag.clear(std::memory_order_release);
    }

    void Reset()
    {
        modify_ = 0;
    }
    
    friend std::ostream &operator<<(std::ostream &os, const UnLOLText &txt);
    
private:

    void thread_sleep()
    {
        std::this_thread::sleep_for(std::chrono::microseconds(rand()%5+1));
    }
    
    size_t modify_;
    std::string username_;
    std::atomic_flag flag;
};

void Scene2()
{
    UnLOLText txt("hEy How aRe yoU dOinG!");
    
    std::thread tab1(&UnLOLText::ToUpper, &txt);
    std::thread tab2(&UnLOLText::ToLower, &txt);

    tab1.join();
    tab2.join();
    
    std::cout << "Concurrent random: " << txt << std::endl;

    txt.Reset();
}

Using std::atomic_flag, you set the flag as soon as one of the threads enters a routine and clear it only after the entire modification is done. So, using std::atomic_flag, you're effectively letting one randomly selected thread proceed and blocking all the rest.

This sounds almost like what std::mutex does, right? In fact, using std::atomic_flag you can implement your own mutex object.

class CustomMutex {
public:
    CustomMutex() :
    flag(ATOMIC_FLAG_INIT)
    {}
    
    void lock()
    {
        while(flag.test_and_set(std::memory_order_seq_cst)) {
        }
    }
    
    void unlock()
    {
        flag.clear(std::memory_order_release);
    }
    
private:
    std::atomic_flag flag;
};

class UnLOLText {
public:
    UnLOLText(const std::string &name) :
    username_(name),
    modify_(0)
    {
        srand((unsigned int)time(0));
    }
    
    void ToUpper()
    {
        mutex_.lock();

        while (modify_ < username_.size()) {
            username_[modify_] = toupper(username_[modify_]);
            modify_++;
            thread_sleep();
        }

        mutex_.unlock();
    }

    void ToLower()
    {
        mutex_.lock();
        
        while (modify_ < username_.size()) {
            username_[modify_] = tolower(username_[modify_]);
            modify_++;
            thread_sleep();
        }

        mutex_.unlock();
    }

    void Reset()
    {
        modify_ = 0;
    }
    
    friend std::ostream &operator<<(std::ostream &os, const UnLOLText &txt);
    
private:

    void thread_sleep()
    {
        std::this_thread::sleep_for(std::chrono::microseconds(rand()%5+1));
    }
    
    size_t modify_;
    std::string username_;
    CustomMutex mutex_;
};

And since you've come this far, you can even reap the benefits of std::lock_guard to lock and unlock the mutex for you. Remember, std::lock_guard is based on RAII principles, so you get an exception-safety guarantee that, no matter what the rest of the code does (short of deadlocking), your mutex will get unlocked.

class UnLOLText {
public:
    UnLOLText(const std::string &name) :
    username_(name),
    modify_(0)
    {
        srand((unsigned int)time(0));
    }
    
    void ToUpper()
    {
        std::lock_guard<CustomMutex> lock(mutex_);

        while (modify_ < username_.size()) {
            username_[modify_] = toupper(username_[modify_]);
            modify_++;
            thread_sleep();
        }
    }

    void ToLower()
    {
        std::lock_guard<CustomMutex> lock(mutex_);
        
        while (modify_ < username_.size()) {
            username_[modify_] = tolower(username_[modify_]);
            modify_++;
            thread_sleep();
        }
    }

    void Reset()
    {
        modify_ = 0;
    }
    
    friend std::ostream &operator<<(std::ostream &os, const UnLOLText &txt);
    
private:

    void thread_sleep()
    {
        std::this_thread::sleep_for(std::chrono::microseconds(rand()%5+1));
    }
    
    size_t modify_;
    std::string username_;
    CustomMutex mutex_;
};

Big deal, right? We just reinvented something that is already provided by the C++ standard library, and I guarantee the standard library has a better implementation of the mutex. So what extra do we get out of working at such a low level?

With std::atomic_flag you must have noticed we used std::memory_order_seq_cst and std::memory_order_release. What are they? They specify the memory ordering of operations. Let's take the red pill and follow the memory ordering rabbit hole.

Memory ordering

This is probably the weirdest topic you'll encounter as a programmer, as it will cast some doubt on what you thought you knew about how instructions execute at the lower level.

First let's take a look at all the memory orderings possible. Memory orderings are classified for three major classes of operations:

  1. Store: seq_cst, release and relaxed
  2. Load: seq_cst, acquire, consume and relaxed
  3. Exchange: seq_cst, acq_rel, acquire, release, consume, relaxed

If we think in terms of different memory models available, we can classify these operations as:

  1. Default: seq_cst
  2. Unordered: relaxed
  3. Lock based: acquire, consume, release and acq_rel
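Concretely, the ordering is passed as an extra argument to each operation. Here is a minimal sketch with hypothetical Publisher and Consumer functions, just to show where the orderings go:

#include <atomic>

std::atomic<bool> ready(false);
std::atomic<int> payload(0);

void Publisher() {
    // Store operations accept seq_cst (the default), release, or relaxed.
    payload.store(42, std::memory_order_relaxed);
    ready.store(true, std::memory_order_release);
}

void Consumer() {
    // Load operations accept seq_cst, acquire, consume, or relaxed.
    while (!ready.load(std::memory_order_acquire)) {
    }
    int value = payload.load(std::memory_order_relaxed);
    (void)value;

    // Exchange (read-modify-write) operations can combine both sides.
    ready.exchange(false, std::memory_order_acq_rel);
}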

You're already familiar with sequentially consistent (seq_cst) ordering, because this is what you've assumed all your life: you see a piece of code and you follow the lines top to bottom, because that's how execution works, right? Let's see.

Let's say there's a new trend that every awesome software company is following. They have placed a coffee machine and a fedora hat machine in the entrance lobby, and they require every employee to have a fedora hat on their head or a cup of coffee in their hand to enter the office. They certainly prefer the employee getting both items. According to a survey, this allegedly increases the hip level of the employee and brings more energy and productivity to the office. Say each of these items increments the employee's hip level by 1.

So company A tries to implement this with the default memory model. They notice that some employees prefer wearing the hat first and then holding the coffee, while others hold the coffee first and then wear the hat. So they post two security guards: one waits until the employee wears a hat and then checks whether the employee also has a coffee; the second does the complete opposite, first waiting for the employee to hold a coffee and then checking whether the employee also has a hat on. After both guards have reported back, the door decides whether to grant entry.

namespace defult {
    std::atomic<bool> hasHat;
    std::atomic<bool> hasCoffee;
    std::atomic<int> hipLevel;
    
    void WearHat()
    {
        std::this_thread::sleep_for(std::chrono::seconds(rand()%3+1));
        hasHat.store(true, std::memory_order_seq_cst);
    }
    
    void HoldCoffee()
    {
        std::this_thread::sleep_for(std::chrono::seconds(rand()%3+1));
        hasCoffee.store(true, std::memory_order_seq_cst);
    }
    
    void CheckHatAndCoffee()
    {
        while (!hasHat.load(std::memory_order_seq_cst)) {
            /* wait till employee gets a hat */
        }
        
        if (hasCoffee.load(std::memory_order_seq_cst)) {
            hipLevel++;
        }
    }
    
    void CheckCoffeeAndHat()
    {
        while (!hasCoffee.load(std::memory_order_seq_cst)) {
            /* wait till employee gets a coffee */
        }
        
        if (hasHat.load(std::memory_order_seq_cst)) {
            hipLevel++;
        }
    }
    
    void EmployeeEnter()
    {
        hasHat = false;
        hasCoffee = false;
        hipLevel = 0;
        
        std::thread a(WearHat);
        std::thread b(HoldCoffee);
        std::thread c(CheckHatAndCoffee);
        std::thread d(CheckCoffeeAndHat);
        
        a.join();
        b.join();
        c.join();
        d.join();
        
        if (hipLevel == 0) {
            std::cout << "Entry denied" << std::endl;
        } else {
            std::cout << "Entry granted with hip level: " << hipLevel << std::endl;
        }
    }
}

If you follow the code, you'll notice that there is no way an employee can be denied entry. No matter how long an employee takes to get a coffee or a hat, as soon as they get one item the guard watching for it checks for the other; if they have it, good, and if not, then whenever they do get the other item the second guard activates, and this time the employee definitely passes the test, since they already have the first item.

Company B follows the unordered memory model.

namespace unordered {
    std::atomic<bool> hasHat;
    std::atomic<bool> hasCoffee;
    std::atomic<int> hipLevel;

    void GetThings()
    {
        hasHat.store(true, std::memory_order_relaxed);
        hasCoffee.store(true, std::memory_order_relaxed);
    }
    
    void CheckCoffeeAndHat()
    {
        while (!hasCoffee.load(std::memory_order_relaxed)) {
            /* wait till employee gets a coffee */
        }
        
        if (hasHat.load(std::memory_order_relaxed)) {
            hipLevel++;
        }
    }
    
    void EmployeeEnter()
    {
        hasHat = false;
        hasCoffee = false;
        hipLevel = 0;
        
        std::thread a(GetThings);
        std::thread b(CheckCoffeeAndHat);
        
        a.join();
        b.join();
        
        if (hipLevel == 0) {
            std::cout << "Entry denied" << std::endl;
        } else {
            std::cout << "Entry granted with hip level: " << hipLevel << std::endl;
        }
    }
}

They came up with the idea that they don't really need two security guards. What they do instead is instruct their employees to wear the hat first and then get the coffee. The guard then only has to test for the coffee, because by that time the employee must already have the hat.

But this procedure can fail: some employees can be denied entry. This is because the operations aren't sequentially consistent anymore. When an employee is instructed to GetThings(), the employee sees two tasks to complete in a relaxed manner, that is, in whatever order seems convenient. The employee has no idea that another thread is monitoring their activities; they only care that both tasks have been executed by the time they exit GetThings(). So if the employee feels like getting the coffee first and the hat later, nobody stops them.

The guard, meanwhile, is under the false impression that whenever an employee has a coffee in their hand they must also have a hat on their head. So every once in a while the guard encounters an employee who has the coffee but no hat yet; the guard doesn't wait for the employee to get the hat and instead immediately runs them through the door, which denies them entry. And it's all due to the misunderstood relaxed memory ordering.

Company C learns the lessons of both companies A and B and wants the best of both worlds, so it adopts the lock-based memory model.

namespace lock {
    std::atomic<bool> hasHat;
    std::atomic<bool> hasCoffee;
    std::atomic<int> hipLevel;
    
    void GetThings()
    {
        hasHat.store(true, std::memory_order_relaxed);
        hasCoffee.store(true, std::memory_order_release);
    }
    
    void CheckCoffeeAndHat()
    {
        while (!hasCoffee.load(std::memory_order_acquire)) {
            /* wait till employee gets a coffee */
        }
        
        if (hasHat.load(std::memory_order_relaxed)) {
            hipLevel++;
        }
    }
    
    void EmployeeEnter()
    {
        hasHat = false;
        hasCoffee = false;
        hipLevel = 0;
        
        std::thread a(GetThings);
        std::thread b(CheckCoffeeAndHat);
        
        a.join();
        b.join();
        
        if (hipLevel == 0) {
            std::cout << "Entry denied" << std::endl;
        } else {
            std::cout << "Entry granted with hip level: " << hipLevel << std::endl;
        }
    }
}

Here all the tasks are still relaxed, except for the check on coffee. The test on coffee is the synchronization point: hasCoffee serves as a token that both the employee and the security agree on. The employee is free to do whatever they wish, in whatever order, as long as they perform the store on hasCoffee at exactly the point they’re expected to. It serves as a kind of checkpoint: whenever an employee gets a coffee, it means they have completed all the prior tasks in whatever order seemed fit, and nobody cares which. So whenever the security sees an employee with a coffee in their hands, it is guaranteed that all the tasks before it have been completed, and the check for the hat can now succeed.

Company D took a slightly different approach than company C.

namespace lock2 {
    std::atomic<bool> hasHat;
    std::atomic<bool> hasCoffee;
    std::atomic<int> hipLevel;
    
    void GetThings()
    {
        std::this_thread::sleep_for(std::chrono::seconds(rand()%3+1));
       
        hasHat.store(true, std::memory_order_relaxed);
        hasCoffee.store(true, std::memory_order_release);
    }
    
    void CheckCoffeeAndHat()
    {
        int naps_taken = 0;
        while (!hasCoffee.load(std::memory_order_consume)) {
            /* nap for a while */
            naps_taken++;
            std::this_thread::sleep_for(std::chrono::seconds(rand()%2+1));
        }
        
        std::cout << "Naps: " << naps_taken << std::endl;
        
        if (hasHat.load(std::memory_order_relaxed)) {
            hipLevel++;
        }
    }
    
    void EmployeeEnter()
    {
        hasHat = false;
        hasCoffee = false;
        hipLevel = 0;
        
        std::thread a(GetThings);
        std::thread b(CheckCoffeeAndHat);
        
        a.join();
        b.join();
        
        if (hipLevel == 0) {
            std::cout << "Entry denied" << std::endl;
        } else {
            std::cout << "Entry granted with hip level: " << hipLevel << std::endl;
        }
    }
}

It still uses the acquire-release model, but the security loops on a memory_order_consume load instead of a memory_order_acquire one. consume is a weaker cousin of acquire: it only orders operations that carry a data dependency on the value that was loaded. Since the hasHat check doesn’t depend on the value read from hasCoffee, consume technically promises less here (in practice most compilers simply treat consume as acquire). The more visible change is the napping: instead of constantly monitoring employees, the security takes a nap once in a while, and whenever they wake up they check whether the employee has a coffee in their hands; if not, they go back to sleep. This approach works well when you have somewhat predictable data on how long an average employee takes to GetThings().

Atomics and Objective-C

Coming over to Objective-C, atomicity is simplified: all properties are atomic by default. This is good news, because if you’re using multiple threads to update the same property, you will always read a valid value for that property. But this doesn’t mean that the entire object will be in a valid state.
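
To make the distinction concrete, here is a rough sketch of the idea behind an atomic accessor, written in C++ purely as an illustration of the concept (this is not how the Objective-C runtime actually implements atomic properties): every individual get and set is synchronized, but a sequence of sets across several properties is not one atomic unit.

#include <mutex>
#include <string>

class AtomicEmployee {
    // Illustration only: each accessor is individually synchronized,
    // but updating firstName and lastName together is NOT atomic as a whole.
    mutable std::mutex lock_;
    std::string firstName_;

public:
    void SetFirstName(const std::string &name)
    {
        std::lock_guard<std::mutex> guard(lock_); // protects this single write
        firstName_ = name;
    }

    std::string FirstName() const
    {
        std::lock_guard<std::mutex> guard(lock_); // protects this single read
        return firstName_;
    }
};

That is exactly the behaviour we’re about to observe with the Employee class below.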

Let’s consider an employee record example:

@interface Employee : NSObject

@property (copy) NSString *firstName;
@property (copy) NSString *lastName;
@property int coffeeConsumed; // in litres

- (NSString *)description;

@end

@implementation Employee

- (id)init
{
    self = [super init];
    if (!self) {
        return nil;
    }
    
    _firstName = [@"Monty" copy];
    _lastName = [@"Burns" copy];
    _coffeeConsumed = 9235;
    
    return self;
}

- (void)dealloc
{
    self.firstName = nil;
    self.lastName = nil;
    [super dealloc];
}

- (NSString *)description
{
    return [NSString stringWithFormat:@"%@ %@: %@ L",
            _firstName,
            _lastName,
            @(_coffeeConsumed)];
}
@end

We simply create an employee with some default values. Now suppose we try to update this single record concurrently:

void updateRecord()
{
    dispatch_group_t wait = dispatch_group_create();
    dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    
    Employee *emp = [[Employee alloc] init];
    
    dispatch_group_enter(wait);
    dispatch_async(queue, ^{
        [NSThread sleepForTimeInterval:rand()/(NSTimeInterval)RAND_MAX];
        emp.firstName = @"Homer";
        [NSThread sleepForTimeInterval:rand()/(NSTimeInterval)RAND_MAX];
        emp.lastName = @"Simpson";
        [NSThread sleepForTimeInterval:rand()/(NSTimeInterval)RAND_MAX];
        emp.coffeeConsumed = 2045;
        dispatch_group_leave(wait);
    });
    
    
    dispatch_group_enter(wait);
    dispatch_async(queue, ^{
        [NSThread sleepForTimeInterval:rand()/(NSTimeInterval)RAND_MAX];
        emp.firstName = @"Lenny";
        [NSThread sleepForTimeInterval:rand()/(NSTimeInterval)RAND_MAX];
        emp.lastName = @"Leonard";
        [NSThread sleepForTimeInterval:rand()/(NSTimeInterval)RAND_MAX];
        emp.coffeeConsumed = 127;
        dispatch_group_leave(wait);
    });
    
    dispatch_group_enter(wait);
    dispatch_async(queue, ^{
        [NSThread sleepForTimeInterval:rand()/(NSTimeInterval)RAND_MAX];
        emp.firstName = @"Carl";
        [NSThread sleepForTimeInterval:rand()/(NSTimeInterval)RAND_MAX];
        emp.lastName = @"Carlson";
        [NSThread sleepForTimeInterval:rand()/(NSTimeInterval)RAND_MAX];
        emp.coffeeConsumed = 598;
        dispatch_group_leave(wait);
    });
    
    dispatch_group_enter(wait);
    dispatch_async(queue, ^{
        [NSThread sleepForTimeInterval:rand()/(NSTimeInterval)RAND_MAX];
        NSLog(@"%@", emp);
        dispatch_group_leave(wait);
    });
    
    dispatch_group_wait(wait, DISPATCH_TIME_FOREVER);
    [emp release];
}

int main(int argc, char * argv[]) {
    @autoreleasepool {
        for (int i = 0; i < 3; ++i) {
            srand(time(0));
            updateRecord();
        }
        NSLog(@"Done");
    }
    return 0;
}

Output

2014-11-01 16:11:41.496 a.out[19307:1403] Carl Simpson: 9235 L
2014-11-01 16:11:43.314 a.out[19307:1a03] Homer Burns: 9235 L
2014-11-01 16:11:46.732 a.out[19307:1a03] Lenny Simpson: 2045 L
2014-11-01 16:11:47.558 a.out[19307:507] Done

You can see that even though the emp object as a whole is absurd every time, each of its atomic properties holds a valid value at all times.

As far as I’m aware, nothing official is known about atomicity and Swift, but I’m guessing it would be close to the Objective-C model.

As usual the code for today’s experiment is available at github.com/chunkyguy/ConcurrencyExperiments.

Have fun!


Add Some Perspective to Your UIViews

Safari for iOS has a very interesting perspective effect built in; you can see it when you have multiple tabs open. How about building something like that? Those views are clearly UIViews, right? You can see the web content rendered, and they have close buttons. The content even updates periodically without launching the app, probably using background services.

(Screenshot: the Safari for iOS tab switcher)

Let’s start by searching Apple’s documentation. As far as I was able to find, the closest thing is Adding Perspective to Your Animations in Apple’s Core Animation Programming Guide.

If you just scroll down the ‘Advanced Animation Tricks’ page, at the bottom you’ll find around 10 lines and a small code snippet explaining how to add perspective to your CALayer object.

This is a good start, but it’s not great, as it skips a lot of details, and for a reason: to add perspective you have to be familiar with the linear algebra behind it, which we shall get to in the next session. But for now, let’s get started.

So, create a new single view project and add a new Swift file to it. Let’s call it PerspectiveView.swift.

class PerspectiveView: UIView {
}

In the storyboard, add a UIView and set its class to PerspectiveView. Next, we need a container view to hold any subviews. We’ll apply perspective to this container view, and hopefully the perspective gets applied to all the contained subviews. Let’s call this container view contentView.

let contentView: UIView?

required init(coder aDecoder: NSCoder)
{
	// All stored properties must be initialized before calling super.init
	contentView = UIView()

	super.init(coder: aDecoder)

	contentView?.backgroundColor = UIColor.yellowColor()
	backgroundColor = UIColor.purpleColor()

	setUp()
}

I’ve set the background color of the main view to purple and the contentView to yellow, just for debugging. Next, let’s implement setUp(). This is where we configure the contentView.

func setUp()
{
	var viewDict:[String: UIView] = [String: UIView]()

	viewDict["contentView"] = contentView!
	addSubview(contentView!)
	
	if let imagePath:String = NSBundle.mainBundle().pathForResource("photo4", ofType: "JPG") {
		let image:UIImage = UIImage(contentsOfFile: imagePath)
		let imageView:UIImageView = UIImageView(image: image)
		imageView.setTranslatesAutoresizingMaskIntoConstraints(false)
		viewDict["imageView"] = imageView
		contentView?.addSubview(imageView)
	}

	applyConstraints(viewDict: viewDict)
	applyPerspective()
}

We’re just creating the required views here, adding the views we wish to have perspective applied to. For now, I’m adding a single UIImageView.

Next up, applying constraints.

func applyConstraints(#viewDict:[String: UIView])
{        
	contentView?.setTranslatesAutoresizingMaskIntoConstraints(false)
	contentView?.addConstraints(NSLayoutConstraint.constraintsWithVisualFormat("H:[contentView(>=100)]",
		options: NSLayoutFormatOptions(0),
		metrics: nil,
		views: viewDict))
	contentView?.addConstraints(NSLayoutConstraint.constraintsWithVisualFormat("V:[contentView(>=100)]",
		options: NSLayoutFormatOptions(0),
		metrics: nil,
		views: viewDict))
	

	contentView?.addConstraints(NSLayoutConstraint.constraintsWithVisualFormat("H:|-[imageView]-|",
		options: NSLayoutFormatOptions(0),
		metrics: nil,
		views: viewDict))
	contentView?.addConstraints(NSLayoutConstraint.constraintsWithVisualFormat("V:|-[imageView]-|",
		options: NSLayoutFormatOptions(0),
		metrics: nil,
		views: viewDict))

	addConstraints(NSLayoutConstraint.constraintsWithVisualFormat("H:|-[contentView]-|",
		options: NSLayoutFormatOptions(0),
		metrics: nil,
		views: viewDict))
	addConstraints(NSLayoutConstraint.constraintsWithVisualFormat("V:|-[contentView]-|",
		options: NSLayoutFormatOptions(0),
		metrics: nil,
		views: viewDict))
}

I’m just centering all the subviews. This is just me being lazy; you should probably add proper constraints to keep the image view’s aspect ratio correct.

Now coming to the interesting part. Applying perspective.

We just call calculatePerspectiveTransform() to do all the calculation and return a CATransform3D object that we can apply to the CALayer of our contentView. As for the calculation, let’s simply copy-paste the snippet from the Core Animation Programming Guide.

func calculatePerspectiveTransform() -> CATransform3D
{
	let eyePosition:Float = 10.0;
	var contentTransform:CATransform3D = CATransform3DIdentity
	contentTransform.m34 = CGFloat(-1/eyePosition)
	
	return contentTransform
}

func applyPerspective()
{
	contentView?.layer.sublayerTransform = calculatePerspectiveTransform()
}

That’s it. Give it a run. That is not my dog BTW, just photobombing my photo out of nowhere.

(Screenshot: the image view rendered after the first run, with the perspective transform applied)

For now, let’s ignore the distortion of the image, which is a constraints issue. There are some other questions to be answered first.

  1. What is sublayerTransform?
  2. Why is eyePosition 10?
  3. What’s up with m34?
  4. And most importantly, where’s the perspective?

Turns out, the snippet provided by the Core Animation Programming Guide just adds the perspective. But, in order to see it in action you still have to modify the transform a bit more. Let’s add a little translation to the transform.

func calculatePerspectiveTransform() -> CATransform3D
{
	let eyePosition:Float = 10.0;
	var contentTransform:CATransform3D = CATransform3DIdentity
	contentTransform.m34 = CGFloat(-1/eyePosition)
	
	contentTransform = CATransform3DTranslate(contentTransform, 0, 0, -20)

	return contentTransform
}

(Screenshot: the image appears pushed back and smaller after the z translation)

Good. But from this angle it looks just as if the image has been scaled down. Why not rotate it along the x axis?

func calculatePerspectiveTransform() -> CATransform3D
{
	let eyePosition:Float = 10.0;
	var contentTransform:CATransform3D = CATransform3DIdentity
	contentTransform.m34 = CGFloat(-1/eyePosition)
	
	contentTransform = CATransform3DRotate(contentTransform, CGFloat(GLKMathDegreesToRadians(45)), 1, 0, 0)
	contentTransform = CATransform3DTranslate(contentTransform, 0, 0, -20)

	return contentTransform
}

(Screenshot: the image tilted about the x axis, now with visible perspective)

Wow! Now we have some perspective. We can now either tinker with the magic numbers until we get the desired effect, or we can get a deeper understanding and better control over what’s going on.

I prefer the latter. So let’s begin by answering some of the questions from above.

What is sublayerTransform?

If you look at the interface of CALayer, you’ll see there are two CATransform3D types.

class CALayer : NSObject, NSCoding, CAMediaTiming {
	/* other stuff */
    var transform: CATransform3D
    var sublayerTransform: CATransform3D
}

When you modify transform, the layer’s own content gets modified. Whereas, when you modify the sublayerTransform, the sublayers get modified, while the receiver’s layer remains untouched.

If you replace sublayerTransform with transform

contentView?.layer.transform = calculatePerspectiveTransform()

You would get something like

(Screenshot: the whole contentView, yellow background included, gets transformed)

See what I mean? Our contentView, which had the yellow background color, also got modified. Let’s undo that code change; we need the contentView unmodified for things like reading touches and gestures.

What is eyePosition?

For the code above, it’s enough to understand that eyePosition here just means the degree of perspective you want. The larger the value, the less the effect; the smaller the value, the more the effect. We shall look at the maths behind this in the next session on linear algebra, but for now you can try experimenting with values like 5, 50, 500, 5000 and 50000 and see the changes for yourself.

(Screenshot: the effect at different eyePosition values)

What’s up with m34?

There are mostly two kinds of projections we see in computer graphics: the orthogonal projection and the perspective projection. Orthogonal is the default projection used by any 2D system like UIKit, whereas the perspective projection is used by 3D systems, like any 3D game. The main difference is that in an orthogonal projection the distance from the viewer is not accounted for; in other words, the z axis is totally ignored.

A CATransform3D is a 4×4 matrix. It works in a homogeneous coordinate system, and we will dive deeper into homogeneous coordinates in a later session. For now it’s enough to understand that the purpose of this matrix is to convert your points from a 3D space to the screen space, which is 2D.

The m34, the value at the 3rd row, 4th column of the matrix, is the biggest hint as to whether the projection matrix is orthogonal or perspective. An orthogonal projection matrix typically has m34 as 0, while a perspective matrix has some negative value here, typically -1. If in the above code you simply set the value of m34 to 0, you’ll notice that all the perspective effect is gone!

func calculatePerspectiveTransform() -> CATransform3D
{
	let eyePosition:Float = 50000.0;
	var contentTransform:CATransform3D = CATransform3DIdentity
	contentTransform.m34 = 0
	
	contentTransform = CATransform3DRotate(contentTransform, CGFloat(GLKMathDegreesToRadians(45)), 1, 0, 0)
	contentTransform = CATransform3DTranslate(contentTransform, 0, 0, -20)

	return contentTransform
}

(Screenshot: with m34 set to 0 the perspective effect disappears)

Of course, the view is still scaled, because this isn’t a pure orthogonal matrix; we’re still applying the rotation and the translation. But the main perspective effect is gone.
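
To get a rough feel for why a non-zero m34 produces the effect, treat a point as a row vector [x y z 1], which is the convention CATransform3D uses. After the multiplication, the fourth (w) component becomes z * m34 + m44 = 1 - z/eyePosition, and the x and y values get divided by that w before drawing. Content sitting at z = 0 is untouched (w = 1), while the content we pushed back to z = -20 with eyePosition = 10 ends up with w = 3, so it is drawn at a third of its size. Once the rotation gives different parts of the layer different z values, each part gets divided by a different w, and that is the perspective foreshortening. Set m34 to 0 and w stays 1 everywhere, which is exactly the orthogonal case. (This is my back-of-the-envelope reading of the matrix; the proper derivation is the linear algebra promised for the next session.)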

With the basic questions answered, let’s now build a proper projection system. I’ll be using GLKit, because it has all the things we need to build the system we want. Since, as of this writing, GLKit isn’t available in Swift, we might have to do this work in Objective-C and return the CATransform3D object back to our Swift class, where we can simply apply it to our contentView.

// Transform.h
@interface Transform : NSObject
@property (nonatomic, readonly) CATransform3D transform;
@end

// PerspectiveView.swift
func applyPerspective()
{
	let contentTransform:Transform = Transform()
	contentView?.layer.sublayerTransform = contentTransform.transform
}

First, we need a Camera class. Think of a real camera: you control two things. The first is the lens settings, like the field of view, focus and aperture, the kind of things you usually set once before you begin filming. The second is the camera motion: the direction you point it, how you rotate it, and so forth.

/** Camera object */
@interface Camera : NSObject

/* lens */
// field of view - in radians
@property (nonatomic, readwrite) float fov;
@property (nonatomic, readwrite) float aspectRatio;
// near and far planes
@property (nonatomic, readwrite) float nearZ, farZ;

/* motion  */
@property (nonatomic, readwrite) float eyeX, eyeY, eyeZ;
@property (nonatomic, readwrite) float centerX, centerY, centerZ;
@property (nonatomic, readwrite) float upX, upY, upZ;

/* Read by Transform object */
@property (nonatomic, readonly) GLKMatrix4 projectionMatrix;
@property (nonatomic, readonly) GLKMatrix4 viewMatrix;

@end

Let’s look at each of these items one by one:

1. Field of view: Controls the area you want to capture. The wider the angle, the more area you capture, but at the cost of greater distortion. A fish-eye lens has a very wide fov.

2. Aspect ratio: Controls the aspect ratio of the image captured. A value of 1 means the captured image will be distorted to fit within a square. You typically want this to be the actual ratio you wish to capture.

3. nearZ, farZ: The clip planes. Anything farther than farZ or nearer than nearZ doesn’t get included in the final image. You don’t want farZ to be too far away, as that brings in more floating-point errors. So if you set farZ to 1,000,000 and you have two objects placed at z = 10 and z = 12, the one at 12 might overlap the one at 10, even though it is farther down the z axis. Typically values of 0.1 and 100 are good enough for nearZ and farZ respectively.

4. center: This is the point the camera is looking at. The default value is the origin.

5. up: Tells which side is considered up. The default value is (0, 1, 0), that is, up is along the y axis.

6. eye: This is where the camera is placed. The final viewing direction is calculated from the eye, center and up values together.

Since the Camera class controls two independent things, we can have separate matrices for each one of those. The role of a matrix is just to transform points from one coordinate system to another. A matrix simply defines a coordinate system. So, if you have a 4×4 matrix and you multiply it with a 4D vector you get a 4D vector in the transformed coordinate space.
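
As a tiny concrete example of that idea, here is what transforming a point by a translation matrix looks like with the GLKit math types (GLKit’s math functions are plain C, so this compiles in Objective-C and C++ alike; the snippet is only an illustration and not part of the Camera class):

#include <GLKit/GLKMath.h>

static GLKVector4 TranslatePoint()
{
    // A coordinate system shifted 20 units down the negative z axis.
    GLKMatrix4 translate = GLKMatrix4MakeTranslation(0.0f, 0.0f, -20.0f);

    // A point in homogeneous coordinates, so w = 1.
    GLKVector4 point = GLKVector4Make(1.0f, 2.0f, 0.0f, 1.0f);

    // The same point expressed in the translated space: {1, 2, -20, 1}.
    return GLKMatrix4MultiplyVector4(translate, point);
}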

For our Camera class, we just keep track of two coordinate systems

- (void) updateProjection;
{
    self.projectionMatrix = GLKMatrix4MakePerspective(GLKMathDegreesToRadians(_fov),
                                                      _aspectRatio,
                                                      _nearZ, _farZ);
}

- (void) updateView;
{
    self.viewMatrix = GLKMatrix4MakeLookAt(_eyeX, _eyeY, _eyeZ,
                                           _centerX, _centerY, _centerZ,
                                           _upX, _upY, _upZ);
}

Whenever any of these values gets updated, we simply update the related matrix.

Now let’s focus on the Transform class.

@interface Transform : NSObject

- (id)initWithCamera:(Camera *)camera;

@property (nonatomic, readwrite) float positionX, positionY, positionZ;
@property (nonatomic, readwrite) float rotationX, rotationY, rotationZ;
@property (nonatomic, readwrite) float scaleX, scaleY, scaleZ;
@property (nonatomic, readwrite) float angle;

@property (nonatomic, readonly) CATransform3D transform;

@end

This class is pretty straightforward. It just describes the transformations to be applied to the object. The only thing that could be misunderstood is rotation: rotation doesn’t describe the angle, it describes the axis. For the angle we have another property, and the final rotation is calculated from both rotation and angle.

Why do we need the Camera object? Because the final image is calculated after taking the camera into account. Think of shooting a video of a frog leaping: the final captured motion depends both on the leap of the frog and on the motion of your camera.
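
The article doesn’t show how the transform property itself is computed, so here is a rough sketch, under my own assumptions, of how the model matrix and the camera’s two matrices could be combined with the GLKit calls listed later; the resulting 16 values would then be copied into the CATransform3D that applyPerspective() uses.

#include <GLKit/GLKMath.h>

// A sketch of the math inside a hypothetical -[Transform transform] getter.
static GLKMatrix4 ModelViewProjection(GLKMatrix4 projection, GLKMatrix4 view,
                                      float positionX, float positionY, float positionZ,
                                      float angle, float rotationX, float rotationY, float rotationZ,
                                      float scaleX, float scaleY, float scaleZ)
{
    // Build the model matrix: translate, then rotate about the given axis, then scale.
    GLKMatrix4 model = GLKMatrix4MakeTranslation(positionX, positionY, positionZ);
    model = GLKMatrix4Rotate(model, angle, rotationX, rotationY, rotationZ);
    model = GLKMatrix4Scale(model, scaleX, scaleY, scaleZ);

    // Combine with the camera: projection * view * model.
    return GLKMatrix4Multiply(projection, GLKMatrix4Multiply(view, model));
}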

With that set up, let’s see what we can build now, with a little experimentation on the fov, the camera’s eyeZ and the transform angle.

func applyPerspective()
{
	// config camera
	let contentCam:Camera = Camera()
	contentCam.fov = 10
	contentCam.aspectRatio = Float(CGRectGetWidth(UIScreen.mainScreen().bounds))/Float(CGRectGetHeight(UIScreen.mainScreen().bounds))
	contentCam.eyeZ = 25
	
	// config content transform
	let contentTransform:Transform = Transform(camera: contentCam)
	contentTransform.rotationX = 1.0
	contentTransform.rotationY = 0.0
	contentTransform.rotationZ = 0.0
	contentTransform.angle = -1.0
	
	contentView?.layer.sublayerTransform = contentTransform.transform
}

I was able to get this

(Screenshot: the result using the Camera and Transform classes)

Now, for your own experimentation, you can see how distorted the image gets as you change the fov. Updating the camera’s eyeZ actually zooms the content in and out. You can also experiment with different transform angles and axes.

There’s a whole lot I’ve skipped in the implementation of the Camera and the Transform classes, mostly about why things work the way they do. In particular, the insides of

  • GLKMatrix4MakeLookAt()
  • GLKMatrix4MakePerspective()
  • GLKMatrix4MakeTranslation()
  • GLKMatrix4Rotate()
  • GLKMatrix4Multiply()
  • and much more

These are some of the most important functions, where the real linear algebra magic lives. But I think this is lower-level stuff that deserves a rant of its own. In the follow-up I’ll dive deeper into these functions and much more of the linear algebra.

The code for this article is available at https://github.com/chunkyguy/DemoPerspectiveView


Concurrency: Deadlocks

We’ve explored so much with our concurrency experiments, and yet there’s one fundamental topic we haven’t touched so far: deadlocks. Whenever we think of concurrency, we can’t help thinking of locks for protecting shared resources, managing synchronization and what not. But, as Uncle Ben said, with great power comes great responsibility; similarly, with great locks come great deadlocks.

Deadlocks with multiple locks

In the classical concurrency theory, deadlocks happen when more than one thread needs mutual exclusivity over more than one shared resource. There’s a classic Dining Philosophers problem that demonstrates this deadlock situation.

Here’s the copy paste from the wikipedia page:

Five silent philosophers sit at a round table with bowls of spaghetti. Forks are placed between each pair of adjacent philosophers.

Each philosopher must alternately think and eat. However, a philosopher can only eat spaghetti when he has both left and right forks. Each fork can be held by only one philosopher and so a philosopher can use the fork only if it’s not being used by another philosopher. After he finishes eating, he needs to put down both forks so they become available to others. A philosopher can grab the fork on his right or the one on his left as they become available, but can’t start eating before getting both of them.

There are many problems hidden inside this one problem, but let’s focus on the situation where every philosopher gets super hungry, has acquired the fork on their left side, and is waiting for someone to release a fork before they can begin eating. Since no philosopher is willing to release their fork, we have a deadlock situation.

Let’s reduce this problem to just two philosophers dining. Here’s a simple implementation

#define MAX_FORKS 2

std::mutex fork[MAX_FORKS];

class Philosopher {
private:
    std::string name_;
    int holdForkIndex_;

    void Think()
    {
        std::this_thread::sleep_for(std::chrono::seconds(1));
    }
    
public:
    Philosopher(const std::string &name, const int startForkIndex) :
    name_(name),
    holdForkIndex_(startForkIndex)
    {}

    void Eat()
    {
        std::cout << name_ << ": Begin eating" << std::endl;

        Think();
        
        std::lock_guard<std::mutex> locka(fork[holdForkIndex_]);
        std::cout << name_ << ": Hold fork: " << holdForkIndex_ << std::endl;
        holdForkIndex_ = (holdForkIndex_ + 1) % MAX_FORKS;

        Think();

        std::lock_guard<std::mutex> lockb(fork[holdForkIndex_]);
        std::cout << name_ << ": Hold fork: " << holdForkIndex_ << std::endl;
        holdForkIndex_ = (holdForkIndex_ + 1) % MAX_FORKS;

        // eating
        std::this_thread::sleep_for(std::chrono::microseconds(10));
        
        std::cout << name_ << " End eating" << std::endl;
    }
};

void DiningPhilosophers()
{
    Philosopher s("Socrates", 0);
    Philosopher n("Nietzsche", 1);
    
    std::thread sEat(&Philosopher::Eat, &s);
    std::thread nEat(&Philosopher::Eat, &n);
    
    sEat.join();
    nEat.join();
}

/* this is the main() */
void Deadlocks_main()
{
    DiningPhilosophers();
}

We have two philosophers and two forks. In each Eat() function we follow the order:

  1. Think
  2. Grab a fork
  3. Think
  4. Grab another fork
  5. Eat
  6. Release both forks

Here’s the output on my machine, before the program just hangs:

Socrates: Begin eating
Nietzsche: Begin eating
Socrates: Hold fork: 0
Nietzsche: Hold fork: 1

As you can see, each philosopher acquires a fork and then just waits indefinitely. One way to break this deadlock is to use the std::lock() function. std::lock() locks two or more mutexes together using a deadlock-avoidance algorithm: either it ends up holding all of them, or it holds none while it waits.

void Eat()
{
	std::cout << name_ << ": Begin eating" << std::endl;

	int lockAIndex = holdForkIndex_;
	int lockBIndex = (holdForkIndex_ + 1) % MAX_FORKS;
	
	std::lock(fork[lockAIndex], fork[lockBIndex]);
	
	Think();
	
	std::lock_guard<std::mutex> locka(fork[lockAIndex], std::adopt_lock);
	std::cout << name_ << ": Hold fork: " << lockAIndex << std::endl;

	Think();

	std::lock_guard<std::mutex> lockb(fork[lockBIndex], std::adopt_lock);
	std::cout << name_ << ": Hold fork: " << lockBIndex << std::endl;

	// eating
	std::this_thread::sleep_for(std::chrono::microseconds(10));
	
	std::cout << name_ << " End eating" << std::endl;
}

First, the philosopher tries to lock both the forks. If successful, then they proceed further to eat, otherwise they just wait.

Notice two things here. First, we’re not using any explicit loop to lock-test-release the mutexes; all of that is handled by std::lock(). Second, we’re passing std::adopt_lock as a second argument to std::lock_guard. This tells std::lock_guard not to lock the mutex at construction, since std::lock() has already locked it. The only role of std::lock_guard here is to guarantee an unlock when Eat() goes out of scope.

Here’s the output for the above code:

Socrates: Begin eating
Nietzsche: Begin eating
Socrates: Hold fork: 0
Socrates: Hold fork: 1
Socrates End eating
Nietzsche: Hold fork: 1
Nietzsche: Hold fork: 0
Nietzsche End eating
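
As an aside, an equivalent way to write the same thing, which some people find clearer, is to construct the guards first with std::defer_lock and then hand them to std::lock(); the two forks are still locked as one atomic step. A sketch of just the locking part of Eat(), with lockAIndex and lockBIndex computed as before:

void Eat()
{
    std::unique_lock<std::mutex> locka(fork[lockAIndex], std::defer_lock); // not locked yet
    std::unique_lock<std::mutex> lockb(fork[lockBIndex], std::defer_lock); // not locked yet

    std::lock(locka, lockb); // lock both together, with deadlock avoidance

    // ... eat ...
    // both forks are released when locka and lockb go out of scope
}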

Deadlocks with single lock

Deadlocks happen more often when more than one mutex is involved. But that doesn’t mean deadlocks can’t happen with a single mutex. You must be thinking: how careless does one have to be to deadlock code that uses just a single mutex? Picture this scenario:

Homer Simpson works at Sector 7-G of a high-security Nuclear Power Plant. To avoid unproductive water-cooler chit-chat, their boss Mr. Burns has placed the water cooler inside a highly protected cabin, secured by a passcode and allowing only a single employee in at a time. Also, to keep track of how much water is being consumed by each employee, Mr. Burns has equipped the water cooler with a mechanism that requires every employee to swipe their card in order to use it.

Homer is thirsty, so he decides to get some water. He enters the passcode and is inside the protected chamber. Once there, he realizes he’s left his card at his desk. Being lazy, he decides to call his friend Lenny to fetch his card. Lenny goes to Homer’s office, grabs his card and goes straight to the water cooler chamber, only to find it locked from the inside.

Now, inside, Homer is waiting for Lenny to bring his card, while outside, Lenny is waiting for the door to be unlocked. And we have a deadlock situation.

Here’s a representation of the above problem in code:

std::mutex m;

void SwipeCard()
{
    std::cout << "Card swiping ... " << std::endl;
    std::lock_guard<std::mutex> waterCoolerLock(m);
    std::cout << "Card swiped" << std::endl;
}

void GetWater()
{
    std::cout << "Chamber unlocking ... " << std::endl;
    std::lock_guard<std::mutex> waterCoolerLock(m);
    std::cout << "Chamber unlocked" << std::endl;

    SwipeCard();

    std::cout << "Water pouring" << std::endl;
}

void Sector7GSituation()
{
    GetWater();
}

And here’s the output before the program hangs:

Chamber unlocking ... 
Chamber unlocked
Card swiping ... 

The moral of the story is that calling user code after acquiring a lock should be handled with utmost care. For example, in the above code, calling SwipeCard() while holding the lock is a big clue that there is a design error in this code.

The solution is to restructure the code to something like:

std::mutex m;

void SwipeCard()
{
    std::cout << "Card swiping ... " << std::endl;
    std::lock_guard<std::mutex> waterCoolerLock(m);
    std::cout << "Card swiped" << std::endl;
}

void EnterChamber()
{
    std::cout << "Chamber unlocking ... " << std::endl;
    std::lock_guard<std::mutex> waterCoolerLock(m);
    std::cout << "Chamber unlocked" << std::endl;
}

void GetWater()
{
    EnterChamber();
    SwipeCard();

    std::cout << "Water pouring" << std::endl;
}

void Sector7GSituation()
{
    GetWater();
}

Output:

Chamber unlocking ... 
Chamber unlocked
Card swiping ... 
Card swiped
Water pouring

Deadlocks and libdispatch

Moving over to GCD, we don’t have to worry about such nitty-gritty details, as we don’t deal with mutexes and locks directly. But deadlocks can still happen, because deadlocks are design errors more than anything else.

Here’s one example of a deadlock using GCD

let queue:dispatch_queue_t = dispatch_queue_create("deadlockQueue", nil)

func SimpleDeadlock()
{
    dispatch_sync(queue) {
        println("Enter task: 0")
        dispatch_sync(queue) {
            println("Enter task: 1")
            println("Exit task: 1")
        }
        println("Exit task: 0")
    }
}

SimpleDeadlock()

The job sounds simple enough: we have a serial queue and two tasks that we need executed sequentially. Each task has a task id associated with it, just so we can keep track of what’s getting executed. We use dispatch_sync because we want the tasks to be executed one after the other.

Here’s the output, before it gets deadlocked:

Enter task: 0

So, why isn’t task 1 getting executed? Well, let’s understand how dispatch_sync works. dispatch_sync doesn’t return until the submitted block has finished, and a serial queue won’t start a new block until the current one is done. Consider the following situation:

Say there is a hair salon with a very strict first-come-first-served policy. A mother and a daughter visit the salon. The mother steps in first, so she gets client id 0, while the daughter gets client id 1. According to the rules, they have to serve the mother first, because she has the lowest-numbered client id. But the mother insists that they serve her daughter first. If they start serving the daughter they’re breaking the rule, as they would be serving client 1 before client 0. Hence, the deadlock.

And this is exactly why task 1 never starts: the serial dispatch queue is waiting for task 0 to finish, but task 0 insists that the queue finish task 1 first.

There are two ways to resolve this deadlock. The first is to use a concurrent queue instead of a serial queue.

let queue:dispatch_queue_t = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)

func SimpleDeadlock()
{
    dispatch_sync(queue) {
        println("Enter task: 0")
        dispatch_sync(queue) {
            println("Enter task: 1")
            println("Exit task: 1")
        }
        println("Exit task: 0")
    }
}

SimpleDeadlock()

Output

Enter task: 0
Enter task: 1
Exit task: 1
Exit task: 0

The reason this works is that a concurrent queue is allowed to work on more than one submitted task at a time. So first task 0 is submitted. Task 0 then submits task 1 and demands that it be executed synchronously, in other words finished before task 0 itself finishes. Since this is a concurrent queue, it distributes its time over all the queued tasks.

In terms of our mother-daughter-and-the-salon example above: let’s say the salon has a big clock that ticks every minute. For that minute they focus on a single client, and at the next tick they switch to another client, irrespective of what state that client is in.

So at the first tick they focus on the mother, who just sits there, not allowing them to even touch her hair before her daughter is done. At the next tick, they switch to the daughter and start serving her. This repeats until the daughter is fully done, after which the mother lets them work on her hair.

This works, but it has a problem. Remember, we wanted task 0 to be fully completed before task 1; that’s probably why we thought of using a serial queue and synchronous tasks in the first place. This solution actually breaks that. What we actually need is a serial queue with the tasks submitted asynchronously.

let queue:dispatch_queue_t = dispatch_queue_create("deadlockQueue", nil)

func SimpleDeadlock()
{
    dispatch_async(queue) {
        println("Enter task: 0")
        dispatch_async(queue) {
            println("Enter task: 1")
            println("Exit task: 1")
        }
        println("Exit task: 0")
    }
}

XCPSetExecutionShouldContinueIndefinitely(continueIndefinitely: true)

SimpleDeadlock()

Output:

Enter task: 0
Exit task: 0
Enter task: 1
Exit task: 1

This solution submits task 1 after task 0 to a serial queue. Since this is a serial queue, the task ordering is preserved. Also, since the tasks no longer wait on each other, we never see any deadlocks.

The lesson here is to use dispatch_sync very carefully. Remember deadlocks are always design errors.

The code for today’s experiment is available at github.com/chunkyguy/ConcurrencyExperiments. Check it out and have fun experimenting on your own.


Concurrency: Introducing Inter-Thread Communication

Picture the scenario: you’re on the phone with your friend, when suddenly he says ‘Hey! Hold on. I think I’m getting another call’ and, before you can say anything, he puts you on hold. Now you have no idea whether to hang up or wait. Your friend could be coming back in 3 seconds or 3 hours; you have no idea. Wouldn’t it be great if your phone service provided a mechanism where, whenever you’re put on hold, you can freely hang up, and whenever your friend disconnects the other call, your phone immediately rings? Let’s talk about managing synchronization between threads.

On a broader level, the above is the classical producer-consumer problem. The consumer and the producer are synchronized by some shared data, and what we actually need is a way to communicate between them: the producer needs to signal the consumer that the data is ready for consumption.

So far in all our concurrency experiments, whenever we have some shared data we explicitly block one or more threads while one thread gets to play with the resource. The shared state here could be a boolean flag of some sort, but we can do better with std::condition_variable.

    std::mutex mutex;
    std::condition_variable condition;

    void StartConsumer()
    {
        std::cout << std::this_thread::get_id() << " Consumer sleeping" << std::endl;
        std::unique_lock<std::mutex> consumerLock(mutex);
        condition.wait(consumerLock);
        consumerLock.unlock();
        
        std::cout << std::this_thread::get_id() << " Consumer working" << std::endl;
    }
    
    void StartProducer()
    {
        std::cout << std::this_thread::get_id() << " Producer working" << std::endl;
        std::this_thread::sleep_for(std::chrono::seconds(1));
        std::lock_guard<std::mutex> producerLock(mutex);
        condition.notify_one();
        std::cout << std::this_thread::get_id() << " Producer sleeping" << std::endl;
    }
    
    void StartProcess()
    {
        std::thread consumerThread(StartConsumer);
        std::thread producerThread(StartProducer);
        
        consumerThread.join();
        producerThread.join();
    }

Output

0x100ced000 Consumer sleeping
0x100d73000 Producer working
0x100d73000 Producer sleeping
0x100ced000 Consumer working

In the above code both the consumer and producer execute simultaneously in concurrent threads. The consumer waits for the producer to finish working on whatever it was doing. But instead of continuously checking for whether the producer is done as in

while (!done) {
	std::unique_lock<std::mutex> consumerLock(mutex);
	consumerLock.unlock();
	std::this_thread::sleep_for(std::chrono::microseconds(1));
}

We make use of a condition variable to inform us when the producer is done. The producer then sends a notification with notify_one(), which wakes one waiting thread (notify_all() would wake all of them), telling it to continue.
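
One caveat worth noting: a bare condition.wait(lock) can wake up spuriously, and if the producer happens to notify before the consumer has started waiting, that notification is lost. The usual fix is to pair the condition variable with a flag and use the predicate overload of wait(). A minimal sketch (the ready flag is my addition, not part of the code above):

#include <condition_variable>
#include <mutex>

std::mutex mutex;
std::condition_variable condition;
bool ready = false; // the shared state the condition is about

void Consume()
{
    std::unique_lock<std::mutex> lock(mutex);
    // wait() re-checks the predicate after every wakeup, so spurious wakeups
    // and early notifications are both handled correctly.
    condition.wait(lock, []{ return ready; });
    // ... consume the data ...
}

void Produce()
{
    {
        std::lock_guard<std::mutex> lock(mutex);
        ready = true; // publish the data under the same mutex
    }
    condition.notify_one();
}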

Moving to GCD, we have two ways. The first is to use a dispatch_group.

let concurrentQueue:dispatch_queue_t = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)
let group:dispatch_group_t = dispatch_group_create()

func StartConsumer()
{
    println("Consumer sleeping")
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    println("Consumer working")
}

func StartProducer()
{
    dispatch_group_async(group, concurrentQueue) { () -> Void in
        println("Producer working")
        sleep(1)
        println("Producer sleeping")
    }
}

func StartProcess()
{
    StartProducer()
    StartConsumer();
}

Output:

Producer working
Consumer sleeping
Producer sleeping
Consumer working

The first thing you’ll notice in this code is that we start the producer first and only then make the consumer wait. This is because if we started the consumer first, the group would have nothing in it to wait for, and dispatch_group_wait would return immediately.

dispatch_group is actually meant to synchronize a set of tasks on a queue. For example, say you’re loading the next level in your game and need to download a lot of images and other data files from a remote server before the loading can finish. What you’re doing is synchronizing a lot of async tasks and waiting for them to finish before doing something else.

For this experiment, a dispatch_semaphore is a better fit. dispatch_semaphore is based on the classic semaphore concept you may have learned about during your college days.

let sema:dispatch_semaphore_t = dispatch_semaphore_create(1);

func StartConsumer()
{
    println("Consumer sleeping")
    dispatch_semaphore_wait(sema, DISPATCH_TIME_FOREVER)
    println("Consumer working")
}

func StartProducer()
{
    println("Producer working")
    sleep(1)
    println("Producer sleeping")
    dispatch_semaphore_signal(sema)
}

func StartProcess()
{
    StartProducer()
    StartConsumer();
}

If you look closely, we haven’t even used any dispatch_queue here. This is because dispatch_semaphore is more about synchronization than about threading.

dispatch_semaphore is basically just a count-based blocking system. We create a dispatch_semaphore object by telling it the initial count, that is, how many waits it can absorb before it blocks. Each wait decrements the count and each signal increments it. If the count goes below 0, the waiting thread is blocked until somebody issues a signal.
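
For comparison, here is a minimal sketch of the same counting behaviour built out of the C++ primitives we used earlier, std::mutex and std::condition_variable. It is only meant to illustrate the wait/signal bookkeeping, not to replace dispatch_semaphore:

#include <condition_variable>
#include <mutex>

class CountingSemaphore {
    std::mutex mutex_;
    std::condition_variable condition_;
    long count_;

public:
    explicit CountingSemaphore(long count) : count_(count) {}

    // Like dispatch_semaphore_wait with DISPATCH_TIME_FOREVER:
    // block until the count is positive, then take one.
    void Wait()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        condition_.wait(lock, [this]{ return count_ > 0; });
        --count_;
    }

    // Like dispatch_semaphore_signal: give one back and wake a waiter.
    void Signal()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        ++count_;
        condition_.notify_one();
    }
};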

Here’s another example, illustrating the use of a semaphore between two queues:

let sema:dispatch_semaphore_t = dispatch_semaphore_create(0);
let concurrentQueue:dispatch_queue_t = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)

func StartConsumer()
{
    println("Consumer sleeping")
    dispatch_semaphore_wait(sema, DISPATCH_TIME_FOREVER)
    println("Consumer working")
}

func StartProducer()
{
    dispatch_async(concurrentQueue) {
        println("Producer working")
        sleep(1)
        println("Producer sleeping")
        dispatch_semaphore_signal(sema)
    }
}

func StartProcess()
{
    StartProducer()
    StartConsumer();
}

Again, note that we start the producer first. Because we initialize the dispatch_semaphore with 0, the consumer blocks the main thread as soon as it is invoked; if we called the consumer first, it would block before we even got the chance to start the producer.

Here’s the output of above code:

Producer working
Consumer sleeping
Producer sleeping
Consumer working

This just scratches the surface of inter-thread communication. Most problems with concurrency somehow come down to inter-thread communication, and in later sessions we shall experiment more with it. The idea for this article came from this discussion on the problem of loading game data in a parallel thread and the solution I worked on.

The code for today’s experiment is available at github.com/chunkyguy/ConcurrencyExperiments. Check it out and have fun experimenting on your own.


Concurrency: Thread-safe Singletons

Love them or hate them, you just cannot ignore singletons. Let’s try experimenting with singletons and concurrency.

Singletons are objects of which you want only a single instance over your program’s entire lifecycle. Usually, you want to create singleton objects lazily. Here’s a typical way to create a singleton in a single-threaded application.

class Library {
public:

    static Library *SharedInstance()
    {
        if (!shared_) {
            shared_ = new Library();
        }
        return shared_;
    }

    static void Dealloc()
    {
        if (shared_) {
            delete shared_;
        }
    }
    
    Library() :
    books_({
        {"Ernest Hemingway", "Old man and the sea"},
        {"Aldous Huxley", "Brave new world"},
        {"Aldous Huxley", "The Genius and the Goddess"},
        {"Aldous Huxley", "Antic Hay"},
        {"Salman Rushdie", "Midnight's children"},
        {"Fyodor Dostovesky", "Crime and punishment"},
        {"Boris Pasternak", "Doctor Zhivago"},
        {"Bernard Shaw", "Arms and the Man"},
        {"Bernard Shaw", "Man and Superman"}
    })
    {
        std::cout << "Library " << this << std::endl;
    }
    
    ~Library()
    {
        std::cout << "~Library " << this << std::endl;
    }
    
    std::vector<std::string> GetBooks(const std::string &author) const
    {
        typedef std::multimap<std::string, std::string>::const_iterator BookIt;
        std::pair<BookIt, BookIt> range =  books_.equal_range(author);
        std::vector<std::string> bookList;
        for (BookIt it = range.first; it != range.second; ++it) {
            bookList.push_back(it->second);
        }
        return bookList;
    }
    
    std::size_t GetSize() const
    {
        return books_.size();
    }
    
private:
    std::multimap<std::string, std::string> books_;
    static Library *shared_;
};

Library *Library::shared_ = nullptr;

void PrintBooks(const std::string &author)
{
    std::cout << "Searching for author: " << author << std::endl;
    std::vector<std::string> list = Library::SharedInstance()->GetBooks(author);
    std::copy(list.begin(), list.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
}

void PrintSize()
{
    std::cout << "Library Size: " << Library::SharedInstance()->GetSize() << " books" << std::endl;
}

When invoked with:

void SerialImpl()
{
    std::string searchAuthor("Aldous Huxley");
    PrintBooks(searchAuthor);
    PrintSize();
}

void ThreadsafeSingleton_main()
{
    SerialImpl();
    Library::Dealloc();
}

We get

Searching for author: Aldous Huxley
Library 0x100a6bfe0
Brave new world
The Genius and the Goddess
Antic Hay
Library Size: 9 books
~Library 0x100a6bfe0
main() exit

We see that our singleton gets created lazily, only when actually required, and also gets deallocated when the program finally quits, thanks to the Dealloc() function.

Next, let’s try calling it from multiple threads.

void ConcurrentImpl()
{
    std::string searchAuthor("Aldous Huxley");
    std::thread bookSearchThread(PrintBooks, std::ref(searchAuthor));
    std::thread libSizeThread(PrintSize);
    
    bookSearchThread.join();
    libSizeThread.join();
}

This might or might not work, depending on how the threads happen to be scheduled on your machine. To see what the problem with this code is, let’s deliberately make the thread sleep while creating the singleton instance.

    static Library *SharedInstance()
    {
        if (!shared_) {
            std::this_thread::sleep_for(std::chrono::seconds(1));
            shared_ = new Library();
        }
        return shared_;
    }

Now, if you run it, you might see two instances of the Library object getting created.

Searching for author: Aldous Huxley
Library Size: Library 0x100b84fe0
Brave new world
The Genius and the Goddess
Antic Hay
Library 0x100ba4fe0
9 books
~Library 0x100ba4fe0
main() exit

There are two problems here. First of all, we have two instances of the Library object. Second, because our Dealloc routine depends on the guarantee that at any given moment there is a single instance of the Library object, the other Library object never gets released. This is a classic example of a memory leak.

We can solve this issue with two techniques. The first is using C++11’s static initialization.

Let’s modify our Library class so that instead of creating the singleton on the heap, we create a static instance of the Library object.

static Library &SharedInstance()
{
    static Library lib;
    return lib;
}

And modifying our calls

void PrintBooks(const std::string &author)
{
    std::cout << "Searching for author: " << author << std::endl;
    std::vector<std::string> list = Library::SharedInstance().GetBooks(author);
    std::copy(list.begin(), list.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
}

void PrintSize()
{
    std::cout << "Library Size: " << Library::SharedInstance().GetSize() << " books" << std::endl;
}

If we now run our code, we get

Searching for author: Aldous Huxley
Library 0x10006d3c8
Brave new world
The Genius and the Goddess
Antic Hay
Library Size: 9 books
main() exit
~Library 0x10006d3c8

This works because with C++11 we’re now guaranteed that static local variables are initialized in a thread-safe way. Another benefit of this approach is that we don’t have to implement any Library::Dealloc() function, as static data gets destroyed automatically at program exit. If this satisfies your requirements, this is the best way out.

But this won’t be the best fit for every case. Say you want the Library singleton object to live only in heap memory. Also, if you’re using Objective-C this won’t work, as you’re not allowed to allocate custom objects on the stack. This brings us to our second method: using the C++ thread library.

Let’s switch back to our original implementation, with the Library object in heap memory. The first thing we can try is using a mutex.

    static Library *SharedInstance()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        
        if (!shared_) {
            std::this_thread::sleep_for(std::chrono::seconds(1));
            shared_ = new Library();
        }
        return shared_;
    }

And this works as expected:

Searching for author: Aldous Huxley
Library Size: Library 0x100b92fe0
Brave new world
The Genius and the Goddess
Antic Hay
9 books
~Library 0x100b92fe0
main() exit

The only problem with this code is that Library::SharedInstance() is going to be called from plenty of places, and this code unnecessarily locks and unlocks the mutex on every call. What we actually need is the mutex only for the case where the single instance has not been created yet; every other time we can simply hand the pointer back. This brings us to the infamous double-checked locking design.

    static Library *SharedInstance()
    {
        if (!shared_) {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!shared_) {
                std::this_thread::sleep_for(std::chrono::seconds(1));
                shared_ = new Library();
            }
        }
        return shared_;
    }

This code looks great, right? It takes the mutex only when creating the shared instance; every other time it just returns the pointer. But it has a serious bug. The code will work almost all the time, failing only long after you’ve forgotten about it. It’s so hard to debug that you probably can’t even reproduce the issue in a debug environment, and you might not see the bug again for another million clock cycles. Sounds like a programmer’s nightmare, right?

Suppose you’re running this code with two threads, ‘bookSearchThread’ and ‘libSizeThread’. bookSearchThread comes in and sees that shared_ is NULL. It acquires the lock and starts allocating and initializing the instance. Halfway through, it has allocated space on the heap and assigned the pointer, so shared_ isn’t NULL anymore, but the instance itself isn’t fully initialized yet. Think of the two separate steps hiding inside the equivalent Objective-C expression:

Library *shared = [[Library alloc] init];

Suddenly, a thread switch happens and libSizeThread comes in. It sees that shared_ isn’t NULL and says: OK, good, somebody must have already created this instance, let’s use it straightaway. This can result in undefined behavior, as libSizeThread is now using an object that isn’t actually fully initialized.
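
One way to repair the double-checked lock in C++11 is to publish the pointer through a std::atomic using the release/acquire orderings from the atomics article, so the fully constructed object is guaranteed to be visible before the pointer is. A rough sketch, assuming shared_ is changed to a std::atomic<Library *>:

    static Library *SharedInstance()
    {
        Library *lib = shared_.load(std::memory_order_acquire);
        if (!lib) {
            std::lock_guard<std::mutex> lock(mutex_);
            lib = shared_.load(std::memory_order_relaxed); // re-check under the lock
            if (!lib) {
                lib = new Library();
                // The release store pairs with the acquire load above: any thread
                // that sees the non-null pointer also sees the initialized object.
                shared_.store(lib, std::memory_order_release);
            }
        }
        return lib;
    }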

This is such a common problem that almost every threading library also provides a ready-made way to handle it. In C++11, we can use std::call_once:

class Library {
private:
    static std::once_flag onceFlag_;
    /* other data members */
    
    public:
    static Library *SharedInstance()
    {
        std::call_once(onceFlag_, []{
            std::this_thread::sleep_for(std::chrono::seconds(1));
            shared_ = new Library();
        });
        return shared_;
    }
    
    /* other member functions */
};

Even GCD provides something similar:

+ (instancetype)sharedInstance
{
    static dispatch_once_t onceFlag;
    static id shared_;
    dispatch_once(&onceFlag, ^{
        shared_ = [[[self class] alloc] init];
    });
    return shared_;
}

These helpers from the standard libraries guarantee that the object’s allocation and initialization are complete before the flag is set; every other time, the call just returns immediately. So, no more race conditions for us.

So there it is, singletons the thread-safe way.

The code for today’s experiment is available at github.com/chunkyguy/ConcurrencyExperiments. Check it out and have fun experimenting on your own.


Concurrency: Working with shared data using threads

In continuation of last week’s experiment on protecting shared data in a concurrent system, let’s look at how we can make it work using C++ and threads.

Here’s a brief recap: we’re building an airport scenario where we have a single status board screen. The status board is simultaneously being edited by multiple airlines, and it can be read by any passenger at any moment to check the status of a flight by its flight number.

Let’s build the concept within a single-threaded environment:

class StatusBoard {
    std::forward_list<int> statusBoard;
    
public:
    void AddFlight()
    {
        int newFlight = rand()%MAX_FLIGHTS;
        statusBoard.push_front(newFlight);
    }
    
    bool FindFlight(const int flightNum) const
    {
        return (std::find(statusBoard.begin(), 
                          statusBoard.end(), 
                          flightNum) != statusBoard.end());
    }
};

StatusBoard statusBoard;

void StatusWriter()
{
    debugCount.write++;
    statusBoard.AddFlight();
}

bool StatusReader(const int searchFlightNum)
{
    debugCount.read++;
    return statusBoard.FindFlight(searchFlightNum);
}

void SerialLoop()
{
    while (!StatusReader(SEARCH_FLIGHT)) {
        StatusWriter();
    }
}

On my machine, it outputs:

Flight found after 26 writes and 27 reads.

Now let’s begin making this implementation multithreaded.

void StatusWriter()
{
    std::this_thread::sleep_for(std::chrono::seconds(1));
    
    debugCount.write++;
    statusBoard.AddFlight();
}

void ParallelLoop()
{
    while (!std::async(std::launch::async, StatusReader, SEARCH_FLIGHT).get()) {
        /* write operation */
        std::thread writeOp(StatusWriter);
        writeOp.detach();
    }
}

The good thing about C++ is that much of the code looks almost identical to the single-threaded code, thanks to the way the thread library has been designed. At first glance you won’t even be able to tell where the threading code actually is. Here we’re invoking two operations on two threads at each iteration of the loop.

First we have a read operation

while (!std::async(std::launch::async, StatusReader, SEARCH_FLIGHT).get()) {

This makes use of std::future to invoke the StatusReader function on an async thread. std::async() with std::launch::async launches StatusReader() on a concurrent thread, and the get() call at the end blocks until the result is ready and hands it back to the calling thread. We shall talk about futures and promises in some future experiment; for now it suffices to understand that much.

The other is simply spawning a std::thread and detaching it, which we’re already familiar with from our previous experiment.

Here we just start the StatusWriter function on a new thread every time the loop iterates.

/* write operation */
std::thread writeOp(StatusWriter);
writeOp.detach();

On my machine, when the code doesn’t crash, it prints:

Flight found after 26 writes and 1614 reads.

Obviously, this code isn’t thread-safe. We’re reading and writing the same forward list from more than one thread, and this code is expected to crash a lot. So, as the next step, let’s make it thread-safe using a mutex.

A mutex is the basic object used to provide mutual exclusivity over a portion of code. In C++, a mutex is used to lock a shared resource.

First of all, we start with a std::mutex object. This object guarantees that the code between the lock and the unlock is executed by a single thread at a time. We can use the lock() and unlock() member functions of the std::mutex object like:

std::mutex mutex; // shared by every caller of foo()

void foo()
{
    mutex.lock();
    
    /* 
     * some code that uses the shared resource here
     */
    
    mutex.unlock();
}

If you’ve used C++ long enough, you’ll know that this code is prone to all sorts of errors. In many cases foo() can exit before the unlock() gets executed. This could be due to an early return you’ve added to the function, or, in a more likely scenario, your code does something illegal and foo() throws an exception. Even if foo() is very robust, it could call some other function which isn’t.

If you’re an old-time C++ user, you’ll know that to solve this class of problem we often use the RAII idiom. Luckily, the standard thread library already provides this for you in the form of std::lock_guard. std::lock_guard is designed to work with any lock-like object.

Using std::mutex and std::lock_guard our code is now:

class StatusBoard {
    std::forward_list<int> statusBoard;
    mutable std::mutex mutex;
    
public:
    void AddFlight()
    {
        std::lock_guard<std::mutex> lock(mutex);
        
        int newFlight = rand()%MAX_FLIGHTS;
        statusBoard.push_front(newFlight);
    }
    
    bool FindFlight(const int flightNum) const
    {
        std::lock_guard<std::mutex> lock(mutex);
        
        return (std::find(statusBoard.begin(), statusBoard.end(), flightNum) != statusBoard.end());
    }
};

The mutable keyword is used as we’re using the mutex in a const member function FindFlight(). On my machine it executes as:

Flight found after 10 writes and 1591 reads.

Sometimes the code also throws the following exception after main exits:

libc++abi.dylib: terminating with uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument

This is because we’ve explicitly added thread delays for debugging, and some of our async threads are running detached and might not be finished before main exits. I don’t think this is going to be a problem in any real code.

So there: we have basic thread-safe code. This is not the best implementation possible, and there are many things to look out for. For example, we have to take care not to pass out any references or pointers to the shared data, otherwise the code might become unsafe again (see the sketch below). Always remember that at the low level all threads share the same address space, and whenever we have a shared resource without a lock, we have a race condition. This is more of a design error on the programmer’s part than anything else.
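
As a concrete illustration of that last point, here is a hypothetical accessor (not part of the StatusBoard class above) that would quietly break the thread safety we just added, because it hands out a reference to the data the mutex is supposed to protect:

    // Hypothetical addition to StatusBoard. Don't do this.
    const std::forward_list<int> &UnsafeBoard() const
    {
        std::lock_guard<std::mutex> lock(mutex);
        return statusBoard; // the lock is released before the caller ever reads the list
    }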

As always this code is available online at github.com/chunkyguy/ConcurrencyExperiments. Check it out and have fun experimenting on your own.
