Getting started with LiteRT (Tensorflow Lite)

So Google recently renamed TensorflowLite to LiteRT. And yes that was a genius move indeed. Because now for the first time in my life I actually want to try TFLite … yea, I mean LiteRT.

Setup

In the real world you’d ideally think like a regular ML developer and start with discovering a dataset that then you’d use to train a model. And then as a next step you’d invent a problem that could be solved by your trained model.

But for this experiment we are going to keep things simple and build the hello world of Machine Learning universe, the Dogs vs Cat exercise.

Setup

So coincidentally enough I’ve found a trained model from a flutter project that can tell if a given photo is of a dog or a cat.

Nice! So then following the instructions at the official getting started docs we need to add tensorflowlite as a dependency to our project.

use_frameworks!
# pod 'TensorFlowLiteSwift'
pod 'TensorFlowLiteSwift', '~> 0.0.1-nightly', :subspecs => ['CoreML', 'Metal']

In case you’re wondering why are we using the nightly build and not the stable release. It’s because as of today the latest stable release of TensorFlowLiteC won’t work on iOS simulator but according to the last comment on the issue looks like TensorFlowLiteC is now also shipped as xcframework but only in the nightly releases.

And then while we wait for the pod install to finish we can shamelessly rip sample images of various dogs and cats from the Introduction to TensorFlow Lite udacity course as our test set.

And we are all set!

Building the app

For the UI we just need a simple image view, a label and a button. The button obviously would reveal the answer and then randomly load the next image.

Setup

Now for the interesting bit. First we need a PetClassifier that takes in an image and returns a text.

class PetClassifier {

  init?(named: String, labels: [String]) {
    // ...
  }

  
  func labelForImage(_ image: UIImage) -> String? {
    // ...
  }
}

And then in the UI layer we can use our PetClassifier to update the label when tapped on the ‘Evaluate’ button.

class ViewController: UIViewController {

  var classifier: PetClassifier?
  @IBOutlet var answerLabel: UILabel!

  override func viewDidLoad() {
    super.viewDidLoad()
    classifier = PetClassifier(named: "dogvscat", labels: ["Cat", "Dog"])
  }

  @IBAction func handleTap() {
    answerLabel.text = classifier?.labelForImage(selectedImage) ?? "Potato"
  }
}

Finally, loading the model is pretty easy. We just need to instantiate Interpreter with the path to our tflite model.

class PetClassifier {
  let interpreter: Interpreter
  let labels: [String]
    
  init?(named: String, labels: [String]) {
    guard let modelPath = Bundle.main.path(forResource: named, ofType: "tflite") else {
      return nil
    }
    
    do {
      var options = Interpreter.Options()
      options.threadCount = Self.threadCount
      interpreter = try Interpreter(modelPath: modelPath, options: options)
      self.labels = labels
    } catch {
      print(error)
      return nil
    }
  }

  // ...
}

Invoking the model

To get the answer from the model is a 4 step process:

  1. Prepare input
  2. Send input data
  3. Read output data
  4. Parse output
/*
 * 1. Prepare input
 */
// user provided image
let image: UIImage 
// image size used for training model
let inputWidth = 224
let inputHeight = 224

// convert image to pixel buffer for further manipulation
let pixelBuffer = ImageUtils.pixelBufferCreate(image: image)
// crop image to size used for training model
let scaledPixelBuffer = ImageUtils.pixelBufferCreateWith(
  pixelBuffer: pixelBuffer,
  resizedTo: CGSize(width: Self.inputWidth, height: Self.inputHeight)
)
// Remove the alpha component from the image buffer to get the RGB data.
let rgbData = ImageUtils.pixelBufferCreateRGBData(
    pixelBuffer: scaledPixelBuffer,
    byteCount: Self.inputWidth * Self.inputHeight * 3
)

/*
 * 2. Send input data 
 */
interpreter.allocateTensors()      
interpreter.copy(rgbData, toInputAt: 0)
interpreter.invoke()

/*
 * 3. Read output data
 */
let outputTensor = try interpreter.output(at: 0)
let results: [Float] = outputTensor.data.withUnsafeBytes {
  Array($0.bindMemory(to: Float.self))
}

/*
 * 4. Parse output
 */      
// Create a zipped array of tuples [(labelIndex: Int, confidence: Float)].
// Sort the zipped results by confidence value
let inferences = zip(labels.indices, results)
  .sorted { $0.1 > $1.1 }
  .map { (label: labels[$0.0], confidence: $0.1) }
      
let bestInference = inferences.first

And there you have it. That is how offload your brain to your computer.

Setup

The ImageUtils from this experiment are available here. But there are probably better libraries for these operations. For example, the CoreMLHelpers

References