iOS 11 was announced at WWDC 2017 and is now available to download. One of its most exciting features is the use of machine learning at several levels of the system. For developers, the most direct benefit is Core ML. In this article, I will introduce Core ML with an example that detects objects in an image.

Simply put, machine learning development has two phases: training on large-scale data to produce a model, and using that model to make predictions on new input data. Core ML works in the second phase.

Here is a simple figure to depict the complete process.

[Figure: the machine learning process, from training to prediction]

I will not cover the basics of machine learning; you can find plenty of material online. The model is the most important product of a machine learning algorithm and its training data. Core ML defines its own format for model files, so models produced by other machine learning frameworks need to be converted. Fortunately, Apple provides Core ML Tools to help with this. Conversion is outside the scope of this article, so please refer to the Core ML Tools documentation.

Models already exist for many areas, such as computer vision and language processing. Apple provides several popular models in its own format, including Inception v3, Places205-GoogLeNet, ResNet50 and VGG16. Let’s start an example with Inception v3, which is popular for detecting objects in images.

First, please download the Inception v3 model file from https://developer.apple.com/machine-learning/.

Then drag it into the new project. In Xcode, we can see information about this model.

[Figure: model information shown in Xcode]

In the first part, “Machine Learning Model”, there is basic information such as the name, author and description.

Xcode generates Swift code for this model, as shown under Model Class. Click the right arrow to see the Swift class definitions for the input, the output and the model. Dive in if you want to learn more.

The input and output formats are shown in the third part, “Model Evaluation Parameters”. For Inception v3, the input is an image. The output is the probability of each category as a dictionary, plus the most likely category as a String.

For simplicity’s sake, we’ll use a static image. Please resize the image to 299×299, because that is the input size Inception v3 expects. Our new project is very simple: we only place an image view to preview the image and a label to show the result.
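If you would rather resize the image in code than in an image editor, something like the following might work. This is only a sketch: the function name resizedSquare is hypothetical and not part of the original project, and it stretches the image to a square without preserving the aspect ratio.

```swift
import CoreGraphics

/// Hypothetical helper (not from the original project): scales a CGImage
/// to a square of `side` pixels. The image is stretched to fit, so the
/// aspect ratio is not preserved.
func resizedSquare(_ image: CGImage, side: Int = 299) -> CGImage? {
    guard let context = CGContext(data: nil,
                                  width: side,
                                  height: side,
                                  bitsPerComponent: 8,
                                  bytesPerRow: 0, // let Core Graphics choose the row stride
                                  space: CGColorSpaceCreateDeviceRGB(),
                                  bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue) else {
        return nil
    }
    // Use high-quality interpolation when scaling.
    context.interpolationQuality = .high
    context.draw(image, in: CGRect(x: 0, y: 0, width: side, height: side))
    return context.makeImage()
}
```

The result could then be passed on in place of the original CGImage before it is converted for the model.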

Let’s look at the code.

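    override func viewDidLoad() {
        super.viewDidLoad()

        // Inceptionv3 is the class Xcode generates from the model file.
        let model = Inceptionv3()

        // Load the bundled image and convert it to the CVPixelBuffer the model expects.
        let inputImage = UIImage(named: "sample.jpg")!.cgImage!
        let pixelBuffer = getCVPixelBuffer(inputImage)

        // Run the prediction; stop if the conversion or the prediction failed.
        guard let pb = pixelBuffer, let output = try? model.prediction(image: pb) else {
            fatalError("Unexpected runtime error.")
        }
        // Show the most likely category in the label.
        result.text = output.classLabel
    }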

In the code, we first initialize an Inception v3 model; this class is generated by Xcode. Next, we read the input image and convert it to a CVPixelBuffer with the helper function getCVPixelBuffer. Finally, we get the result from the model’s prediction function. Xcode makes this very easy. I’ve uploaded the complete project to GitHub. If you want to run it, please first download the Inception v3 model file from https://developer.apple.com/machine-learning/ and put it in the project.
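The helper getCVPixelBuffer is not listed in this article. For readers who want a rough idea of what such a conversion involves, here is a minimal sketch. It is not necessarily the implementation in the GitHub project; the 32ARGB pixel format and the compatibility attributes are my assumptions.

```swift
import CoreGraphics
import CoreVideo

/// Sketch of a CGImage -> CVPixelBuffer conversion (assumed implementation,
/// not necessarily the one used in the original project).
func getCVPixelBuffer(_ image: CGImage) -> CVPixelBuffer? {
    let width = image.width
    let height = image.height

    // Ask for a buffer that Core Graphics is able to draw into.
    let attributes: [CFString: Any] = [
        kCVPixelBufferCGImageCompatibilityKey: true,
        kCVPixelBufferCGBitmapContextCompatibilityKey: true
    ]

    var pixelBuffer: CVPixelBuffer?
    let status = CVPixelBufferCreate(kCFAllocatorDefault,
                                     width,
                                     height,
                                     kCVPixelFormatType_32ARGB, // assumed pixel format
                                     attributes as CFDictionary,
                                     &pixelBuffer)
    guard status == kCVReturnSuccess, let buffer = pixelBuffer else {
        return nil
    }

    // Lock the buffer while writing pixels; unlock on every exit path.
    CVPixelBufferLockBaseAddress(buffer, [])
    defer { CVPixelBufferUnlockBaseAddress(buffer, []) }

    // Wrap the buffer's memory in a bitmap context that matches 32ARGB.
    guard let context = CGContext(data: CVPixelBufferGetBaseAddress(buffer),
                                  width: width,
                                  height: height,
                                  bitsPerComponent: 8,
                                  bytesPerRow: CVPixelBufferGetBytesPerRow(buffer),
                                  space: CGColorSpaceCreateDeviceRGB(),
                                  bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue) else {
        return nil
    }

    // Draw the image into the buffer's backing memory.
    context.draw(image, in: CGRect(x: 0, y: 0, width: width, height: height))
    return buffer
}
```

The function returns nil rather than crashing when the buffer or the context cannot be created, which is why the calling code unwraps the result in its guard statement.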

[Figure: the result shown in the app]

We only show the best result, taken from the output’s classLabel. If you want to see all the probability values, print the output’s classLabelProbs somewhere.

 print(output.classLabelProbs)
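Printing the whole dictionary produces a long, unordered list. If you only care about the few most likely categories, a small helper along these lines might be handier. The name topPredictions is hypothetical, and I am assuming classLabelProbs is a dictionary from String to Double, which matches the dictionary output described earlier. The sample values below are made up for illustration.

```swift
/// Hypothetical helper: returns the `count` most probable labels, best first.
func topPredictions(_ probabilities: [String: Double],
                    count: Int = 5) -> [(label: String, probability: Double)] {
    return probabilities
        .sorted { $0.value > $1.value }   // highest probability first
        .prefix(count)                    // keep only the first `count` entries
        .map { (label: $0.key, probability: $0.value) }
}

// Made-up values standing in for output.classLabelProbs.
let sampleProbs = ["tabby cat": 0.62, "tiger cat": 0.21, "lynx": 0.09, "sofa": 0.01]
for prediction in topPredictions(sampleProbs, count: 3) {
    print("\(prediction.label): \(prediction.probability)")
}
```

In the view controller, you would pass output.classLabelProbs instead of the sample dictionary.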

Thanks for your time.