Machine learning: analyzing images with Amazon Rekognition Custom Labels


Artificial Intelligence (AI), machine learning and big data are concepts that may sound a bit alien but have been around us for some time now. Time for me to dive in!

In this blog I'll explain my first steps in the machine learning world, more specifically in the Amazon Rekognition Custom Labels environment. Why this one? Amazon Rekognition offers some easy-to-enter examples and models to get you started without diving deep into machine learning. At the moment Amazon has showcases for object and scene detection, image moderation, facial analysis, text in images, and face comparison including celebrity recognition.


Custom Labels is the service I used because I work with images unique to my (demo) business. The idea is to use image recognition to load a specific part of a website, prefilled with data based on an image. Machine learning is used to train a model that extracts the correct information from the image.


The case

Company Van Dam checks and maintains industrial doors on platforms and vessels. Every check involves a long and difficult form that needs to be filled in, depending on the status of a door, a lock or a tag plate. Pictures are taken to record the situation. The circumstances at those locations make it hard to fill in those forms easily and the error rate is quite high.

A solution could be to let a Van Dam employee take a picture of the object on location. In the background the picture gets uploaded to the cloud, where a machine learning model fires to classify the object. The result is sent to the employee's maintenance app, and the correct form is opened and prefilled based on the received information.

I will explain the steps I took to create a dataset and get a trained model – the other parts of the case are not considered in this blog.

  • Note 1: in this demo I used my AWS admin account. When using AWS in a real project you need to adjust the security settings to conform to the specifications of that project.
  • Note 2: company Van Dam has a database full of images (of mixed quality, but that makes this experience realistic and interesting). A lot of images are needed to get a properly trained model.

Set up

To be able to use the services from Amazon you need to have an AWS account. I won't go into detail about how to set up accounts and permissions – Amazon has an excellent online manual to help you out.

Services that I used:

  • AWS IAM
  • Amazon S3
  • AWS CLI
  • Amazon Rekognition Custom Labels

AWS IAM

Via the AWS Management Console you find the IAM service in the section Security, Identity, & Compliance. You need to create a group and a user in that group with sufficient rights.

I created a group ‘ml-administrators’ with ‘AdministratorAccess’ permissions and added a user called ‘ml-administrator’. The user has ‘IAMUserChangePassword’ permissions.
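
For reference, the same setup can also be scripted. A minimal sketch with the AWS CLI, assuming you attach the managed 'AdministratorAccess' policy like I did in the console:

# create the group and give it the managed AdministratorAccess policy
aws iam create-group --group-name ml-administrators
aws iam attach-group-policy --group-name ml-administrators \
   --policy-arn arn:aws:iam::aws:policy/AdministratorAccess

# create the user and add it to the group
aws iam create-user --user-name ml-administrator
aws iam add-user-to-group --user-name ml-administrator --group-name ml-administrators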

Amazon S3

Amazon Rekognition creates an S3 bucket to hold training images and test images. During my demo project I used another S3 bucket from which I imported images. Permissions need to be set correctly.

AWS CLI

Via the AWS Documentation you can read the instructions to install the AWS CLI. I used version 2. Follow the steps as described and verify your installation with '$ aws --version' in the shell.
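
Since I use a named profile later on ('--profile Lydia'), it helps to configure that profile right after installing. A minimal sketch – the values are placeholders for your own keys:

# store credentials and defaults under the profile name 'Lydia'
$ aws configure --profile Lydia
AWS Access Key ID [None]: <your access key id>
AWS Secret Access Key [None]: <your secret access key>
Default region name [None]: eu-west-1
Default output format [None]: json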

Preparing the model

Step one was to start a project, which I named 'VanDamDoors'. Next I created a dataset for training with the simple name 'Doors'. To get images into the dataset I chose 'upload images from your computer', and after submitting them Amazon directly warned me: 'your dataset must have at least 2 labels'.

Labeling

How to do that was explained on that page too: every image needs a label and a bounding box to highlight exactly where that label shows up in the image.

Ok. Before I uploaded the images, I had made a decision about the labels I wanted to use. The employee has to get information about the front side of a door, the back side of a door, a lock or a tag plate. This is how the pictures in Van Dam's database are structured, and I uploaded 5 images of each category. So I came up with these 4 labels: 'front', 'back', 'lock', 'tagPlate'. I created the labels in my dataset in Rekognition and assigned them to the correct images.

After that I opened every image to draw a bounding box in it, covering the spot where the door, lock or tag plate was located, and saved the changes.

Training & Testing

Ready to train the model! If you don't see a big orange button with 'train model' on it, you need to exit the labeling mode or refresh the page. For creating a test dataset I chose the option to split the training dataset. And then … I had to wait. For quite a long time. But then there were the results:

I uploaded 20 images in total, of which Rekognition used 16 for my training set and 4 for the test set. In the test set only one image was rated 'true positive', two others were 'false negative' and the last one was 'false positive'. Disappointing.

Of course the number of images was too small for a reliable test, but it made me wonder if the bounding box method is the right way to do this. And it takes quite some time to do this manually for a lot of images.

Meanwhile Rekognition had uploaded all data to the S3 bucket it created on my behalf. A bucket with a long name, in my case 'custom-labels-console-eu-west-1-f79af322a7'. It holds the training and test images per dataset in the folder 'assets'. The folder 'datasets' contains the metrics per dataset in the subfolder 'custom-labels'.
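
You can take that peek from the command line as well; a quick sketch, assuming your profile has read access to the bucket:

# list the folders Rekognition created in its bucket
aws s3 ls s3://custom-labels-console-eu-west-1-f79af322a7/ --profile Lydia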

Using Amazon S3

I was convinced that there was another way to do this, and I also wanted to use more than 20 images to train my model. I decided to upload all images to the S3 bucket that Rekognition uses. On the same level as the 'datasets' folder, directly in the root folder, I created a folder called 'input'. Feel free to use another name – this is just for demo purposes.

In the 'input' folder I created 4 subfolders representing the labels of my new dataset. I used the same names as before ('front', 'back', 'lock', 'tagPlate') to be able to compare the results. Every subfolder contained the images that belong to that label, in my case around 326 images per label.
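
Uploading such a folder structure is easiest with 'aws s3 sync'. A sketch, assuming the images sit in a local folder called 'input' with the four label subfolders in it:

# mirror the local label folders into the Rekognition bucket
aws s3 sync ./input s3://custom-labels-console-eu-west-1-f79af322a7/input/ --profile Lydia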

Re-training & Testing

In Amazon Rekognition I created a new dataset named ‘Doors2’ and I imported the images from the S3 bucket folder I just created.

Note: the trailing slash at the end of the URL (s3://custom-labels-console-eu-west-1-f79af322a7/input/) is very important – don't forget it, otherwise Rekognition keeps complaining about an 'invalid folder location'.

To get the test images I let Rekognition split the dataset like I did before.

Then I trained this model, and in less than a second it was ready! A significant difference from my previous experience, and this time the labels were filled in automatically (taken from the folder names).

Results

In total I uploaded 1,306 images. Rekognition used 1,043 images for the training set and 263 for the test set, of which 258 images resulted in a 'true positive', 5 in a 'false positive' and another 5 in a 'false negative'. Quite a good result, since a lot of the images are not very clear. And again all locks and tag plates turned out to be marked as positive.
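
For context: assuming the standard definitions, that corresponds to a precision of 258 / (258 + 5) ≈ 98.1% and a recall of 258 / (258 + 5) ≈ 98.1% – a big step up from the first attempt.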

A quick peek in the S3 bucket: when opening the 'evaluation' folder (in the root) and making your way through the subfolders, you will find some JSON files with a reference to Amazon SageMaker Ground Truth. That's Amazon's fully managed data labeling service that makes it easy to build highly accurate training datasets for machine learning.

Conclusion

Automatic labeling via an uploaded folder structure is much more efficient with a lot of images. And a lot of images are needed to properly train a model. I don't know what a bounding box adds to this process. The user experience in Rekognition can definitely be improved!

Using the model

Now that the model is set up, we can start to use it. To see if the model recognizes a random picture of an industrial door, I searched for an image on the internet. I put this image in a separate S3 bucket called 'custom-labels-input-images'.

To get the model running I used the AWS CLI. In Rekognition the link to the latest model (for me ‘VanDamDoors.2020-05-14T10.30.41’) in the section ‘Projects’ opens a page with the API info. Scroll to the end of the page and click on the arrow to view the code.

In my terminal I started my latest model with this AWS CLI command:

aws rekognition start-project-version \
   --project-version-arn "arn:aws:rekognition:eu-west-1:938506314627:project/VanDamDoors/version/VanDamDoors.2020-05-14T10.30.41/1589445041847" \
   --min-inference-units 1 \
   --region eu-west-1 --profile Lydia

I had to add '--profile Lydia' so AWS uses the correct account. The error 'The security token included in the request is invalid.' will show up if AWS doesn't recognize the profile.

The model starts when you get this feedback:

{ "Status": "STARTING" }
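
Starting takes a few minutes, and 'detect-custom-labels' only works once the status has changed to 'RUNNING'. You can poll the status with 'describe-project-versions'; note that the project ARN below is a placeholder – yours is shown in the Rekognition console:

aws rekognition describe-project-versions \
   --project-arn "arn:aws:rekognition:eu-west-1:938506314627:project/VanDamDoors/<project-id>" \
   --version-names "VanDamDoors.2020-05-14T10.30.41" \
   --region eu-west-1 --profile Lydia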

Time to get the test image involved, and see how accurate the model is:

aws rekognition detect-custom-labels \
   --project-version-arn "arn:aws:rekognition:eu-west-1:938506314627:project/VanDamDoors/version/VanDamDoors.2020-05-14T10.30.41/1589445041847" \
   --image '{"S3Object": {"Bucket": "custom-labels-input-images","Name": "DizRvKxWsAALTiY.jpeg"}}' \
   --region eu-west-1 --profile Lydia

Notice the reference to the bucket ('custom-labels-input-images') and the image file 'DizRvKxWsAALTiY.jpeg'.

Feedback I received:

{
    "CustomLabels": [
        {
            "Name": "front",
            "Confidence": 87.34600067138672
        }
    ]
}

Nice! It recognized the image as a front door view with 87 percent confidence.

Note: don't forget to stop the model, otherwise the costs will pile up. The stop call mirrors the start call, with the same version ARN:
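
aws rekognition stop-project-version \
   --project-version-arn "arn:aws:rekognition:eu-west-1:938506314627:project/VanDamDoors/version/VanDamDoors.2020-05-14T10.30.41/1589445041847" \
   --region eu-west-1 --profile Lydia

Check the feedback in the command line: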

    { "Status": "STOPPING" }

Double check in the Rekognition 'Projects' overview section that the model has indeed stopped.

Conclusion

Using the model is as easy as 1-2-3. The response is very clear about the confidence, and it is easy to compare the result with the other test results.

Next steps

Next steps in this demo case will be to use the data to open the correct form and prefill it. For example, when the input image (taken by the employee) is recognized as a lock with a confidence of at least 85%, the lock information page should be opened (a rough sketch of this flow follows below).

In short

Take picture > upload picture to S3 > start model > analyze image > receive label and confidence > stop model > open correct page > prefill form.
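
As a rough sketch of how this flow could look from the command line – the file name is hypothetical, the threshold is the 85% from above, and 'jq' and 'bc' are used to parse the JSON and compare the confidence:

# upload the employee's picture to the input bucket
aws s3 cp picture.jpeg s3://custom-labels-input-images/ --profile Lydia

# classify it with the running model
RESULT=$(aws rekognition detect-custom-labels \
   --project-version-arn "arn:aws:rekognition:eu-west-1:938506314627:project/VanDamDoors/version/VanDamDoors.2020-05-14T10.30.41/1589445041847" \
   --image '{"S3Object": {"Bucket": "custom-labels-input-images","Name": "picture.jpeg"}}' \
   --region eu-west-1 --profile Lydia)

# pick the best label and its confidence out of the JSON response
LABEL=$(echo "$RESULT" | jq -r '.CustomLabels[0].Name')
CONFIDENCE=$(echo "$RESULT" | jq -r '.CustomLabels[0].Confidence')

# open the lock form only when the model is at least 85% sure
if [ "$LABEL" = "lock" ] && [ "$(echo "$CONFIDENCE >= 85" | bc -l)" -eq 1 ]; then
   echo "open and prefill the lock form"
fi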

At the moment Amazon Rekognition only gives the option to use the model via the API code in AWS CLI command. Python is coming soon, it says.

Some other features that could complete the app, and that can be done with Amazon Rekognition, are:

  • Escape route if the picture is not a lock
  • Facial recognition of the employee to prefill personal data
  • Get the date and time when the picture was taken
  • Get the text from the image when it is a tag plate

Conclusion

Amazon Rekognition is a service that offers handy features that are easy to use. Custom Labels targets specific objects and scenes used by your business. The user experience could be improved to speed up the process, in my opinion. Using the API via Python is not an option at the moment – hopefully this will come soon. Since everything is cloud based, you need to keep an eye on the costs when starting the model and using a lot of pictures. Don't forget to stop the model when you're ready.