IBM Watson Visual Recognition

Who is Watson?

On February 16, 2011, at the end of a special two-game Jeopardy! match, host Alex Trebek announced that the winner of the 1 million dollar first-place prize was Watson. But the thing is that Watson is not someone; it is something. Watson is a computer program (or rather a set of programs) created by IBM.

The interesting thing about Watson is his cognitive abilities. For those who are not familiar with the famous game show Jeopardy!, the contestants are shown clues in different categories and have to give the correct answer in the form of a question. Watson was playing against two of Jeopardy!’s best players, Ken Jennings and Brad Rutter. If you have a background in AI or machine learning, you know that Watson’s mission was extremely difficult for a computer.

 

He had to extract entities from the clues given by the host, identify the relations and dependencies between them, work out from his knowledge base what was required as an answer and finally, one of the most difficult tasks, form a question that includes the answer keyword. Each of these tasks is divided into sub-tasks, and people write PhD theses on every single one of them. They are not intuitive or easy. Watson did pretty well with that.

 

Since then, Watson has advanced astonishingly fast. You can now use Watson in your apps to do all kinds of amazing cognitive tasks.

 

What can Watson Do?

Watson has many cognitive skills that are very useful for today’s applications. Watson can hold a conversation (both text and voice). If you are amazed by what Siri can do, just wait till you interact with Watson. Watson has tools that analyze text and extract all kinds of useful information from it, and it can deal with both structured and unstructured data. Watson can also be trained to do text analysis in a specific domain. Watson can understand speech and produce it as well. Watson can translate between 9 different languages, and it can actually do all of the previous things in all 9 of them. Watson can even give personality insights based on your tone and communication style, which helps it understand your emotions and tells it more about your personality.

 

But what matters most for us here on Imaginghub is Watson’s Visual Recognition skills.

 

Visual Recognition Use Cases

Take Watson Merge for example. Merge is Watson’s medical solution. Merge uses Watson’s amazing visual cognitive abilities to enhance medical services on many levels. Merge has many sub-solutions that organize every aspect of a hospital’s activities, but that is not all. When IBM bought Merge for a billion dollars in 2015, it gained access to more than 30 billion medical images, which were fed to Watson to train him (excuse me for using “him”) to identify things like tumors and other anomalies that doctors spend many, many hours looking for manually.

Screenshot merge

Source: merge.com

 

You can use Watson to classify vehicle damage from a photo of the vehicle. So, Watson can work out what’s wrong with your car and send a notification to your maintenance center so that it is prepared to fix it.

 

OmniEarth, a startup in Virginia, used Watson Visual Recognition to analyze huge amounts of aerial images of drought-stricken areas. Watson can then help them decide which areas need to scale down their water consumption to ease the water problem in places like California. For example, Watson could identify 150,000 swimming pools in just 12 minutes.

Field

Source: ibm.com

This is just a glimpse of what Watson can do with only one of its abilities, Visual Recognition.

 

Now, let’s try it

Maybe you’ve read my article about Google Vision and AWS Rekognition services. We are going to do pretty much the same with Watson to test its abilities and give a head start to those who want to start developing with it.

First of all, you need to create an account on IBM Cloud. You can do that from here:

https://console.bluemix.net/

 

Once you log in to your account, you can use this link to create a new project that will allow you to use Watson Visual Recognition:

https://console.bluemix.net/developer/watson/create-project?services=watson_vision_combined&hideTours=true

Watson Login Screen

Pick a name for your project.

 

You can work with Watson through Python, Java, Node or plain web requests. As usual, I am going with Python. Again, my environment is Python 2.7 on Ubuntu 16.04.

Whichever of these you use, we are going to need an API key, as we did in the previous article.

So, go back to the dashboard home page:

https://console.bluemix.net/dashboard/apps

Then, choose your project. In the project page, you will find “Service Credentials” in the menu on the left. There you can click on “View Credentials” to get your API Key.

Watson Screenshot

Now, we need to prepare our programming environment. So, we start by installing the Python Watson API package:

 

sudo pip install --upgrade "watson-developer-cloud>=1.0,<2.0"

 

Now, in a new Python session, start by importing the package we need.

 

In [1]: from watson_developer_cloud import VisualRecognitionV3

 

Now we need to start an instance of the service:

 

In [2]: visual_recognition = VisualRecognitionV3(
   ...:     '2016-05-20',    	# This is the latest version number
   ...:     api_key='HERE GOES YOUR API KEY'
   ...: )
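
Pasting the key straight into the code makes it easy to leak by accident. Here is a minimal sketch of one alternative, assuming you exported the key in an environment variable first. The variable name WATSON_API_KEY and the helper load_api_key are my own choices for this example, not something the Watson package looks for.

```python
import os

def load_api_key(env=None, var_name='WATSON_API_KEY'):
    """Return the API key stored in an environment variable.

    WATSON_API_KEY is a hypothetical name picked for this example.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, '').strip()
    if not key:
        raise RuntimeError('Set the %s environment variable first' % var_name)
    return key

# Usage (not executed here, it needs the Watson package and a real key):
# visual_recognition = VisualRecognitionV3('2016-05-20', api_key=load_api_key())
```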

 

Since Watson’s responses are all in JSON format, we need to import the json module to be able to display the results in a readable way.

 

In [3]: import json

 

Classifying an Image

Now, for the testing, we will use the same image we used with Google and AWS services.

Image room

Now, we need to open the image file and send it to the “classify” method of the service.

 

In [4]: with open('./img.png', 'rb') as images_file:
   ...:     classes = visual_recognition.classify(images_file)
   ...:    

 

After the response comes back, we can use the “json” library to view the result in a pretty way.

 

 

In [5]: print(json.dumps(classes, indent=2))

 

 

We used “indent=2” so that each level of nesting in the JSON response is indented by two more spaces, which makes the hierarchy of the response easy to follow.
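
If you want to see the effect of “indent” on its own, here is a tiny sketch with a made-up dictionary (it is not a real Watson response). It only uses the standard json module.

```python
import json

# A made-up nested structure, just to show what indent does
sample = {"images": [{"classes": [{"class": "indoors"}]}]}

compact = json.dumps(sample)           # everything on one line
pretty = json.dumps(sample, indent=2)  # two extra spaces per nesting level

print(compact)
print(pretty)
```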

 

Here is the output:

{
  "images": [
    {
      "image": "./img.png", 
      "classifiers": [
        {
          "classes": [
            {
              "score": 0.673, 
              "class": "living room", 
              "type_hierarchy": "/indoors/living room"
            }, 
            {
              "score": 0.742, 
              "class": "indoors"
            }, 
            {
              "score": 0.578, 
              "class": "reception", 
              "type_hierarchy": "/telecommunication/broadcasting/reception"
            }, 
            {
              "score": 0.578, 
              "class": "broadcasting"
            }, 
            {
              "score": 0.578, 
              "class": "telecommunication"
            }, 
            {
              "score": 0.509, 
              "class": "penthouse", 
              "type_hierarchy": "/housing/apartment/penthouse"
            }, 
            {
              "score": 0.511, 
              "class": "apartment"
            }, 
            {
              "score": 0.529, 
              "class": "housing"
            }, 
            {
              "score": 0.503, 
              "class": "furnishing"
            }, 
            {
              "score": 0.5, 
              "class": "parlor", 
              "type_hierarchy": "/indoors/parlor"
            }, 
            {
              "score": 0.927, 
              "class": "beige color"
            }, 
            {
              "score": 0.724, 
              "class": "reddish brown color"
            }
          ], 
          "classifier_id": "default", 
          "name": "default"
        }
      ]
    }
  ], 
  "custom_classes": 0, 
  "images_processed": 1

}

 

 

 

You can see that Watson has the same high-level perspective as Google Vision. It couldn’t detect details like the TV screen, the clock or the couch, but it could identify that this is a living room in an apartment.
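
The response is a plain nested dictionary, so picking out the strongest labels takes only a few lines. This is a rough sketch, not part of the Watson package: the helper name top_classes and the 0.6 cut-off are my own choices, and the sample is an abridged copy of the output above.

```python
def top_classes(response, min_score=0.6):
    """Return (class, score) pairs with score >= min_score, best first."""
    results = []
    for image in response.get('images', []):
        for classifier in image.get('classifiers', []):
            for item in classifier.get('classes', []):
                if item['score'] >= min_score:
                    results.append((item['class'], item['score']))
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Abridged copy of the response printed above
sample = {
    "images": [{
        "image": "./img.png",
        "classifiers": [{
            "classifier_id": "default",
            "name": "default",
            "classes": [
                {"score": 0.673, "class": "living room",
                 "type_hierarchy": "/indoors/living room"},
                {"score": 0.742, "class": "indoors"},
                {"score": 0.509, "class": "penthouse",
                 "type_hierarchy": "/housing/apartment/penthouse"},
                {"score": 0.927, "class": "beige color"},
            ],
        }],
    }],
    "images_processed": 1,
}

for name, score in top_classes(sample):
    print('%-12s %.3f' % (name, score))
```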

 

Now, we want to test how long it took Watson to figure this out. So, we will measure the time before and after the call. Since we are running all of the code over the same internet connection, this should give us an approximation of Watson’s performance.

 

In [9]: import time 

In [10]: start = time.time()
    ...: with open('./img.png', 'rb') as images_file:
    ...:         classes = visual_recognition.classify(images_file)
    ...: end = time.time()
    ...: 

In [11]: end-start
Out[11]: 6.292158842086792 seconds

 

Note that the time here is longer than with AWS or Google because here we don’t have the concept of buckets on the same server to upload the image to. So, these 6 seconds include uploading the image file to Watson, the classification and sending the results back.
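
A single measurement over the internet is noisy, so it can help to repeat the call a few times and average. Here is a small sketch of that idea. The helpers timed and average_time are hypothetical names of mine, and fake_request only stands in for the real “classify” call so the example runs offline.

```python
import time

def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.time()
    result = func(*args, **kwargs)
    return result, time.time() - start

def average_time(func, repeats=3):
    """Average the elapsed time of several calls to smooth out network jitter."""
    durations = [timed(func)[1] for _ in range(repeats)]
    return sum(durations) / len(durations)

# Stand-in for the real request so the sketch runs without a network
def fake_request():
    time.sleep(0.01)
    return {'images_processed': 1}

result, elapsed = timed(fake_request)
print('one call: %.3f s, average: %.3f s' % (elapsed, average_time(fake_request)))
```

With the real service you would pass a small function that opens the image and calls “classify” instead of fake_request.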

 

Detecting Faces and Celebrities

Like Google and AWS, Watson can detect faces and, more specifically, identify celebrities.

For this, we will use an image containing celebrities from my favorite show:

Image of "Friends"

Here, we use the “detect_faces” method:

In [25]: start = time.time()
    ...: with open('./friends.jpg', 'rb') as images_file:
    ...:         classes = visual_recognition.detect_faces(images_file)
    ...: end = time.time()
    ...: 

In [29]: end-start
Out[29]: 7.801466941833496 seconds

In [30]: print(json.dumps(classes, indent=2))

{
  "images": [
    {
      "image": "/home/ahmedn1/Downloads/friends.jpg", 
      "faces": [
        {
          "gender": {
            "gender": "MALE", 
            "score": 0.99593
          }, 
          "age": {
            "max": 44, 
            "score": 0.506266, 
            "min": 35
          }, 
          "identity": {
            "score": 0.952574, 
            "name": "Matthew Perry", 
            "type_hierarchy": "/people/celebrities/stars/matthew perry"
          }, 
          "face_location": {
            "width": 59, 
            "top": 110, 
            "left": 197, 
            "height": 61
          }
        }, 
        {
          "gender": {
            "gender": "FEMALE", 
            "score": 0.982014
          }, 
          "age": {
            "max": 24, 
            "score": 0.666866, 
            "min": 18
          }, 
          "identity": {
            "score": 0.622459, 
            "name": "Jennifer Aniston", 
            "type_hierarchy": "/people/women/celebrities/jennifer aniston"
          }, 
          "face_location": {
            "width": 65, 
            "top": 36, 
            "left": 252, 
            "height": 88
          }
        }, 
        {
          "gender": {
            "gender": "FEMALE", 
            "score": 0.993307
          }, 
          "age": {
            "max": 34, 
            "score": 0.420308, 
            "min": 25
          }, 
          "identity": {
            "score": 0.982014, 
            "name": "Courteney Cox", 
            "type_hierarchy": "/people/celebrities/courteney cox"
          }, 
          "face_location": {
            "width": 56, 
            "top": 159, 
            "left": 132, 
            "height": 68
          }
        }, 
        {
          "gender": {
            "gender": "MALE", 
            "score": 0.997527
          }, 
          "age": {
            "max": 44, 
            "score": 0.506266, 
            "min": 35
          }, 
          "identity": {
            "score": 0.997527, 
            "name": "Matt LeBlanc", 
            "type_hierarchy": "/people/celebrities/matt leblanc"
          }, 
          "face_location": {
            "width": 68, 
            "top": 157, 
            "left": 389, 
            "height": 65
          }
        }, 
        {
          "gender": {
            "gender": "MALE", 
            "score": 0.952574
          }, 
          "age": {
            "max": 34, 
            "score": 0.385342, 
            "min": 25
          }, 
          "face_location": {
            "width": 70, 
            "top": 69, 
            "left": 312, 
            "height": 77
          }
        }, 
        {
          "gender": {
            "gender": "FEMALE", 
            "score": 0.997527
          }, 
          "age": {
            "max": 44, 
            "score": 0.497324, 
            "min": 35
          }, 
          "identity": {
            "score": 0.982014, 
            "name": "Lisa Kudrow", 
            "type_hierarchy": "/people/celebrities/lisa kudrow"
          }, 
          "face_location": {
            "width": 60, 
            "top": 171, 
            "left": 342, 
            "height": 73
          }
        }
      ]
    }
  ], 
  "images_processed": 1
}

As you can see, it recognized 5 of them. Sorry, David Schwimmer :)

David’s face was still detected and identified as male, though. Watson just couldn’t figure out who he is.

An interesting thing here with Watson is that it also tries to figure out an age range for each face. You can see the age range for each actor is pretty close to the real one.

Just out of curiosity, I sent photos of my own face. I uploaded a few of them and got an age estimate for each one, and every time I got 18-24 years old. This made me happy, I’m not going to lie, because I am 28 :)
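
If you only need the names and the age ranges, the face response can be flattened with a few lines. This is a rough sketch: summarize_faces is a hypothetical helper of mine, and the sample is a shortened copy of the output above (two of the six faces). A face without an “identity” key is reported as “unknown”.

```python
def summarize_faces(response):
    """Return one (name, gender, min_age, max_age) tuple per detected face."""
    summary = []
    for image in response.get('images', []):
        for face in image.get('faces', []):
            # 'identity' is missing when Watson does not recognize the person
            name = face.get('identity', {}).get('name', 'unknown')
            age = face.get('age', {})
            summary.append((name, face['gender']['gender'],
                            age.get('min'), age.get('max')))
    return summary

# Shortened copy of the detect_faces output above
sample = {
    "images": [{
        "faces": [
            {"gender": {"gender": "MALE", "score": 0.99593},
             "age": {"max": 44, "score": 0.506266, "min": 35},
             "identity": {"score": 0.952574, "name": "Matthew Perry"},
             "face_location": {"width": 59, "top": 110, "left": 197, "height": 61}},
            {"gender": {"gender": "MALE", "score": 0.952574},
             "age": {"max": 34, "score": 0.385342, "min": 25},
             "face_location": {"width": 70, "top": 69, "left": 312, "height": 77}},
        ],
    }],
    "images_processed": 1,
}

for name, gender, min_age, max_age in summarize_faces(sample):
    print('%s (%s), %s-%s years' % (name, gender, min_age, max_age))
```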

 

Video Classification

In the Google and AWS article, we saw how to process videos and classify them. Watson doesn’t actually have built-in support for videos, but as you probably know, a video is just a sequence of images. So, we can easily extract all of the video frames and send them to Watson one by one and get their results back. Actually, we can put every 20 images in one zip file and send it to the “classify” method to get the results for the whole batch.

 

For this, we are using the OpenCV library. To install it, run this command:

 

sudo pip install opencv-python

 

Then, we can do the process:

In [37]: import cv2

In [38]: import zipfile

In [39]: vidcap = cv2.VideoCapture('/media/ahmedn1/Ahmedn12/Titanic.mp4')

In [40]: frame_counter = 0

In [41]: zip_counter = 0

In [42]: fileList = []
    ...: success, image = vidcap.read()          # read the first frame
    ...: while success:
    ...:     # save the current frame as a JPEG file
    ...:     file_name = "frame%d.jpg" % (frame_counter)
    ...:     cv2.imwrite(file_name, image)
    ...:     fileList.append(file_name)
    ...:     frame_counter += 1
    ...:     if frame_counter == 20:             # a full batch of 20 frames
    ...:         zf = zipfile.ZipFile('zip%d.zip' % (zip_counter), 'w')
    ...:         for f in fileList:
    ...:             zf.write(f)
    ...:         zf.close()
    ...:         with open('zip%d.zip' % (zip_counter), 'rb') as images_file:
    ...:             classes = visual_recognition.classify(images_file)
    ...:         print(json.dumps(classes, indent=2))
    ...:         zip_counter += 1
    ...:         fileList = []
    ...:         frame_counter = 0
    ...:     success, image = vidcap.read()      # next frame; success is False at the end
    ...: # Note: a final batch of fewer than 20 frames is left unsent
    ...:

 

In this code snippet, we collect frames from the video. Every 20 frames are archived in one zip file and sent to Watson to be classified.
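
Sending every single frame adds up quickly: one minute of video at 24 frames per second is already 1,440 frames, which is 72 zip files of 20. Here is a small sketch, in plain Python and without Watson or OpenCV, of two helpers that can keep this under control. Both names (batches and sample_every) are hypothetical, and the 24 frames per second is only an assumed frame rate.

```python
def batches(items, size=20):
    """Yield consecutive slices of items with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def sample_every(items, step):
    """Keep every step-th item, e.g. one frame per second when step == fps."""
    return items[::step]

# Pretend we already saved 45 frames to disk
frame_names = ['frame%d.jpg' % i for i in range(45)]

groups = list(batches(frame_names, size=20))
print([len(g) for g in groups])        # [20, 20, 5]
print(sample_every(frame_names, 24))   # ['frame0.jpg', 'frame24.jpg']
```

Unlike a fixed counter, the last slice here also covers the leftover frames at the end of the video.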

 

Building Custom Classifiers

Now, this is the interesting part. So far, everything we did here can be done in Google Vision or AWS Rekognition. But Watson can help you build your own classifier. Yes, you can train your own classifier for your own custom task using Watson’s resources.

 

Here is how to do it:

First, the preparation:

For each class you want the custom classifier to be able to recognize, you should create a zip file containing images in that class. The number of images for each class should be between 10 and 10,000. The size of the zip file for each class should be less than 100MB. The minimum resolution of each image should be 32x32 pixels.

Then, you create one more zip file for negative images, that is, images that do not represent any object of the positive classes in any way.
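
Before uploading, it is worth checking each class zip against the limits above. Here is a minimal sketch using only the standard library. The helper check_training_zip and the list of image extensions are my own assumptions, and it does not check the 32x32 minimum resolution because that would need an imaging library.

```python
import os
import zipfile

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')   # assumption: adjust to the formats you use
MAX_ZIP_BYTES = 100 * 1024 * 1024               # the "less than 100MB" limit

def check_training_zip(path, min_images=10, max_images=10000):
    """Return a list of problems found in a class zip (empty list means OK)."""
    problems = []
    if os.path.getsize(path) >= MAX_ZIP_BYTES:
        problems.append('zip file is 100MB or larger')
    with zipfile.ZipFile(path) as zf:
        images = [name for name in zf.namelist()
                  if name.lower().endswith(IMAGE_EXTENSIONS)]
    if not min_images <= len(images) <= max_images:
        problems.append('found %d images, expected %d to %d'
                        % (len(images), min_images, max_images))
    return problems

# Usage (not executed here, it needs your own zip file):
# print(check_training_zip('./class1_images.zip'))
```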

 

Here is the code to build the classifier:

In [49]: with open('./class1_images.zip', 'rb') as class1, open(
    ...:         './class2_images.zip', 'rb') as class2, open(            
    ...:                 './negative_images.zip', 'rb') as negatives:
    ...:     model = visual_recognition.create_classifier(
    ...:         'custom',                                
    ...:         class1_positive_examples=class1,
    ...:         class2_positive_examples=class2,                  
    ...:         negative_examples=negatives) 

 

 

Here you open each zip file and call the “create_classifier” method. In “create_classifier”, each positive class’s zip file should be passed as an argument named in this format: {class_name}_positive_examples. The negative file is passed as an argument named negative_examples.
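
When you have many classes, typing one keyword argument per class gets tedious. Since the names follow a fixed pattern, they can be built in a loop and passed with **. This is only a sketch: example_kwargs is a hypothetical helper, and strings stand in for the opened zip files so it runs without any files.

```python
def example_kwargs(class_files, negatives=None):
    """Map class names to the keyword names described above."""
    kwargs = {}
    for class_name, file_obj in class_files.items():
        kwargs['%s_positive_examples' % class_name] = file_obj
    if negatives is not None:
        kwargs['negative_examples'] = negatives
    return kwargs

# Placeholders stand in for the opened zip files
kwargs = example_kwargs({'class1': '<class1 zip>', 'class2': '<class2 zip>'},
                        negatives='<negatives zip>')
print(sorted(kwargs))

# Usage (not executed here, it needs the service and real files):
# model = visual_recognition.create_classifier('custom', **kwargs)
```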

 

This will take some time to finish depending on the number of images, number of classes and how complex the classification is.

 

You can actually close the terminal session here and call the “list_classifiers” method from time to time to check progress.

In [49]: models = visual_recognition.list_classifiers()
In [50]: print(json.dumps(models, indent=2))

{
  "classifier_id": "custom_id",
  "name": "custom",
  "owner": "owner-id",
  "status": "training",
  "created": "creation_date",
  "classes": [
    {"class": "class1"},
    {"class": "class2"}
  ]
}

 

You can see from the status that it is still training. You can build several custom classifiers (depending on your IBM plan). But how would you use your own classifier?

 

First, we should wait until the classifier is ready.

 

In [49]: models = visual_recognition.list_classifiers()
In [50]: print(json.dumps(models, indent=2))

{
  "classifiers": [
    {
      "classes": [
        {
          "class": "class1"
        },
        {
          "class": "class2"
        }
      ],
      "created": "creation_date",
      "classifier_id": "custom_id",
      "name": "custom",
      "owner": "owner-id",
      "status": "ready"
    }
  ]
}
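
Instead of calling “list_classifiers” by hand every few minutes, the waiting can be put in a small polling loop. Here is a sketch under a few assumptions of mine: wait_until_ready and status_of are hypothetical helpers, the 10-second interval and 30-minute timeout are arbitrary, and only the “training” and “ready” values seen above are relied on. The status source is passed in as a function, so the example below runs with a fake one.

```python
import time

def wait_until_ready(get_status, interval=10, timeout=1800, sleep=time.sleep):
    """Poll get_status() until it returns 'ready'; return the seconds waited.

    Raises RuntimeError if the status is still not 'ready' after `timeout`.
    """
    waited = 0
    while True:
        if get_status() == 'ready':
            return waited
        if waited >= timeout:
            raise RuntimeError('classifier not ready after %d s' % timeout)
        sleep(interval)
        waited += interval

def status_of(listing, classifier_id):
    """Pick one classifier's status out of a list_classifiers() response."""
    for classifier in listing.get('classifiers', []):
        if classifier['classifier_id'] == classifier_id:
            return classifier['status']
    raise KeyError(classifier_id)

# Offline demo: a fake status source and a sleep that returns immediately
fake = iter(['training', 'training', 'ready'])
waited = wait_until_ready(lambda: next(fake), interval=10, sleep=lambda s: None)
print(waited)   # 20

# Real usage (not executed here):
# wait_until_ready(lambda: status_of(visual_recognition.list_classifiers(), 'custom_id'))
```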

 

 

Now, how do we use it?

Remember the “classify” method? It doesn’t only take the image/images as a parameter. Check this snippet:

 

 

In [54]: start = time.time()
    ...: with open('./img.png', 'rb') as images_file:                          
    ...:         classes = visual_recognition.classify(images_file, parameters=json.dumps({'classifier_ids':['custom_id']}))
    ...: end = time.time()
    ...: 

 

Here, the second “classify” parameter is “parameters”. This is a JSON-formatted parameter that can have different uses. We use it here to define which classifiers we want to use. Yes, you can use multiple classifiers on the same image. Assume you have a custom classifier that classifies cars, another that classifies food and one more that classifies furniture. You can use all of them on the same image to get all of these classes from the image. You can also add the default Watson classifier to get other objects in the image.

All you have to do is pass an array of classifier IDs in the JSON-formatted parameters argument ('classifier_ids':['custom_id']). Watson also has two built-in classifier IDs: ‘default’ for the default classifier we’ve been using so far and ‘explicit’ for a classifier that detects explicit (pornographic) content. So, we can basically do this:

 

In [54]: start = time.time()
    ...: with open('./img.png', 'rb') as images_file:
    ...:         classes = visual_recognition.classify(images_file, parameters=json.dumps({'classifier_ids':['custom_id', 'default', 'explicit']}))
    ...: end = time.time()
    ...: 

 

This way, we can classify the image with our own custom classifier, Watson’s default classifier and the explicit-content one.
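
To keep the call readable, the JSON string can be built by a small helper, and the results can be grouped per classifier afterwards. This is only a sketch: both helper names are hypothetical, the response is made up in the same shape as the output shown earlier, and the optional “threshold” key (a minimum score) is an assumption on my part that you should check against the API reference before relying on it.

```python
import json

def build_parameters(classifier_ids, threshold=None):
    """Return the JSON string to pass as the 'parameters' argument."""
    params = {'classifier_ids': list(classifier_ids)}
    if threshold is not None:
        # Assumption: 'threshold' sets the minimum score that is reported
        params['threshold'] = threshold
    return json.dumps(params)

def classes_by_classifier(response):
    """Group the returned class names by the classifier that produced them."""
    grouped = {}
    for image in response.get('images', []):
        for classifier in image.get('classifiers', []):
            names = [item['class'] for item in classifier.get('classes', [])]
            grouped.setdefault(classifier['classifier_id'], []).extend(names)
    return grouped

print(build_parameters(['custom_id', 'default', 'explicit']))

# Made-up response in the same shape as the classify output shown earlier
fake_response = {"images": [{"classifiers": [
    {"classifier_id": "custom_id", "classes": [{"class": "class1", "score": 0.8}]},
    {"classifier_id": "default", "classes": [{"class": "indoors", "score": 0.742}]},
]}]}
print(classes_by_classifier(fake_response))
```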

 

You can also retrain a classifier (to add more classes or add more images for some of the classes) using the “update_classifier” method which follows the same format as “create_classifier”.

 

In [55]: with open('./class3.zip', 'rb') as class3, open(      
    ...: './more_class2.zip', 'rb') as more_class2:               
    ...:     updated_model = visual_recognition.update_classifier(
    ...:         classifier_id='custom_id',            
    ...:         class3_positive_examples=class3,      
    ...:         class2_positive_examples=more_class2)

 

Here, we added a new class “class3” and added more images to “class2” to retrain the classifier. You can also add more images to the negative class if you want. There is one more argument here, which is the classifier_id of the classifier you wish to retrain.

 

You can also delete a classifier by calling the “delete_classifier” method and passing the classifier_id you wish to delete.

 

So, there we go. We learned how to use Watson’s amazing cognitive skills in the visual part. That’s one more very good option for you to consider when building your application along with Google Vision and AWS Rekognition.

 

Summary

Now, let’s do the comparison again:

 

 

| | AWS Rekognition | Google Cloud (Vision/Video) | IBM Watson |
|---|---|---|---|
| Cost | Free for the first 1,000 minutes of video and 5,000 images per month during the first year. Other than that, Rekognition is relatively cheaper than Google Cloud Vision/Video. | The first 1,000 units per month are free (not just for the first year). | Free plan: classify 250 images/day and train ONE custom classifier with up to 5,000 images. |
| Cost ranking (1 is the cheapest) | 1 | 2 | 3 |
| Performance | Up to 2 seconds per image and 2 minutes per video | Similar performance (measured by response time to client) | Similar performance (measured by response time to client) |
| Services | Object detection, face detection and recognition, content moderation, celebrity recognition, activity recognition, person tracking, text recognition | Object detection, face detection but NOT recognition, content moderation, text recognition | Object detection, face detection, content moderation, text recognition, custom classifiers |
| Diversity of labels | Can detect a greater variety of small details in the image | Mostly looks at the bigger picture | Mostly looks at the bigger picture |
| Code clarity | Very clear and easy | A little bit ambiguous | Very clear and easy |

 

I hope this was useful to you and your projects. If you have any comments or questions, please let me know in the comments section.