Object Recognition in Augmented Reality using ArUco Framework

The challenge of object recognition in augmented reality lies in detecting objects correctly despite variations in lighting, camera angle, overlapping objects, etc. The ArUco framework aims at object recognition in augmented reality with minimum error.

The first step in creating augmented reality applications is correct object recognition in a real-time environment. Object recognition itself involves multiple steps, such as object detection, processing, and finally identification.

Object detection theories

There are many theories for object detection, which try to emulate the way humans detect objects and identify them. Let us look at some of the most prevalent object detection theories here.

Template theory

In this theory of object recognition, a library of standardized object images is already available. When any object is detected, it is matched against the images in the library to identify the object. This is a highly reliable method of object recognition for standardized images, like letters, numbers, industrial objects, etc. However, even slight changes in shape, depth, or orientation can prevent the detected object from matching the template object.

So, if your handwriting is very different from your friend’s, or you have a slightly different way of writing letters, what you write might not be detected using template theory.

In the image here, you can recognize both characters as a capital A, but a camera or app using the template theory of object recognition may not be able to identify both as A.

Character A
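
To make the limitation concrete, here is a minimal sketch of template matching on tiny 5x5 binary glyphs. Everything in it is an illustrative assumption rather than part of any library: the Glyph and Template types, the two sample glyphs, and the functions mismatches() and recognize() are hypothetical names chosen for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A 5x5 binary glyph: '#' is ink, '.' is background (hypothetical format).
using Glyph = std::array<std::string, 5>;

// A library entry pairs a label with its reference glyph.
struct Template {
    char label;
    Glyph glyph;
};

// Reference "A" stored in the template library.
const Glyph kTemplateA = {"..#..", ".#.#.", "#####", "#...#", "#...#"};
// A slanted, narrower "A", as someone might write it by hand.
const Glyph kHandwrittenA = {"...#.", "..##.", ".#.#.", "####.", "#..#."};

// Count the cells in which two glyphs disagree.
int mismatches(const Glyph& a, const Glyph& b) {
    int count = 0;
    for (std::size_t r = 0; r < a.size(); ++r) {
        assert(a[r].size() == b[r].size());
        for (std::size_t c = 0; c < a[r].size(); ++c)
            if (a[r][c] != b[r][c]) ++count;
    }
    return count;
}

// Return the label of the closest template that differs in at most
// `tolerance` cells, or '?' if no template is close enough.
char recognize(const Glyph& input, const std::vector<Template>& library, int tolerance) {
    char best = '?';
    int bestScore = tolerance + 1;
    for (const Template& t : library) {
        int score = mismatches(input, t.glyph);
        if (score < bestScore) {
            bestScore = score;
            best = t.label;
        }
    }
    return best;
}
```

With a library containing only kTemplateA, recognize(kTemplateA, library, 0) returns 'A', while the handwritten variant differs in 13 of the 25 cells and is rejected unless the tolerance is raised so far that unrelated shapes would match as well.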

Feature detection and analysis theory

In feature detection and analysis theory, object features such as edges, corners, ridges, gradients, blobs, etc. are detected and compared with the features of available images. It is a very low-level image processing theory and is usually implemented in conjunction with other theories.

After features of an object have been detected, they are processed to come up with a possible image of the object.
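
As a rough illustration of the lowest level of feature detection, the sketch below marks edge pixels in a small grayscale image by thresholding the intensity jump to the right-hand and lower neighbours. It is a simplified stand-in for real edge detectors, and the names Image, detectEdges() and countEdges() are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Grayscale image stored row by row, intensities 0..255 (hypothetical type).
using Image = std::vector<std::vector<int>>;

// Mark a pixel as an edge when the intensity jump to its right-hand or
// lower neighbour exceeds `threshold` (a simple forward difference).
// Pixels in the last row and last column are never marked.
std::vector<std::vector<bool>> detectEdges(const Image& img, int threshold) {
    assert(!img.empty());
    const std::size_t rows = img.size();
    const std::size_t cols = img[0].size();
    std::vector<std::vector<bool>> edges(rows, std::vector<bool>(cols, false));
    for (std::size_t r = 0; r + 1 < rows; ++r) {
        for (std::size_t c = 0; c + 1 < cols; ++c) {
            int dx = std::abs(img[r][c + 1] - img[r][c]);
            int dy = std::abs(img[r + 1][c] - img[r][c]);
            edges[r][c] = (dx > threshold) || (dy > threshold);
        }
    }
    return edges;
}

// Count how many pixels were marked as edges.
int countEdges(const std::vector<std::vector<bool>>& edges) {
    int n = 0;
    for (const auto& row : edges)
        for (bool e : row)
            if (e) ++n;
    return n;
}
```

For a 4x4 image whose left half is black (0) and right half is white (255), detectEdges(img, 100) marks the pixels just left of the boundary, and a uniform image produces no edges at all. A fuller pipeline would go on to group such pixels into corners, ridges, or blobs before comparing them with stored features.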

Recognition by components theory

According to the recognition by components (RBC) theory, all objects can be created by combining up to 36 shapes called geons. When detecting an object, RBC emphasizes two components: edges and concavities. Concavities are the areas where two or more edges meet. Because of these two factors, RBC enables us to detect objects irrespective of the viewing angle. This is a huge advantage over other theories, where extra processing must be done to detect the same object when viewed from different angles.

However, RBC has difficulty distinguishing similar objects such as apples and pears, because their geons would be the same.

These theories are implemented as algorithms for object detection in augmented reality. It must be noted that no theory is ever used in isolation. Two or more theories are combined to form the basis of an object detection algorithm.

Whatever the object detection theory or algorithm used, a training database of objects is required, against which detected objects are compared so that they can be given a name and processed further. Many models and frameworks are used to train the database as well as to detect objects.

Two main problems in object recognition in augmented reality are:

  • Pose estimation – Estimating the spatial position of the object with respect to the camera
  • Occlusion – Blocking of one object by another
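
To give a feel for pose estimation, here is a deliberately simplified sketch that estimates only the distance to a square marker of known size using the pinhole camera model, Z = f * S / s, where f is the focal length in pixels, S the real side length, and s the side length measured in the image. It assumes the marker faces the camera squarely. Full pose estimation also recovers the rotation and is normally solved from all four corners. The names Point2d, pixelDistance() and estimateDistance() are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Pixel coordinates of a detected marker corner (hypothetical type).
struct Point2d {
    double x;
    double y;
};

// Euclidean distance between two image points, in pixels.
double pixelDistance(Point2d a, Point2d b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Pinhole model: Z = f * S / s.
// focalLengthPx    : focal length in pixels, obtained from camera calibration
// markerSideMeters : real side length of the printed marker
// cornerA, cornerB : two adjacent corners of the marker in the image
// Assumes the marker plane is parallel to the image plane.
double estimateDistance(double focalLengthPx, double markerSideMeters,
                        Point2d cornerA, Point2d cornerB) {
    double sidePx = pixelDistance(cornerA, cornerB);
    assert(sidePx > 0.0);
    return focalLengthPx * markerSideMeters / sidePx;
}
```

For example, with a focal length of 800 pixels, a 10 cm marker whose side spans 80 pixels is about 1 m from the camera, and the same marker spanning 40 pixels is about 2 m away. This dependence on the focal length is one reason camera calibration matters for augmented reality.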

The ArUco framework, developed by Rafael Munoz and Sergio Garrido, aims to tackle these problems specifically for augmented reality.

ArUco

ArUco is a minimal library, based solely on OpenCV, for creating augmented reality applications. It has also gained popularity as a tool for calibrating cameras. Object detection techniques may be marker-based or marker-less; ArUco uses a marker-based technique to improve pose estimation. Its markers are square black and white patterns, each carrying its own code. These square fiducial markers also help to a great extent in handling the occlusion problem.

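In OpenCV, the ArUco functions are declared in the opencv2/aruco.hpp header (#include <opencv2/aruco.hpp>). The examples below use the standalone ArUco library, which is included through aruco.h.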

Features of ArUco

Here are some of the most important features of ArUco, which make it such a popular library for augmented reality applications:

  • Markers can be detected with a single line of C++ code
  • It can detect markers from various dictionaries like ARUCO, ArToolkit+, ARTAG, etc.
  • Being minimal, it is faster than most other libraries
  • As it relies only on OpenCV, it is reliable, fast, and cross-platform
  • Cameras can be easily calibrated using an ArUco board
  • It comes with a BSD license
  • It has many markers for accurate and faster detection
  • It integrates easily with OpenGL (Open Graphics Library) and OGRE (Object-oriented Graphics Rendering Engine)

The latest version of ArUco is 3.x.0. It detects markers faster, supports automatic discovery of dictionaries, and assists calibration using ArUco boards.

The dictionary recommended by ArUco is ARUCO_MIP_36h12, which has 250 different marker patterns.

Image patterns 01
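
The idea behind a marker dictionary can be sketched as follows: the bits read from a candidate square are compared with every dictionary entry in all four rotations, and the closest entry is accepted if it differs in only a few bits. This is not ArUco's actual implementation. The toy dictionary uses 3x3 bits, whereas ARUCO_MIP_36h12 markers (assuming, as the name suggests, 36 bits with a minimum Hamming distance of 12 between markers) carry 6x6 bits, and the names Bits, Match, toyDictionary() and identify() are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t N = 3;                     // toy payload size, not ArUco's
using Bits = std::array<std::array<int, N>, N>;  // 1 = white cell, 0 = black cell

// Result of a lookup; id == -1 means that no marker was close enough.
struct Match {
    int id;
    int rotation;  // clockwise quarter-turns applied to the observed bits
    int distance;  // number of differing bits for the best match
};

// Two hypothetical marker patterns standing in for a real dictionary.
std::vector<Bits> toyDictionary() {
    return {
        Bits{{ {{1, 1, 0}}, {{1, 0, 0}}, {{0, 0, 0}} }},
        Bits{{ {{0, 1, 0}}, {{0, 1, 1}}, {{1, 0, 1}} }},
    };
}

// Rotate the bit grid by 90 degrees clockwise.
Bits rotateClockwise(const Bits& in) {
    Bits out{};
    for (std::size_t r = 0; r < N; ++r)
        for (std::size_t c = 0; c < N; ++c)
            out[c][N - 1 - r] = in[r][c];
    return out;
}

// Number of cells in which the two grids differ.
int hammingDistance(const Bits& a, const Bits& b) {
    int d = 0;
    for (std::size_t r = 0; r < N; ++r)
        for (std::size_t c = 0; c < N; ++c)
            if (a[r][c] != b[r][c]) ++d;
    return d;
}

// Try all four rotations against every dictionary entry and keep the
// closest one, provided it differs in at most `maxErrors` bits.
Match identify(const Bits& observed, const std::vector<Bits>& dictionary, int maxErrors) {
    assert(maxErrors >= 0);
    Match best{-1, 0, maxErrors + 1};
    Bits candidate = observed;
    for (int rot = 0; rot < 4; ++rot) {
        for (std::size_t id = 0; id < dictionary.size(); ++id) {
            int d = hammingDistance(candidate, dictionary[id]);
            if (d < best.distance) best = Match{static_cast<int>(id), rot, d};
        }
        candidate = rotateClockwise(candidate);
    }
    return best;
}
```

If the second toy marker is seen rotated by a quarter-turn with one bit misread, identify(observed, toyDictionary(), 1) still reports id 1, whereas allowing no errors rejects it. In general, a dictionary with minimum distance d between its markers can correct up to (d - 1) / 2 misread bits, rounded down, which is why well-separated patterns make detection more robust.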

Using ArUco dictionary

As discussed, it is advisable to use the recommended dictionary. Let us see why. Here is a simple C++ program that uses OpenCV and ArUco to detect markers in an image:

#include "aruco.h"
#include <iostream>
#include <vector>
#include <opencv2/highgui/highgui.hpp>

int main(int argc, char **argv) {
    if (argc != 2) { std::cerr << "Usage: inimage" << std::endl; return -1; }
    cv::Mat image = cv::imread(argv[1]);
    if (image.empty()) { std::cerr << "Could not read " << argv[1] << std::endl; return -1; }
    aruco::MarkerDetector MDetector; // class for detecting the markers
    // detect the markers in the image
    std::vector<aruco::Marker> markers = MDetector.detect(image);
    // print info to console
    for (size_t i = 0; i < markers.size(); i++)
        std::cout << markers[i] << std::endl;
    // draw the detected markers on the image
    for (size_t i = 0; i < markers.size(); i++)
        markers[i].draw(image);
    cv::imshow("image", image);
    cv::waitKey(0);
    return 0;
}

As you can see here, the markers are detected by the call to MDetector.detect() and, because no dictionary has been set, they are compared against all the available dictionaries. This is an inefficient way to do things. So, it is advisable to set the ArUco dictionary explicitly, as shown below:

#include "aruco.h"
#include <iostream>
#include <vector>
#include <opencv2/highgui/highgui.hpp>

int main(int argc, char **argv) {
    if (argc != 2) { std::cerr << "Usage: inimage" << std::endl; return -1; }
    cv::Mat image = cv::imread(argv[1]);
    if (image.empty()) { std::cerr << "Could not read " << argv[1] << std::endl; return -1; }
    aruco::MarkerDetector MDetector; // class for detecting the markers
    MDetector.setDictionary("ARUCO_MIP_36h12"); // use dictionary ARUCO_MIP_36h12 only
    // detect the markers in the image
    std::vector<aruco::Marker> markers = MDetector.detect(image);
    // print info to console
    for (size_t i = 0; i < markers.size(); i++)
        std::cout << markers[i] << std::endl;
    // draw the detected markers on the image
    for (size_t i = 0; i < markers.size(); i++)
        markers[i].draw(image);
    cv::imshow("image", image);
    cv::waitKey(0);
    return 0;
}

Here the dictionary to be used is set by the call to MDetector.setDictionary(), before the detection process begins.

Using ArUco calibration board

As mentioned previously, ArUco has gained popularity because it makes camera calibration easy: the library provides a calibration board made up of ArUco markers.

Aruco patterns 2

How ArUco supports augmented reality applications

The first task in an augmented reality application is detecting the objects correctly. The square black and white markers used by ArUco can be detected easily. Once the camera is calibrated, the object detection process takes even less time.

The ArUco framework supports three detection modes, defined by DetectionMode, which make processing even faster depending on user requirements. The three modes are:

  • Normal mode – This mode is used when computing time is not critical. A good example is batch processing of images.
  • Fast mode – As the name suggests, this mode is used when the speed of detecting and processing objects is crucial. Mobile apps are a good example of this case, where the user needs to be provided with augmented information quickly and accurately.
  • Video fast mode – Available in the latest version of the ArUco framework, this mode processes video sequences in a fast and reliable way. Again, mobile apps need this feature the most.

ArUco may not be the perfect approach to object recognition in augmented reality, but it currently provides one of the fastest and most reliable methods, and it is easy to use and implement too.
