ADRENALINE: an OpenVX environment to optimize embedded vision applications on many-core accelerators

ADRENALINE, an OpenVX environment, is a strong solution for optimizing embedded vision applications on many-core accelerators. Here is why.

Pervasive embedded vision applications are very hard to support without accelerating the underlying computer vision algorithms. To address this, industry has invested in improving embedded vision technology, in particular by bringing advanced computer vision capabilities to a wide variety of embedded systems. Smart video surveillance and augmented reality (AR) are good examples.

These improvements have benefited many end users, but they demand higher performance and better energy efficiency. Meeting both goals is hard, and one of the most promising approaches is to build systems around many-core accelerators. The principle behind them is simple.

Many-core accelerators provide hundreds of small processing units connected to a shared on-chip memory through a high-throughput interconnect. Systems that pair such an accelerator with a host processor, referred to as heterogeneous systems, can significantly boost performance and energy efficiency, which is exactly what advanced embedded vision needs. The idea may sound simple, but it brings considerable programming complexity.

 

Presenting OpenVX and ADRENALINE

In this article we present ADRENALINE, an OpenVX-based environment for optimizing embedded vision applications on many-core accelerators. First, a word about OpenVX itself: it is a cross-platform standard for the vision application domain and a central part of the whole system. OpenVX is a C API with a simple, standardized structure that hides architectural details from the programmer, which makes it easy to use. Because an application describes what to compute, as a graph of vision functions, rather than how to compute it, implementers are free to optimize the low-level image processing for their hardware. Below is an example of an OpenVX program:

 

#include <VX/vx.h>
#include <stdlib.h>

int main(void)
{
    vx_context ctx = vxCreateContext();
    /* Create the graph first: virtual images belong to a graph */
    vx_graph graph = vxCreateGraph(ctx);

    /* Input and outputs are regular images; intermediates are virtual,
       so the implementation decides where (or whether) to store them */
    vx_image rgb   = vxCreateImage(ctx, ...);
    vx_image gray  = vxCreateVirtualImage(graph, ...);
    vx_image gauss = vxCreateVirtualImage(graph, ...);
    vx_image gradX = vxCreateVirtualImage(graph, ...);
    vx_image gradY = vxCreateVirtualImage(graph, ...);
    vx_image mag   = vxCreateImage(ctx, ...);
    vx_image phase = vxCreateImage(ctx, ...);

    /* RGB -> gray -> 3x3 Gaussian -> 3x3 Sobel -> gradient magnitude and phase */
    vxColorConvertNode(graph, rgb, gray);
    vxGaussian3x3Node(graph, gray, gauss);
    vxSobel3x3Node(graph, gauss, gradX, gradY);
    vxMagnitudeNode(graph, gradX, gradY, mag);
    vxPhaseNode(graph, gradX, gradY, phase);

    /* Verify once; the graph can then be executed many times */
    vx_status status = vxVerifyGraph(graph);
    if (status != VX_SUCCESS) abort();

    while (1) { /* loop while input images are available */
        /* capture data into rgb; break when there is no more input */
        vxProcessGraph(graph);
        /* use data from mag and phase */
    }

    vxReleaseContext(&ctx);  /* also releases the graph and images */
    return 0;
}

 

The main focus of this article, though, is ADRENALINE, a framework for optimizing OpenVX applications and for fast prototyping on heterogeneous SoCs that feature many-core accelerators. Its core component is an optimized OpenVX run-time system, built on streamlined OpenCL support and targeting a generic heterogeneous SoC template.

At the time of writing, the latest version of the OpenVX specification is OpenVX 1.2, released in May 2017. The approach does not require a high-power CPU/GPU complex: even a low-power host can set up and manage the accelerator. The project page offered downloads of the toolchain for the PULP v3 architecture, the current ADRENALINE SDK, OpenVX kernels for CMA, and OpenVX examples. You can also build OpenVX on an ordinary PC with VirtualBox by following the setup tutorial listed in the Sources below.

 

The Main Objectives and Structure of ADRENALINE

ADRENALINE is effective largely because it was designed with two clear objectives: application benchmarking and profiling, and architectural tuning. To support both, it comes with an advanced virtual platform that models the target architecture template.

The virtual platform is written in C++ and Python, and each language has a clear role. Python handles configuration and execution management, where high-level control is needed, while C++ implements the architectural models efficiently.

 

Extending ADRENALINE

A useful property of ADRENALINE is that it can be extended. To add an architectural block, you write a new module: a Python class that declares the block's input and output ports. Connecting these ports to those of other modules defines how the architectural blocks are wired together. Each Python class has a corresponding C++ class that implements the block model.

When the platform starts, each C++ class receives its configuration from the corresponding Python class, including property values and how its ports are connected. During simulation, only the C++ code runs. Extending ADRENALINE therefore comes down to writing new modules, and the effort involved depends on the complexity of the block being modeled.

 

OpenVX Run-Time

We have already covered OpenVX, but it is important to mention that ADRENALINE's run-time is based on OpenCL. A common issue with OpenCL is that kernels share intermediate data through the global memory space. As the number of interacting kernels grows, the required main-memory bandwidth quickly exceeds what is available, and memory traffic becomes the bottleneck.

To overcome this bandwidth bottleneck, ADRENALINE extends OpenCL semantics to support explicit memory management. This gives the run-time more control over data placement and allows more efficient reuse of the on-chip memory, so intermediate data can stay close to the processing units instead of traveling through main memory.

Overall, ADRENALINE opens up useful optimization opportunities for embedded vision applications on many-core accelerators, and it serves a wide range of users, hardware designers included. Experimental results were published on the project page listed in the Sources below, which is no longer online.

If you have experience with OpenVX and ADRENALINE or want to learn more, feel free to share your thoughts in the comments.

 

Sources

  • ADRENALINE: about the project, code download and experimental results. Previously available at http://projects.eees.dei.unibo.it/adrenaline/ (no longer online; page visited on July 6th, 2018)

  • Setup tutorial. https://github.com/rgiduthuri/openvx_tutorial
