Comparison of hardware for training neural networks

Scientists are using neural networks to make machines intelligent. Neural networks can be implemented in software or in hardware. In this article we will discuss the hardware available for neural networks.

We humans can never hope to compete with a computer in terms of speed and accuracy. However, one thing that makes us superior to these star performers is our intelligence. So scientists are now trying to train the computers to be intelligent. Let us see how.


What are Neural Networks

To make machines intelligent, scientists are trying to emulate the design and functioning of the human brain and nervous system through neural networks. Like the human nervous system, a neural network has nodes (neurons) that connect with each other and transmit and receive signals to and from other nodes. So far the analogy holds; however, it is not known exactly which neurons connect with which other neurons, or how strongly. So finding the connections that produce the correct output is the biggest challenge of neural networks.
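To make the idea of nodes and connections concrete, here is a minimal sketch of a single artificial neuron. The function name `neuron` and the choice of a sigmoid activation are illustrative assumptions; real networks use many different activation functions. The connection strengths are the weights, so the challenge described above amounts to finding good values for them.

```python
import math


def neuron(inputs, weights, bias):
    """Model one artificial neuron: a weighted sum of inputs passed through a sigmoid."""
    # Each incoming signal is scaled by the strength (weight) of its connection.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # The sigmoid squashes the sum into the range (0, 1), loosely like a firing rate.
    return 1.0 / (1.0 + math.exp(-z))


# A weight of 0.0 behaves like "no connection"; training means finding good weights.
print(neuron([1.0, 0.5], [0.0, 0.0], 0.0))
print(neuron([1.0, 0.5], [4.0, -2.0], 0.5))
```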

Neural networks have found use in deep learning, artificial intelligence, handwriting recognition, machine learning, data mining, etc. However, the single biggest use of neural networks is in training machines for image recognition. We humans can identify objects easily, but machines have to be trained to identify the same objects.

Basic Types of Neural Networks

Broadly, there are two types of neural networks: feed-forward networks and recurrent networks. In feed-forward networks, connections between the neurons run in one direction only; the inputs are combined layer by layer to produce the outputs. In recurrent networks, by contrast, the outputs of neurons are fed back into the network as inputs, so the network retains information about earlier inputs. This gives them a form of memory that makes them better at tasks involving sequences, but it also puts more pressure on computational resources.
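The difference between the two types can be sketched in a few lines of plain Python. This is an illustrative toy rather than a production implementation; the helper names (`dense`, `feed_forward`, `recurrent`) and the tanh activation are assumptions chosen for clarity.

```python
import math


def dense(x, W, b):
    """One fully connected layer: y[i] = tanh(sum_j W[i][j] * x[j] + b[i])."""
    return [math.tanh(sum(w * xj for w, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def feed_forward(x, layers):
    """Feed-forward: the signal passes through each layer once, in one direction."""
    for W, b in layers:
        x = dense(x, W, b)
    return x


def recurrent(sequence, W_in, W_rec, b):
    """Recurrent: the hidden state h is fed back in at every time step."""
    h = [0.0] * len(b)
    for x in sequence:
        # The new state mixes the current input with the previous state.
        h = [math.tanh(sum(w * xj for w, xj in zip(row_in, x))
                       + sum(u * hj for u, hj in zip(row_rec, h)) + bi)
             for row_in, row_rec, bi in zip(W_in, W_rec, b)]
    return h


# Same final input (0.0), different histories: only the recurrent net remembers.
print(feed_forward([0.0], [([[1.0]], [0.0])]))
print(recurrent([[1.0], [0.0]], [[1.0]], [[1.0]], [0.0]))
print(recurrent([[0.0], [0.0]], [[1.0]], [[1.0]], [0.0]))
```

With the same final input of 0.0, the feed-forward network always returns 0.0, while the recurrent one returns a nonzero value if it previously saw a 1.0. That carried-over state is also why recurrent networks cost more to compute: each time step depends on the one before it.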

Neural Network Topologies

Every computer network has a topology, that is, the way its nodes are organized, chosen according to its purpose. Neural networks also come in many topologies. Fjodor Van Veen of the Asimov Institute created this chart of almost all the popular neural network types. This chart has been taken from this article on neural networks.

Image: overview of different neural networks

Hardware for Neural Network Implementation

The concept of neural networks can be implemented using software simulation as well as hardware solutions. In practice, software simulation of neural networks on conventional computers has been found to be adequate only for small networks. Software simulations are very effective for developing, testing and debugging algorithms. However, when the network is scaled up, software simulation alone is not sufficient. A hardware solution is essential to reduce learning time, because learning time grows rapidly as the network gets larger.
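A back-of-the-envelope estimate shows why training time becomes a problem as networks grow. The sketch below assumes a fully connected network and a common rule of thumb of about 6 floating-point operations (FLOPs) per weight per training example (roughly 2 for the forward pass and 4 for the backward pass). The workload numbers are hypothetical, and convolutional networks, which reuse each weight many times, cost considerably more per weight.

```python
def training_time_hours(params, samples, epochs, sustained_flops):
    """Rough training-time estimate for a fully connected network.

    Rule of thumb (an approximation, not an exact count): ~2 FLOPs per weight
    per example for the forward pass and ~4 for the backward pass.
    """
    total_flops = 6 * params * samples * epochs
    return total_flops / sustained_flops / 3600


# Hypothetical workload: 10 million weights, 1 million examples, 50 epochs.
for label, flops in [("1 TFLOP/s", 1e12), ("10 TFLOP/s", 1e13)]:
    hours = training_time_hours(10e6, 1e6, 50, flops)
    print(f"{label}: {hours:.2f} hours")
```

Even this simplified estimate grows in proportion to both model size and dataset size, and larger models usually need more data, which is why faster hardware quickly becomes essential.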

The hardware for neural network implementation must possess these qualities:

  • High computational power
  • Ability to implement complex algorithms
  • Highly optimized for performance
  • Energy efficient
  • Application scalability

Most Popular Hardware Available

Currently, Intel, NVIDIA and AMD are the leading companies offering viable hardware options for large-scale neural networks. Let us discuss their offerings here.

Intel Xeon CPUs

CPUs, or Central Processing Units, are traditionally optimized for single-thread performance. But in deep learning and other neural network systems, processors need to run hundreds or thousands of concurrent threads. So, intuitively, CPUs do not seem suitable for neural network systems. However, Intel set out to change that with its Xeon Phi series of many-core processors, which targets highly parallel workloads such as deep learning.

Intel's latest server processor family is the Intel Xeon Scalable processor, launched in July 2017. Intel has cited results in which large clusters of its Xeon processors trained ResNet-50 in 31 minutes and AlexNet in 11 minutes without loss of accuracy.

These are some of the features of these Xeon Scalable processors:

  • Up to 28 physical cores (56 threads) per socket
  • Up to 8 sockets
  • New Ultra Path Interconnect (UPI) for inter-socket communication
  • 2.50 GHz base frequency and 3.80 GHz turbo frequency
  • 1 TB of DDR4-2666 memory
  • 6 memory channels
  • 38.5 MB cache
  • 512-bit wide Fused Multiply Add (FMA) instructions
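The figures in the list above are enough to work out a theoretical peak throughput. The sketch assumes the top 28-core configuration at its 2.50 GHz base clock, 32-bit floats (16 per 512-bit register) and two FMA units per core. Not every Xeon Scalable model has two FMA units, and cores often run below base clock under heavy 512-bit load, so real throughput is lower.

```python
def peak_flops(cores, clock_hz, vector_bits, element_bits, fma_units_per_core):
    """Theoretical peak floating-point throughput of a SIMD CPU.

    Each FMA instruction performs a multiply and an add (2 FLOPs) on every
    lane of the vector register.
    """
    lanes = vector_bits // element_bits          # e.g. 512 // 32 = 16 FP32 lanes
    flops_per_cycle = fma_units_per_core * lanes * 2
    return cores * clock_hz * flops_per_cycle


# Assumed: 28 cores, 2.5 GHz base clock, 512-bit FMA, FP32, two FMA units per core.
print(peak_flops(28, 2.5e9, 512, 32, 2) / 1e12, "TFLOP/s (FP32, theoretical)")
```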

Xeon Scalable processors also benefit from software optimizations in the Intel Math Kernel Library for Deep Neural Networks (Intel MKL-DNN), which supports many of the popular deep learning frameworks, such as TensorFlow, Caffe and MXNet.
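Frameworks built on MKL-DNN typically run their CPU kernels on an OpenMP thread pool, and Intel's tuning guidance commonly suggests configuring that pool explicitly. The helper below is a hypothetical sketch: `OMP_NUM_THREADS` is a standard OpenMP variable, while `KMP_BLOCKTIME` and `KMP_AFFINITY` are specific to Intel's OpenMP runtime, and the values shown are only starting points to be tuned for each model.

```python
import os


def mkl_thread_settings(physical_cores, block_time_ms=1):
    """Hypothetical helper: OpenMP settings often tuned for MKL-DNN-backed frameworks."""
    return {
        # One OpenMP thread per physical core (hyper-threads usually left idle).
        "OMP_NUM_THREADS": str(physical_cores),
        # How long (ms) an idle Intel OpenMP thread spins before going to sleep.
        "KMP_BLOCKTIME": str(block_time_ms),
        # Pin threads to cores, packing them compactly (Intel OpenMP syntax).
        "KMP_AFFINITY": "granularity=fine,compact,1,0",
    }


# Set before importing the framework so the OpenMP runtime picks the values up.
os.environ.update(mkl_thread_settings(28))
```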

Overall, these processors have sufficient computational power and built-in software optimizations to support deep learning training and inference, machine learning and other AI algorithms. Intel positions even a single processor as capable of delivering state-of-the-art deep learning training performance. When many nodes are used with the correct configuration, compute times can be reduced significantly further.
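One common way to spread training across many nodes is data parallelism, sketched below with a toy one-weight model. The function names are hypothetical, and the workers are simulated in a single process.

```python
def gradient(w, data):
    """Mean-squared-error gradient for a one-weight model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def data_parallel_gradient(w, data, workers):
    """Data parallelism, simulated serially.

    Each worker gets an equal shard of the batch and computes a local gradient;
    the local gradients are then averaged. On a real cluster each shard would
    live on a different node and the averaging would be a network all-reduce.
    """
    if len(data) % workers:
        raise ValueError("batch size must be divisible by the number of workers")
    size = len(data) // workers
    shards = [data[i * size:(i + 1) * size] for i in range(workers)]
    local = [gradient(w, shard) for shard in shards]   # would run in parallel
    return sum(local) / workers


data = [(x, 3.0 * x) for x in range(1, 9)]   # samples of y = 3x
print(gradient(0.0, data), data_parallel_gradient(0.0, data, 4))
```

Because the shards are equal in size, the averaged gradient equals the full-batch gradient, so each step gives the same result while the work is divided among the workers. The cost of exchanging gradients between nodes is what makes the correct configuration (network, batch size, number of nodes) so important.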

NVIDIA GPU Architectures

GPUs, or Graphics Processing Units, are built to run many threads concurrently, because they were designed with gaming and other graphics-intensive tasks in mind. However, soon after the first GPUs hit the market, it was realized that they were also excellent for scientific calculations. That is because graphics calculations are essentially operations on large numbers of matrices.
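The link between graphics and neural networks is easy to see in code: rotating a set of vertices and applying a layer of weights to a batch of inputs are the same matrix product. The sketch below uses plain Python lists purely for illustration; a GPU computes the many independent sums in such a product on thousands of threads at once.

```python
import math


def matmul(A, B):
    """Plain matrix product, the core operation of both graphics and neural networks."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Graphics: rotate three 2-D vertices (stored as columns) by 90 degrees.
theta = math.pi / 2
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
vertices = [[1.0, 0.0, 1.0],
            [0.0, 1.0, 1.0]]
rotated = matmul(rotation, vertices)

# Neural network: a 2-output layer applied to the same 3 column vectors
# is exactly the same kind of product (weights x inputs).
weights = [[0.5, -0.5],
           [1.0, 2.0]]
activations = matmul(weights, vertices)
print(rotated)
print(activations)
```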

When machine learning and artificial intelligence (AI) began to grow popular, NVIDIA was the first company to start building GPUs specifically for these applications. That head start has enabled it to produce the best GPUs for neural networks, ML and other AI applications.

NVIDIA’s latest GPU architecture, called Pascal, is designed for computers that simulate the real world, and it has mind-boggling compute capabilities. NVIDIA claims that its Pascal GP100 GPU is the world’s fastest processor.

These are some of the most important characteristics of Pascal architecture:

  • 16nm FinFET chip
  • 15.3 billion transistors
  • 5 Teraflops of double precision performance for HPC workloads
  • 7 times faster deep learning inference throughput
  • High-speed bi-directional interconnect (NVLink)
  • 3 times boost in memory bandwidth performance
  • 16-bit floating point (FP16) instructions
  • 8-bit integer (INT8) instructions delivering 47 tera-operations per second
  • Real time responsiveness for deep learning inference
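To use 8-bit integer instructions, a network's floating-point weights must first be converted to integers (quantized). The sketch below shows one simple scheme, symmetric per-tensor quantization; it is an illustrative assumption, not the specific calibration method used by NVIDIA's software, and the weights are made-up values.

```python
def quantize_int8(values):
    """Symmetric linear quantization of floats to signed 8-bit integers.

    One scale factor maps the largest magnitude to 127; every value is then
    rounded to the nearest integer step and clamped to [-127, 127].
    """
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid a zero scale for all-zero input
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map the 8-bit integers back to approximate floating-point values."""
    return [x * scale for x in q]


weights = [0.82, -1.27, 0.003, 0.5]
q, scale = quantize_int8(weights)
print(q, scale)
print(dequantize(q, scale))
```

Each weight now occupies one byte instead of four, and the rounding error stays within half a quantization step (`scale / 2`).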

NVIDIA’s earlier architectures, Maxwell and Kepler, also power high-performance GPUs capable of running complex deep learning systems and huge data centers.

AMD Vega Architecture

The latest architecture released by AMD for its GPUs is called Vega. It is well suited to deep learning training and inference workloads thanks to these features:

  • Twice the FP16 performance compared with FP32
  • 24 teraflops on FP16
  • 48 trillion operations per second on INT8
  • Designed to work well with lower precision arithmetic
  • 512 TB of addressable virtual memory
  • Works well with AMD Threadripper and EPYC processors
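Why does FP16 double the throughput? Each value takes 16 bits instead of 32, so twice as many values fit into the same registers and the same memory bandwidth. The trade-off is precision and range, which Python's `struct` module can demonstrate because its `e` format code packs IEEE 754 half-precision values. The helper names below are our own.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


for w in [0.1, 1.0 / 3.0, 1000.1]:
    print(w, to_fp32(w), to_fp16(w))

# Storage per value in bytes: FP16 vs FP32.
print(struct.calcsize("<e"), struct.calcsize("<f"))
```

Half precision keeps only about three significant decimal digits and cannot represent magnitudes above 65504, which is why it is used where a network can tolerate the reduced precision.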

Memory Challenges in Using GPUs

Initially, computer memory chips only needed to support serial access, because that is all they were required to do. As the need for high-density memory arose, DRAM was developed. Every computer uses a combination of the two, with DRAM for primary memory and serial-access storage for secondary memory. These two types of memory must interface with each other efficiently for high system performance.

In the case of neural networks, this interface causes bandwidth limitations and high power consumption, because huge amounts of weights and activations must be stored for training deep neural networks. The wide vector architecture that GPUs use to execute high-performance calculations increases the number of activations that must be held, which in turn increases local storage requirements. GPUs are also less efficient than CPUs when executing the small convolutions that are an integral part of deep neural networks. This is one reason why many big companies continue to use CPUs for their neural network training needs.
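A rough estimate shows how quickly weights and activations add up. The sketch below is a deliberate simplification for a fully connected network: it counts parameters, one gradient per parameter and the activations of every layer for one batch, and ignores optimizer state, workspace buffers and framework overhead. The layer sizes are hypothetical.

```python
def training_memory_bytes(layer_sizes, batch_size, bytes_per_value=4):
    """Rough memory needed to train a fully connected network.

    Counts weights + biases, one gradient per parameter, and the activations
    of every layer (including the input) for the whole batch, which must be
    kept for the backward pass.
    """
    params = sum(n_in * n_out + n_out
                 for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
    activations = batch_size * sum(layer_sizes)
    return (2 * params + activations) * bytes_per_value


# Hypothetical network: 784 inputs, two hidden layers of 4096, 10 outputs.
for batch in (32, 1024):
    mib = training_memory_bytes([784, 4096, 4096, 10], batch) / 2**20
    print(f"batch={batch}: {mib:.1f} MiB")
```

The parameter memory is fixed, but activation memory grows linearly with batch size, which is why hardware that processes large batches in parallel needs so much local storage.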

NVIDIA introduced Unified Memory in 2014 with its Kepler architecture. It simplified memory management by allowing a single pointer to be used by both GPU kernels and CPU functions. The Pascal architecture improves on it by adding these two features:

  • 49-bit virtual addressing
  • On-demand migration of pages

Selecting the right hardware

Selecting neural network hardware depends on many factors. One of the first points to consider is whether the network is going to be memory-bound or compute-bound. The topology and architecture of the network are also important deciding factors.
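The memory-bound versus compute-bound question can be answered approximately with the roofline model: compare a layer's arithmetic intensity (FLOPs per byte moved) with the hardware's ratio of peak compute to memory bandwidth. The device figures below are hypothetical round numbers, not the specifications of any product discussed above.

```python
def attainable_flops(peak_flops, mem_bandwidth, flops, bytes_moved):
    """Roofline model: performance is capped by compute or by memory traffic."""
    intensity = flops / bytes_moved                  # FLOPs per byte of memory traffic
    return min(peak_flops, intensity * mem_bandwidth), intensity


def matmul_cost(m, n, k, bytes_per_value=4):
    """FLOPs and minimum bytes moved for C[m x n] = A[m x k] @ B[k x n]."""
    flops = 2 * m * n * k                            # one multiply + one add per term
    bytes_moved = (m * k + k * n + m * n) * bytes_per_value
    return flops, bytes_moved


# Hypothetical device: 10 TFLOP/s peak compute, 500 GB/s memory bandwidth.
PEAK, BW = 10e12, 500e9
for batch in (1, 512):
    f, b = matmul_cost(batch, 4096, 4096)            # a 4096 x 4096 layer on a batch
    perf, ai = attainable_flops(PEAK, BW, f, b)
    bound = "compute" if perf >= PEAK else "memory"
    print(f"batch={batch}: {ai:.1f} FLOP/byte -> {bound}-bound")
```

With these assumed figures, a single input leaves the layer memory-bound, while a batch of 512 makes it compute-bound, so the same network can favour different hardware depending on how it is used.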

Whatever the size of the network, some benchmarking must be performed to understand the actual requirements of the neural network hardware implementation.
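A simple timing harness is a reasonable starting point for such testing. The sketch below is hypothetical and times a stand-in workload (a pure-Python matrix product); in practice you would time your real model in your real framework. GPU work is usually launched asynchronously, so the device must be synchronized before the timer is stopped.

```python
import statistics
import time


def benchmark(fn, *args, warmup=2, repeats=10):
    """Time fn(*args): discard warm-up runs, then return the median of several runs."""
    for _ in range(warmup):              # warm caches, lazy initialization, clocks
        fn(*args)
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return statistics.median(times)


def matmul(A, B):
    """Stand-in workload: plain matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


n = 32
A = [[1.0] * n for _ in range(n)]
seconds = benchmark(matmul, A, A)
print(f"{2 * n**3 / seconds / 1e9:.3f} GFLOP/s")
```

Using the median of several runs after a warm-up reduces the influence of one-off effects such as cache misses or background processes.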

