My MS HoloLens Experience: Unboxing & why does the HoloLens need six cameras?

A story of Simultaneous Localization and Mapping in consumer products...

The HoloLens finally arrived the other week and we unboxed it at Imaginghub R&D. Continue reading for a general overview of what comes with the HoloLens and a short review. We’ll get to the main question later…

 

Let’s have a look at what is in the smart box:

  • the HoloLens
  • a clicker
  • a nice carrying case
  • a microfiber cloth
  • a charger and a micro USB 2.0 cable.

[Images: HoloLens box open; HoloLens case open]

 

To get closer to the action, just watch our unboxing video.

 

Setup, and what have we done so far?

The HoloLens was an absolute attraction in the office; everybody stopped by to test it.

We tested the standard apps and really got into moving the “holograms” around. We also went on the HoloTour and played some games like Land of the Dinosaurs and RoboRaid. The applications feel totally real. The only negative point we noticed about the HoloLens is its limited field of view. The apps we tested were mostly demos; for serious business purposes there is still room for improvement. Skype is a good start.

 

The price...

The price of the HoloLens dev kit is justified: it is really not a toy, but an augmented reality (AR) headset containing a “grown-up” computer with laptop-class performance, comprising 64 GB of flash storage, 2 GB of RAM, 2 GPUs, 1 CPU and, all together, 6 cameras!

 

 

So... What are the six cameras for?

The HoloLens from Microsoft is a prime example of augmented reality (AR). Basically it is a headset that superimposes virtual content on the real world. Put it on your head and you can see things others can’t! A simpler version of this idea was the Pokémon Go game (https://www.theguardian.com/technology/2016/oct/23/augmented-reality-development-future-smartphone).

 

SLAM

Now, the basis of the HoloLens magic is being able to “know” where it is in a room to high accuracy (millimeters) and to project its holograms into the room in the right position, so they look like real objects attached to other objects (like tables and walls). To “know” where it is in the room, it has to simultaneously locate itself and map the room as it moves through it. This technique is known as SLAM (simultaneous localization and mapping). Microsoft has released a video describing it in very simple terms: https://www.youtube.com/watch?v=TneGSeqVAXQ
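To make the idea concrete, here is a toy one-dimensional sketch of why “locating” and “mapping” have to happen together. All numbers and the landmark name are purely illustrative; this is a generic textbook-style correction step, not the HoloLens algorithm.

```python
# Toy 1D SLAM illustration (hypothetical numbers): dead reckoning from
# odometry drifts, but re-observing a landmark that was previously added
# to the map lets the device pull its pose estimate back toward reality.

true_pos = 0.0
est_pos = 0.0
landmark_map = {}            # landmark name -> estimated position

steps = [1.0, 1.0, 1.0, 1.0]
odom_bias = 0.05             # each odometry reading is 5% too long

for step in steps:
    true_pos += step
    est_pos += step * (1 + odom_bias)     # drifting dead reckoning

    # Range sensor observes a landmark ("door") at true position 4.0.
    observed_range = 4.0 - true_pos
    if "door" not in landmark_map:
        # First sighting: add it to the map using the *current* pose,
        # so the map inherits some of the pose error - that coupling is
        # exactly why the problem is "simultaneous".
        landmark_map["door"] = est_pos + observed_range
    else:
        # Re-observation: nudge the pose by half the range mismatch
        # (a crude stand-in for a Kalman-style gain).
        predicted_range = landmark_map["door"] - est_pos
        est_pos += 0.5 * (predicted_range - observed_range)

print(abs(est_pos - true_pos))   # residual error, well below raw drift
```

Without the correction step the estimate would be off by 0.2 after four steps; with it, the error stays roughly half that, even though the map entry itself is slightly wrong.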

This SLAM is achieved using video sensors, and it fits the topics on the hub, which is why I thought I’d write about it in a bit more detail. To achieve it there are five special cameras in the HoloLens, plus one extra for the user to take pictures and videos with. So let’s go through the other five now.


Now clearly, if you’re Microsoft, it’s no problem to throw in an Intel Cherry Trail (14 nm) processor, a couple of GPUs, 2 GB of RAM and 64 GB of flash (plus WiFi, stereo speakers, Bluetooth, etc.). Where things become interesting is where Microsoft designed its own silicon, the Holographic Processing Unit (HPU). With its 65 million logic gates (28 nm manufacturing process) it contains the algorithms to perform the SLAM. The GPUs are presumably used only to produce the 3D graphics you see superimposed onto the real world through the two LCoS displays (liquid crystal on silicon, the stuff in projectors).

The SLAM hardware and, obviously, the cameras are built onto one sensor bar (a kind of rigid PCB assembly). There are four “environment sensing cameras”, a time-of-flight (ToF) camera and a 2 MP camera for user pictures. Also on the bar is an inertial measurement unit (IMU).

 

"Environment sensing cameras"

Normal stereoscopic vision requires two cameras pointing forward, and the mathematics has been around for some time: by matching features between the two cameras and knowing the cameras’ relative positions, you can work out depth very accurately. These are two of the four “environment sensing cameras”. The two extra ones point outwards and give the whole system peripheral vision, giving the stereoscopic cameras a head start to prepare before anything appears in front of them.
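The core of that mathematics is the classic pinhole relation Z = f · B / d: depth is focal length times baseline divided by disparity. A minimal sketch, using made-up numbers rather than actual HoloLens camera parameters:

```python
# Depth from stereo disparity for a rectified camera pair (textbook
# pinhole model; the example values are illustrative, not HoloLens specs).

def depth_from_disparity(focal_px: float, baseline_m: float,
                         disparity_px: float) -> float:
    """Z = f * B / d: depth in meters from pixel disparity."""
    if disparity_px <= 0:
        raise ValueError("zero disparity: point at infinity or bad match")
    return focal_px * baseline_m / disparity_px

# A feature seen 20 px apart between the two images, cameras 10 cm
# apart, focal length 700 px:
z = depth_from_disparity(700, 0.10, 20)
print(f"{z:.2f} m")   # prints 3.50 m
```

Note the inverse relationship: halving the disparity doubles the estimated depth, which is why stereo accuracy degrades with distance.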

The whole system also has to recognize “things” like walls, floors, tables and people. It does this using a simple neural network. Clearly, when mapping the room it has to know what is permanent (walls, floors, tables) and what is temporary (human beings). This also allows it to guess the orientation a user wants for a browser window: on a wall it should be stuck flat to the wall, on a table it should stand up like a monitor. The peripheral vision cameras are also useful here.

 

ToF camera

The ToF camera works at short distances and provides direct 3D information for two main uses. Firstly, it recognizes hand gestures for controlling the system. Secondly, it provides distance information when no features are available for stereoscopic matching, for instance when you stare at a blank wall. Normally that is not an interesting use case, except when the AR has projected a hologram or browser window onto that wall; then it becomes very relevant. So it, too, feeds its information into the overall SLAM system.
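The principle behind a ToF camera is just round-trip timing of light: d = c · t / 2, since the light travels out and back. A quick sketch (illustrative numbers; real ToF sensors such as the one here typically measure the phase shift of modulated light rather than raw pulse timing):

```python
# Time-of-flight distance: emitted light bounces off the scene and
# returns, so distance is speed of light times round-trip time, halved.

C = 299_792_458.0   # speed of light in m/s

def tof_distance(round_trip_s: float) -> float:
    return C * round_trip_s / 2

# A 10 ns round trip corresponds to roughly 1.5 m:
print(f"{tof_distance(10e-9):.2f} m")   # prints 1.50 m
```

The tiny timescales involved (nanoseconds per meter) are why this needs dedicated sensor hardware rather than an ordinary camera.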

 

IMU

Now, how all of this sensor information is fused together in the mathematics of the HPU is, of course, Microsoft’s secret. But what is interesting is that the mapping information can be stored and recalled for future use (interestingly enough, categorized by WiFi SSID). Not only that: the stored information from many HoloLenses can be combined in the cloud and used to provide accurate layouts of the insides of buildings and structures that GPS can’t reach. Although you can do this with laser triangulation systems, they rely on a perfectly empty structure; people moving around while it is mapping destroy the quality of the information. Occlusion is also a major problem there, one which doesn’t affect the HoloLens system (assuming people look behind things as well).
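While the HPU’s actual fusion math is secret, the general flavor of combining a fast-but-drifting IMU with slower absolute fixes from the cameras can be sketched with a generic complementary filter. Everything below is an assumption-laden illustration of that standard technique, not the HPU’s implementation:

```python
# Generic complementary-filter sketch: integrate a biased IMU step each
# tick, then blend in an absolute camera-derived position fix. This is
# standard sensor-fusion folklore, not Microsoft's algorithm.

def fuse(imu_estimate: float, camera_fix: float,
         camera_weight: float = 0.2) -> float:
    """Blend drifting IMU dead reckoning with an absolute camera fix."""
    return (1 - camera_weight) * imu_estimate + camera_weight * camera_fix

true_pos = 0.0
pose = 0.0
for _ in range(10):
    true_pos += 0.10          # device actually moves 10 cm
    pose += 0.105             # IMU integration overshoots by 5%
    pose = fuse(pose, true_pos)   # idealized camera fix pulls it back

print(abs(pose - true_pos))   # bounded error instead of growing drift
```

Without the camera term the error would grow by 5 mm every step; with it, the error settles at a small bounded value, which is the whole point of fusing the two sensor classes.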

 

Review

The complete system is very impressive in use and very robust. Feedback from around the office is all good (apart from the known restrictive field of view). Of course you can build augmented reality yourself with two cameras and an Oculus Rift (http://hackaday.com/2013/12/09/oculus-rift-goes-from-virtual-to-augmented-reality/). But then you’re not SLAMming.

 

Do you have any exciting ideas what to do with the HoloLens? Then log in or register on the Imaginghub and let us know!
