Office kitchens are not always the nicest places. It only takes one person to leave a cup in the sink, and civilisation as we know it teeters on the edge of destruction. This project uses computer vision to identify the offending objects and the likely culprits. It does not mete out judgement on the guilty (for now).
This project is broken down into two streams: finding the mess, and finding the culprit. The two tasks are similar, but different.
Finding a mess
Mess detection uses YOLO (You Only Look Once), developed by Joseph Redmon and Ali Farhadi. These folks are far smarter than I will ever be; I not only acknowledge their work, but want to thank them for making it public.
In short, we take an image every 30 seconds from a Basler Dart camera (supplied as part of the evaluation kit) and pass it through YOLO using the standard training model. If there is a probability greater than 80% that a domestic item (cup, plate, glass, etc.) is in the field of view, then we have a "mess" event.
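The threshold test above can be sketched as a small helper. The function name and the label set here are my own illustration, not code from messfinder.py; YOLO itself reports a (label, confidence) pair for each object it finds in the frame.

```python
# Labels from the standard YOLO/COCO model that count as "domestic items".
# This set is an assumption for illustration; the real list may differ.
DOMESTIC_ITEMS = {"cup", "bowl", "wine glass", "fork", "knife", "spoon"}

def mess_event(detections, threshold=0.80):
    """Return True if any domestic item is detected above the threshold.

    detections: iterable of (label, confidence) tuples from one frame.
    """
    return any(
        label in DOMESTIC_ITEMS and confidence > threshold
        for label, confidence in detections
    )
```

A frame containing a confidently detected cup triggers a mess event; a cup at 50% confidence, or a confidently detected laptop, does not.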
pypylon handles control of the camera, then hands a standard image to OpenCV and YOLO for processing.
Because YOLO recognises dozens of objects which could find their way into the sink, it's not practical to handle them with a switch statement. To lighten the load on the UP Board, both in performance and storage, I offloaded this classification task to an AWS Lambda function via the API Gateway.
Using an external function also allows the camera platform to be used for general identification of other types of objects, with no code changes required in the main routines.
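A hypothetical sketch of the Lambda function sitting behind the API Gateway endpoint. The payload shape (`{"labels": [...]}`), the handler name, and the category set are all my assumptions; the point is that the classification lives in AWS, so the category list can change without touching the UP Board code.

```python
import json

# Illustrative category set; the real list lives in the deployed function.
SINK_ITEMS = {"cup", "bowl", "wine glass", "fork", "knife", "spoon"}

def lambda_handler(event, context):
    """Classify a list of detected labels, replacing an on-board switch."""
    labels = event.get("labels", [])
    offending = sorted(set(labels) & SINK_ITEMS)
    return {
        "statusCode": 200,
        "body": json.dumps({"mess": bool(offending), "items": offending}),
    }
```

The UP Board only POSTs the label list and reads back the verdict, so its code stays identical whatever the category list becomes.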
Finding the culprit
A second process runs on the UP Board, using an additional camera to watch the area surrounding the kitchen. Here we use OpenCV to recognise people's faces and note the times they were in the area.
When a mess event occurs, we look up who may have been in the area at the time. An external script could then be called to name and shame the likely offenders.
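The lookup can be sketched as below. The five-minute window, the function name, and the (name, time-seen) layout are my assumptions for illustration, not the actual schema used on the UP Board.

```python
from datetime import datetime, timedelta

def likely_culprits(sightings, mess_time, window_minutes=5):
    """Names of everyone seen within `window_minutes` of the mess event.

    sightings: iterable of (name, datetime-seen) pairs recorded by the
    face-watching process.
    """
    window = timedelta(minutes=window_minutes)
    return sorted({
        name for name, seen_at in sightings
        if abs(seen_at - mess_time) <= window
    })
```

Someone seen two minutes before the event is a suspect; someone last seen half an hour earlier is not.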
Building
You will need either the Up Embedded Vision Starter Kit, or the components which are available separately.
First, get your board up and running. I used a standard Ubuntu Desktop 16.04 LTS, installed manually. If you remove the image-display code, the application could also run on Ubuntu Server. Make sure you follow UP's documentation on installing the device-specific kernel.
All my code is based around Python 3.6. I have not tested it against Python 2, but there should be few modifications required.
I also added the Intel OpenVINO toolkit, which provides an optimised build of OpenCV. The installation instructions are at https://software.intel.com/en-us/articles/OpenVINO-Install-Linux . Note that registration is required for the download.
The Basler Dart camera included with the kit does not present itself as a standard V4L/webcam device, so you need two more components:
Basler Pylon Suite - the software which drives the Basler cameras
Basler PyPylon - a Python interface into the Pylon software
By the way, there is no need to build pypylon yourself; just use the release matching your operating system.
Finally, install numpy for Python 3.
You will need an Amazon Web Services (AWS) account for the Lambda, API Gateway and DynamoDB functions. All of these services qualify for the free tier if you want to experiment. You will need to add your API endpoint to messfinder.py.
When all this is done, you need to aim your camera at the sink and focus it carefully. This is the only real criticism I have of the Basler Dart camera: it is not easy to mount. If I had access to a 3D printer or a CNC milling machine, I am confident I could create an elegant mounting system, but time was against me. I used a length of acrylic rod and a hot glue gun. Not pretty, but it got the job done.
Before running messfinder.py, load the Basler-recommended kernel settings by executing "tools/optimal-settings.sh".
The face detection routine needs to be trained. In the directories beneath "training-data", place photos of each face to be recognised (one directory per person) and run recogniser.py. This creates a series of "dat" files which searcher.py uses to identify the face in an image.
Training can take quite some time and may be better run on a more powerful machine; the dat files can then be copied onto the UP Board for use, which shortens the time needed to get up and running.
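The training layout described above (one sub-directory per person, each holding that person's photos) can be read with a few lines of standard Python. This helper and its name are mine, for illustration; the real logic lives in recogniser.py.

```python
import os

def load_training_index(root="training-data"):
    """Map each person (directory name) to the image paths inside it."""
    index = {}
    for person in sorted(os.listdir(root)):
        person_dir = os.path.join(root, person)
        if os.path.isdir(person_dir):  # one directory per person
            index[person] = sorted(
                os.path.join(person_dir, f) for f in os.listdir(person_dir)
            )
    return index
```

The directory name doubles as the label, so adding a new person is just a matter of creating a folder, dropping photos in, and re-running the trainer.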
FaceFinder is copied from my FaceFinder project on Bitbucket (https://bitbucket.org/chrismor/facefinder). The full version can be found there and may be more up to date.
Outstanding Issues
The biggest problem is that the system is still too slow. YOLO is brilliant, but the CPU on the UP Board runs at 100% on all cores and all available memory is consumed, so the 4 GB model might be a better choice for continual observation. Although it is too late for this contest entry, I have started experimenting with a Movidius Neural Compute Stick, and the results are looking promising. I will probably write a follow-up post on the results of my work in this area.
Redmon, Joseph and Farhadi, Ali. "YOLOv3: An Incremental Improvement." arXiv, 2018.
Title | Description | Format |
---|---|---|
Detection screen during development | Dirty sink for illustration purposes only :-) | jpg |
All the YOLO models and configuration | | zip |
Name | Article number | Link | Quantity | Unit price |
---|---|---|---|---|
UP Embedded Vision Starter Kit | RE-UP-PACK-VISION-002 | up-shop.org | 1 | $349.00 |
Total | | | | $349.00 |