Fresh from success with YOLOv3 on the desktop, I wondered whether this could be made to work on the Movidius Neural Compute Stick and therefore run on the Raspberry Pi.
The NCS is a neat little device and because it connects via USB, it is easy to develop on a desktop and then transfer everything needed to the Pi.
The app zoo, on the ncsdk2 branch, has a tiny_yolo_v2 implementation that I used as the basis for this. It only took about an hour to get this working on the desktop – integration with rt-ai was very easy. The Raspberry Pi end was not so easy – all kinds of version number issues and things like that. However, even though not all of the tools would compile, I just moved the compiled graph from the desktop to the Pi and that worked fine.
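For reference, this is roughly what loading and running the compiled graph looks like with the NCSDK2 Python API – a minimal sketch, assuming the graph file compiled on the desktop has been copied over to the Pi (the file name and preprocessing here are illustrative only):

```python
# Minimal sketch: load a pre-compiled graph onto the NCS and run one inference
# using the NCSDK2 (mvncapi) Python API. File name and input size are assumptions.
import numpy as np
from mvnc import mvncapi

devices = mvncapi.enumerate_devices()
device = mvncapi.Device(devices[0])
device.open()

with open('tiny_yolo_v2.graph', 'rb') as f:   # graph compiled on the desktop
    graph_buffer = f.read()

graph = mvncapi.Graph('tiny_yolo_v2')
input_fifo, output_fifo = graph.allocate_with_fifos(device, graph_buffer)

# Frame preprocessed to the network's input size (416 x 416 float32 for tiny YOLO v2).
tensor = np.zeros((416, 416, 3), dtype=np.float32)
graph.queue_inference_with_fifo_elem(input_fifo, output_fifo, tensor, None)
output, _ = output_fifo.read_elem()
# ... decode the YOLO output into boxes and labels as in the app zoo example ...

input_fifo.destroy()
output_fifo.destroy()
graph.destroy()
device.close()
device.destroy()
```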
This is the design. The main difference here from the usual test designs is that the MYOLO SPE is assigned to node pi34 (the Raspberry Pi) rather than the desktop (Default). Assigning just the MYOLO SPE to the Pi saved me from having to connect a PiCam or UVC camera to the Pi and also allowed me to get a better feel for the pure performance of the Pi with the NCS.
As can be seen from the first screen capture it worked fine although, because it supports only a subset (20 of 91) of the usual COCO labels, it did not pick up the mouse or the keyboard. Performance-wise, it was running at about 1 fps and 30% CPU. Just for reference, I was getting about 8 fps on the i7 desktop.
I had intended to be doing something completely different today (working on auto-compiling highlight reels of interesting events generated from the prototype production rt-ai Edge object detection system) but managed to get sidetracked by reading about Darknet-based YOLOv3. As Darknet itself is in C and compiles to a shared library, this was a good candidate for a Dockerized stream processing element. I used a cuDNN image from NVIDIA as the base since it provides pretty much everything required – I just had to add the rt-ai SPE library software and compile Darknet on top of that.
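To give the idea, this is roughly how the Python side of the SPE can call into the Darknet shared library – a minimal sketch, assuming the darknet.py ctypes wrapper from the Darknet repo is importable inside the container, and using illustrative cfg/weights/data paths:

```python
# Minimal sketch: call the Darknet shared library from Python via the darknet.py
# ctypes wrapper that ships with the Darknet repo. Paths here are illustrative.
import darknet

net = darknet.load_net(b"cfg/yolov3.cfg", b"yolov3.weights", 0)
meta = darknet.load_meta(b"cfg/coco.data")

# detect() runs inference on an image file and returns (label, confidence, (x, y, w, h)).
detections = darknet.detect(net, meta, b"frame.jpg")
for label, confidence, (x, y, w, h) in detections:
    print(label, confidence, x, y, w, h)
```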
The results are pretty good. The preview above shows some detected objects. I discovered that it could detect toothbrushes, which is why I am waving one around. It also did a good job of picking up the second mouse just by my left shoulder. 2 fps at a 1280 x 720 frame size is a little disappointing, but this seems to be due to the Python parts of the code since the C demo provided with the library runs much faster. It is a little faster with preview turned off, however (which would be the production mode anyway).
Speaking of production, it does have a problem: it consumes just over 7 GB of memory on my GTX 1080 Ti GPU card. This means that one GPU card can't run two instances simultaneously, unlike with the TensorFlow SSD detector. In fact, I can get two instances of that working on a GTX 1080 card with 8 GB total memory.
Just for completeness, this is the design which looks just like the usual test designs. The Docker container is built and pushed to a private Docker registry automatically when the design is generated. The target node then just pulls the image from the registry when the design starts up.
This is the MediaView output showing the metadata. The metadata format is equivalent to that generated by the TensorFlow object detector so that they are completely interchangeable.
Docker containers are a great way of reducing the headaches caused by prerequisites and software versions when deploying code in general and rt-ai SPEs in particular. So it made sense to add support for SPEs in Docker containers in addition to the existing bare metal SPEs. The screen capture above shows the test design in rtaiDesigner using the Docker containerized version of the existing TensorFlow object detector. It is essentially identical to the bare metal version, just with the object detection SPE replaced with the Dockerized version. The container was based on the TensorFlow GPU image.
SPE code is deployed to nodes as a package that includes start and stop scripts. Normally, the start script is something very simple: a single line kicking off a Python script for example. Docker SPEs use a slightly more complex start script that first tries to pull the required Docker image from a defined registry location and then invokes the container in the required manner (using nvidia-docker if necessary).
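As an illustration, a Docker SPE start script might boil down to something like this – a minimal sketch in Python, with a hypothetical registry and image name, and assuming nvidia-docker is installed on nodes that need the GPU:

```python
#!/usr/bin/env python3
# Minimal sketch of a Docker SPE start script: pull the image from the private
# registry, then run the container. Registry and image names are hypothetical.
import subprocess

REGISTRY = "registry.local:5000"                 # hypothetical private registry
IMAGE = REGISTRY + "/rt-ai/yolov3-spe:latest"    # hypothetical image name

# Make sure the node has the current image before starting the container.
subprocess.run(["docker", "pull", IMAGE], check=True)

# Start the SPE container; nvidia-docker exposes the GPU inside the container.
subprocess.run([
    "nvidia-docker", "run", "--rm", "--network=host",
    "--name", "yolov3-spe",
    IMAGE,
], check=True)
```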
No changes were required to the SPE code itself in this case – just customization of the start and stop scripts. I also added some files used to build the container and push it to the local registry, so the build and update process is very straightforward. Plus, as this test design shows, bare metal and containerized SPEs can be mixed without limitation as the stream interfaces are identical in all cases.
I decided that it would be fun to try out a Google AIY Vision Kit as a sort of warm-up for the potentially much more significant Edge TPU.
The Vision Kit is basically the same configuration as the ZeroSensor camera except with an extra board in the camera path that can perform inference on the captured images. The kit comes with some frozen graphs that can be used to detect a few things, but I thought it would be interesting to try training a MobileNet SSD network on the Pascal VOC 2012 training data, which covers 20 different object classes. The instructions for how to do this are here.
Once that was all running, the next step was to integrate it with rt-ai Edge. It’s pretty similar to the earlier full-blown TensorFlow version so it didn’t take too long to get working.
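For a rough idea of what the VisionKit SPE does, here is a minimal sketch using the stock aiy.vision Python API, assuming the retrained VOC model has been compiled to a binaryproto file; the file name, model name and output handling are all hypothetical:

```python
# Minimal sketch of driving the Vision Bonnet from Python, assuming the stock
# aiy.vision API and a retrained VOC model compiled to a binaryproto file.
from picamera import PiCamera
from aiy.vision.inference import CameraInference, ModelDescriptor

with open('mobilenet_ssd_voc.binaryproto', 'rb') as f:   # hypothetical file name
    compute_graph = f.read()

model = ModelDescriptor(
    name='mobilenet_ssd_voc',           # hypothetical model name
    input_shape=(1, 256, 256, 3),       # MobileNet SSD 256 x 256 input
    input_normalizer=(128.0, 128.0),    # (mean, stddev) input normalization
    compute_graph=compute_graph)

with PiCamera(sensor_mode=4, framerate=30) as camera:
    with CameraInference(model) as inference:
        for result in inference.run():
            # Decoding the SSD output into boxes and labels depends on how the
            # VOC model was retrained; the decoded detections would then be
            # published on the SPE's output stream as metadata.
            print(result)
```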
The design is much the same as usual except with the new VisionKit object detection SPE instead of TFObjectDetect or Deeplab. Note that the PiCam and VisionKit SPEs are running on the AIY Vision Kit, whereas the MediaView SPE is running on a desktop.
This is the output from the MediaView SPE. The metadata has been formatted to look exactly the same as that from the previous TensorFlow detector so that they can be used interchangeably in stream processing networks. I am getting about 2 fps with 640 x 360 images, which is actually better than I expected.
In order to avoid the whole garbage in, garbage out problem, I want to make sure that the long-term data that I am collecting from the ZeroSensors and IP cameras is actually what I think it is. rtaiView is an rt-ai Edge app that allows both real-time streams and historic stored data to be reviewed in a convenient way. It can be used to check both raw data and extracted metadata for each stream. The screen capture above shows an rtaiView instance monitoring the real-time streams from two external IP cameras and the three streams (video, audio and multi-channel sensor) from a ZeroSensor. I am going to add a second ZeroSensor in a second internal location and then let the whole thing run for a while. Just to make life complicated, I am using the straight TensorFlow object detector for the external cameras and DeepLabv3 segmentation in color map mode for the internal cameras so that I don't upset the other inhabitants, who aren't keen on me putting cameras everywhere.
Focus now is moving on to extracting interesting sequences from the stored data and then using these to train an anomaly detector. The exact architecture of the anomaly detector is still a bit unclear – definitely a research project.
Finally, this is a ZeroSensor all ready to go into full-time service, capturing video, audio and environmental data. The goal is to use this data, and that from other cameras around the space, as training data for machine learning systems.
One specific goal is to create an anomaly detector with minimal supervision. As much as possible, it will learn from experience. This is kind of tricky as it requires detecting sequences of unknown length, depending on the circumstances. I am intrigued by the ideas behind the Universal Translator but not sure how much could carry over to this application. This paper reviews some of the techniques usually applied, at least for video processing. The situation here is a little different as there are quite different types of features involved. My plan is to preprocess the video and audio to recognize salient features (using object detection or whatever) and then input these features, along with environmental sensor data, in the form of uniform time-slotted data sets to the anomaly detector. This doesn't help with detecting the length of an interesting sequence – that's the fun part of the project.
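To make the time-slotting idea concrete, here is a minimal sketch of how detections, audio levels and sensor readings might be bucketed into fixed-length slots and reduced to one feature set per slot – the slot length and field names are just assumptions, not the final design:

```python
# Minimal sketch: bucket object detections, audio levels and sensor readings
# into fixed-length timeslots so the anomaly detector sees one uniform
# feature set per slot. SLOT_SECONDS and the field names are assumptions.
from collections import defaultdict

SLOT_SECONDS = 10.0

def slot_of(timestamp):
    """Map a timestamp in seconds to its slot index."""
    return int(timestamp // SLOT_SECONDS)

def build_slots(detections, audio_levels, sensor_readings):
    """detections: (timestamp, label) tuples, audio_levels: (timestamp, rms)
    tuples, sensor_readings: (timestamp, {sensor: value}) tuples."""
    slots = defaultdict(lambda: {"labels": defaultdict(int), "audio": [], "sensors": []})
    for ts, label in detections:
        slots[slot_of(ts)]["labels"][label] += 1
    for ts, rms in audio_levels:
        slots[slot_of(ts)]["audio"].append(rms)
    for ts, values in sensor_readings:
        slots[slot_of(ts)]["sensors"].append(values)

    features = {}
    for idx, data in slots.items():
        audio_mean = sum(data["audio"]) / len(data["audio"]) if data["audio"] else 0.0
        features[idx] = {
            "label_counts": dict(data["labels"]),   # e.g. {"person": 3}
            "audio_level": audio_mean,
            "sensor_samples": data["sensors"],
        }
    return features
```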
Something conspicuously missing from the original ZeroSensor concept was audio support. That’s now been remedied with the SPH0645 board from Adafruit. I was originally a bit put off by the somewhat extended software installation but, in the end, this is the best approach.
This is the new ZeroSynth synth module for the ZeroSensor. Basically it just consists of an audio capture stream processing element added to the existing video and sensor capture SPEs.
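The audio capture SPE is conceptually very simple. This is a minimal sketch of the capture loop, assuming the SPH0645 I2S driver is installed and the microphone shows up as an ALSA capture device that PyAudio can open; in the real SPE each block would be timestamped and published on the audio output stream:

```python
# Minimal sketch of the audio capture loop, assuming the SPH0645 I2S driver is
# installed and the microphone appears as an ALSA capture device PyAudio can open.
import pyaudio

RATE = 48000                 # SPH0645 is typically run at 48 kHz
FRAMES_PER_BLOCK = 4800      # 100 ms blocks

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt32,   # I2S samples arrive as 32-bit words
                 channels=1,
                 rate=RATE,
                 input=True,
                 frames_per_buffer=FRAMES_PER_BLOCK)
try:
    while True:
        # Here the block is just read; the SPE would timestamp and publish it.
        block = stream.read(FRAMES_PER_BLOCK, exception_on_overflow=False)
finally:
    stream.stop_stream()
    stream.close()
    pa.terminate()
```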
The capture above shows the simple test design with the ZeroSynth module, which once again uses the DeepLabv3 SPE along with three PutManifold blocks to map the rt-ai Edge data into the Manifold for storage and offline processing.
rtaiView can be used to view and review the three streams captured from the ZeroSensor. The audio can also be played out of the rtaiView host’s speakers if desired.
The intention is not so much to process audio for content like words at this stage. It's more that the presence or absence of sounds or transients, in conjunction with other sensed events, could be useful. There seem to be two basic choices: either just use an average amplitude level during a specific timeslot or actually try to recognize sound sources during a timeslot. Either of these then becomes a feature for input into an inference engine along with other detected features from the video and sensor streams.
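The first option is trivial to implement. Here is a minimal sketch of computing an RMS level per captured block with numpy, assuming 32-bit signed samples as delivered by the I2S microphone; these per-block levels would then be averaged over each timeslot:

```python
# Minimal sketch of the "average amplitude per timeslot" option: an RMS level
# per captured block, assuming raw 32-bit signed samples from the I2S microphone.
import numpy as np

def rms_level(block_bytes):
    """Return the RMS amplitude of a block of raw 32-bit signed samples."""
    samples = np.frombuffer(block_bytes, dtype=np.int32).astype(np.float64)
    if samples.size == 0:
        return 0.0
    return float(np.sqrt(np.mean(samples ** 2)))

# Averaging these per-block levels over a timeslot gives a single audio feature
# to sit alongside the video and sensor features for that slot.
```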