Object detection on the Raspberry Pi 4 with the Neural Compute Stick 2


Following on from the Coral USB experiment, the next step was to run the same test with the NCS 2. Installation of OpenVINO on Raspbian Buster was straightforward. The rt-ai design was basically the same as for the Coral USB experiment, but with the CoralSSD SPE replaced by its OpenVINO equivalent, CSSDPi. Both SPEs run ssd_mobilenet_v2_coco object detection.

Performance was pretty good – 17fps with 1280 x 720 frames. This is better than the 12fps the Coral USB accelerator attained, but then again the OpenVINO SPE is a C++ SPE while the Coral USB SPE is a Python SPE, and image preparation and post-processing take their toll on performance. One day I am really going to use the C++ API to produce a new Coral USB SPE so that the two are on a level playing field. The raw inference time on the Coral USB accelerator is about 40ms, so there is plenty of opportunity for higher throughput.
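A quick back-of-envelope calculation makes that headroom concrete. The sketch below just plugs in the figures quoted in these posts (about 40ms raw Coral inference, 12fps for the Python CoralSSD SPE on the Pi 4, 17fps for CSSDPi) and assumes, as a simplification, that inference and everything else happen one after the other for each frame with no overlap.

```python
# Back-of-envelope throughput numbers from the figures quoted in these posts.
# Simplifying assumption: per-frame work runs strictly in sequence (no pipelining).

def fps_from_ms(frame_time_ms: float) -> float:
    """Frames per second for a given per-frame time in milliseconds."""
    return 1000.0 / frame_time_ms


def frame_time_ms(fps: float) -> float:
    """Per-frame time in milliseconds for a given frame rate."""
    return 1000.0 / fps


CORAL_INFERENCE_MS = 40.0  # raw Coral USB inference time (approximate)
CORAL_OBSERVED_FPS = 12.0  # Python CoralSSD SPE on the Pi 4
NCS2_OBSERVED_FPS = 17.0   # C++ CSSDPi SPE

# Upper bound if inference were the only per-frame cost.
ceiling = fps_from_ms(CORAL_INFERENCE_MS)
# Time per frame spent outside the accelerator (image prep, post-processing, I/O).
overhead = frame_time_ms(CORAL_OBSERVED_FPS) - CORAL_INFERENCE_MS

print(f"Coral inference-only ceiling: {ceiling:.1f} fps")
print(f"Coral non-inference time per frame: {overhead:.1f} ms")
print(f"NCS 2 per-frame time: {frame_time_ms(NCS2_OBSERVED_FPS):.1f} ms")
```

On those assumptions, inference alone would allow about 25fps, and roughly 43ms of each 83ms frame at 12fps is spent outside the accelerator, which is the part a C++ SPE could claw back.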

Object detection on the Raspberry Pi 4 with the Coral USB accelerator


SSD object detection with the Coral USB accelerator had been running on a Raspberry Pi 3 but the performance was disappointing and I was curious to see what would happen on the Raspberry Pi 4.


This is the test rt-ai design. The UVCCam and MediaView SPEs are running on an Ubuntu desktop, while the CoralSSD SPE is running on the Raspberry Pi 4. It is getting a respectable 12fps with 1280 x 720 frames (an earlier version of this post reported much worse performance, but that was due to some silly image loading code). Utilization of one CPU core is around 93%, which is fair enough for a Python SPE. I am sure that a C++ version of this SPE would be considerably faster again.

Getting this running at all was interesting, as the Pi 4 requires Raspbian Buster, which comes with Python 3.7, and Python 3.7 is not supported by the edgetpu_api toolkit at this point in time.

After writing the original blog post I discovered that it is in fact trivial to convert the edgetpu_api installation to work with Python 3.7. There is no need for the virtualenv and Python 3.5 steps: just run install.sh (modified as described below to recognize the Pi 4 and fix the sudo bug) and then enter these commands:

cd /usr/local/lib/python3.7/dist-packages/edgetpu/swig
sudo cp _edgetpu_cpp_wrapper.cpython-35m-arm-linux-gnueabihf.so _edgetpu_cpp_wrapper.cpython-37m-arm-linux-gnueabihf.so
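The rename works because CPython looks for an extension module under its module name plus one of the suffixes in `importlib.machinery.EXTENSION_SUFFIXES`. On Python 3.7 under Raspbian that list includes the `.cpython-37m-arm-linux-gnueabihf.so` form but not the `35m` one, so the original file is never found. The sketch below (helper names are my own) lists the filenames the running interpreter will accept and builds the renamed filename. The rename only makes the module findable; it happened to work here, but CPython does not promise binary compatibility between 3.5 and 3.7 builds in general.

```python
import importlib.machinery
import sysconfig
from typing import List


def candidate_filenames(module: str) -> List[str]:
    """Filenames the import system will accept for an extension module."""
    # On Linux this is e.g. ['.cpython-37m-arm-linux-gnueabihf.so', '.abi3.so', '.so'].
    return [module + suffix for suffix in importlib.machinery.EXTENSION_SUFFIXES]


def retag_extension(filename: str, new_tag: str) -> str:
    """Replace the CPython ABI tag in an extension filename, keeping the platform part."""
    base, sep, rest = filename.partition(".cpython-")
    if not sep:
        raise ValueError(f"no CPython ABI tag in {filename!r}")
    # rest is e.g. "35m-arm-linux-gnueabihf.so": drop the old tag up to the first "-".
    _old_tag, dash, platform_and_ext = rest.partition("-")
    return f"{base}.{new_tag}{dash}{platform_and_ext}"


def current_abi_tag() -> str:
    """ABI tag of the running interpreter, e.g. 'cpython-37m' on Raspbian Buster."""
    soabi = sysconfig.get_config_var("SOABI")  # e.g. 'cpython-37m-arm-linux-gnueabihf'
    return "-".join(soabi.split("-")[:2])


if __name__ == "__main__":
    shipped = "_edgetpu_cpp_wrapper.cpython-35m-arm-linux-gnueabihf.so"
    wanted = retag_extension(shipped, current_abi_tag())
    print("rename to:", wanted)
    print("accepted here:", wanted in candidate_filenames("_edgetpu_cpp_wrapper"))
```

Run on the Pi with Python 3.7, it should print the same target name used in the `cp` command above.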

It turns out that all it needed was a .so file named to match the Python version. Anyway, if you want to go the Python 3.5 route…

The ARM version of the Python library is only compiled for Python 3.5, so Python 3.5 needs to be installed alongside Python 3.7. To do this, download the gzipped source from here, then expand and build it with:

sudo apt-get install libssl-dev
tar xzf Python-3.5.7.tgz
cd Python-3.5.7
./configure --enable-optimizations
make -j4
sudo make altinstall
virtualenv --python=python3.5 venv
source venv/bin/activate

The result of all this should be Python 3.5 available in a virtual environment. Any other packages should be installed with pip3.5 as required. Regarding numpy, the normal install didn't work for me (importing it failed with missing dependencies), and I had to build it from source with this command (as described here):

pip3.5 install numpy --upgrade --no-binary :all:

Now it is time to install the edgetpu_api, which is basically a case of following the instructions here. However, install.sh has a small bug and also does not recognize the Pi 4.

Modify install.sh to recognize the Pi 4 by adding this after line 59:

  elif [[ "${MODEL}" == "Raspberry Pi 4 Model B Rev"* ]]; then
    info "Recognized as Raspberry Pi 4 B."
    LIBEDGETPU_SUFFIX=arm32
    HOST_GNU_TYPE=arm-linux-gnueabihf

Once that is added, go to line 128 and replace it with:

sudo udevadm control --reload-rules && sudo udevadm trigger

The original is missing the second sudo, so udevadm trigger runs without root privileges. Once that is done, the Coral USB accelerator should be able to run the bird classifier example.
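The Pi 4 branch added to install.sh relies on the trailing `*` in `[[ "${MODEL}" == "Raspberry Pi 4 Model B Rev"* ]]`, which makes it a prefix match so that any board revision ("Rev 1.1", "Rev 1.2" and so on) is recognized. Here is a small Python sketch of the same idea. It assumes the model string is read from `/proc/device-tree/model`, where Raspbian exposes it; I have not checked that install.sh reads it from exactly there, and the names below are hypothetical.

```python
from pathlib import Path
from typing import List, Optional, Tuple

# Hypothetical table mirroring the Pi 4 elif branch added to install.sh:
# model-string prefix -> (LIBEDGETPU_SUFFIX, HOST_GNU_TYPE).
# Other branches of install.sh are omitted here.
KNOWN_MODELS: List[Tuple[str, Tuple[str, str]]] = [
    ("Raspberry Pi 4 Model B Rev", ("arm32", "arm-linux-gnueabihf")),
]


def classify(model: str) -> Optional[Tuple[str, str]]:
    """Prefix match, like bash's [[ "$MODEL" == "prefix"* ]]."""
    for prefix, settings in KNOWN_MODELS:
        if model.startswith(prefix):
            return settings
    return None  # unrecognized board


def read_model(path: str = "/proc/device-tree/model") -> str:
    """Read the board model string; the device-tree value is NUL-terminated."""
    return Path(path).read_text().rstrip("\x00\n")
```

On a Pi 4, `classify(read_model())` should return the same two values the new install.sh branch sets.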

MobileNet SSD object detection using the Intel Neural Compute Stick 2 and a Raspberry Pi

I had successfully run ssd_mobilenet_v2_coco object detection using an Intel NCS2 on an Ubuntu PC in the past, but had not tried this on a Raspberry Pi running Raspbian as it was not supported at that time (if I remember correctly). Now OpenVINO does run on Raspbian, so I thought it would be fun to get this working on the Pi. The main task was getting the CSSD rt-ai Stream Processing Element (SPE) compiling and running on Raspbian with its version of OpenVINO, rather than on the usual x86-64 Ubuntu system.

Compiled rt-ai SPEs use Qt, so it was a case of putting together a different .pro qmake file to reflect the particular requirements of the Raspbian environment. Once I had sorted out the small changes to the link command, the SPE crashed as soon as it tried to read in the model .xml file. I got stuck here for quite a long time until I realized that a missing compiler argument made my binary incompatible with the OpenVINO inference engine. This was fixed by adding the following line to the Raspbian .pro file:

QMAKE_CXXFLAGS += -march=armv7-a

Once that was added, the code worked perfectly. To test, I set up a simple rt-ai design:


For this test, the CSSDPi SPE was the only thing running on the Pi itself (rtai1); the other two SPEs were running on a PC (default). Frames captured by the webcam arrived at the CSSDPi SPE at 1280 x 720 and 30fps. The CSSDPi SPE was able to process 17 frames per second, not at all bad for a Raspberry Pi 3 Model B! Incidentally, I had tried a similar setup using the Coral Edge TPU device and its version of the SSD SPE, CoralSSD, but the performance was nowhere near as good. One obvious difference is that CoralSSD is a Python SPE because, at that time, the C++ API was not documented. One day I may change this to a C++ SPE, and then the comparison will be more representative.

Of course, you can use multiple NCS 2s to get better performance if required, although I haven't tried this on the Pi as yet. The same can be done with the Coral with suitable code. In any case, rt-ai has the Scaler SPE, which allows any number of edge inference devices on any number of hosts to be used together to accelerate processing of a single flow. I have to say, the ability to use rt-ai and rtaiDesigner to quickly deploy distributed stream processing networks to heterogeneous hosts is a lot of fun!
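To give a flavour of what spreading one flow over several accelerators involves, here is a minimal, hypothetical sketch. It is not the rt-ai Scaler SPE's code: frames get sequence numbers and are dealt out round-robin to N workers, and a small reorder buffer re-emits results in frame order even when a slower device finishes late.

```python
import heapq
from itertools import count
from typing import Any, List, Tuple


class RoundRobinScaler:
    """Hypothetical sketch: spread frames over N inference workers, re-emit in order."""

    def __init__(self, num_workers: int):
        self.num_workers = num_workers
        self._seq = count()                        # next sequence number to hand out
        self._next_out = 0                         # next sequence number to release
        self._pending: List[Tuple[int, Any]] = []  # min-heap of (seq, result)

    def dispatch(self) -> Tuple[int, int]:
        """Allocate a sequence number and pick a worker for the next frame."""
        seq = next(self._seq)
        return seq, seq % self.num_workers

    def collect(self, seq: int, result: Any) -> List[Any]:
        """Accept a possibly out-of-order result; return any now releasable, in order."""
        heapq.heappush(self._pending, (seq, result))
        ready = []
        while self._pending and self._pending[0][0] == self._next_out:
            ready.append(heapq.heappop(self._pending)[1])
            self._next_out += 1
        return ready


scaler = RoundRobinScaler(num_workers=2)
jobs = [scaler.dispatch() for _ in range(3)]  # frames 0, 1, 2 go to workers 0, 1, 0
print(scaler.collect(1, "dets1"))  # nothing yet: still waiting for frame 0
print(scaler.collect(0, "dets0"))  # frames 0 and 1 released together
print(scaler.collect(2, "dets2"))
```

Round-robin keeps the sketch short; a real scaler would more likely hand each frame to whichever device is idle, and would need a way to skip a frame whose result never arrives, since the reorder buffer otherwise waits for it indefinitely.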

The motivation for all of this is to move from x86 processors with big GPUs to Raspberry Pis with edge inference accelerators to save power. The driveway project has been running for months now, heating up the basement very nicely. Moving from YOLOv3 on a GTX 1080 to MobileNet SSD and a Coral Edge TPU saved about 60W; moving the entire thing from that system to the Raspberry Pi has probably saved around 80W in total.

This is the design now running full time on the Pi:


CPU utilization for the CSSDPi SPE is around 21% and it uses around 23% of the RAM. The raw output of the CSSDPi SPE is fed through a filter SPE that only outputs a message when a detection has passed certain criteria to avoid false alarms. Then, I get an email with a frame showing what triggered the system. The View module is really just for debugging – this is the kind of thing it displays:


The metadata displayed on the right is what the SSDFilter SPE uses to determine whether the detection should be reported. It requires a detection of the same kind (e.g. car rather than something else) above a configurable confidence level in a configurable number of consecutive frames before emitting a message. It then applies a hold-off in case the detected object remains in the frame for a long time and, even after that, requires a defined gap before that detection is re-armed. It seems to work pretty well.
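These rules amount to a small per-class state machine. The sketch below is my own approximation of that behaviour, not the SSDFilter source: it counts frames rather than time, treats a "similar detection" as the same class label, and all parameter names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class ClassState:
    streak: int = 0      # consecutive frames with this class above threshold
    absent: int = 0      # consecutive frames without it
    armed: bool = True   # may this class trigger a report?
    since_fire: int = 0  # frames since the last report for this class


@dataclass
class DetectionFilter:
    """Hypothetical SSDFilter-style debouncer; all durations are in frames."""
    min_frames: int        # consecutive qualifying frames before reporting
    min_confidence: float  # per-detection confidence threshold
    hold_off: int          # minimum frames after a report before re-arming
    rearm_gap: int         # frames the class must be absent before re-arming
    state: Dict[str, ClassState] = field(default_factory=dict)

    def update(self, detections: Iterable[Tuple[str, float]]) -> List[str]:
        """Feed one frame of (label, confidence) pairs; return labels to report."""
        present = {label for label, conf in detections if conf >= self.min_confidence}
        for label in present:
            self.state.setdefault(label, ClassState())
        reports = []
        for label, st in self.state.items():
            if label in present:
                st.streak += 1
                st.absent = 0
            else:
                st.streak = 0
                st.absent += 1
            if st.armed:
                if st.streak >= self.min_frames:
                    reports.append(label)
                    st.armed = False
                    st.since_fire = 0
            else:
                st.since_fire += 1
                # Re-arm only after the hold-off AND a gap with no detections.
                if st.since_fire >= self.hold_off and st.absent >= self.rearm_gap:
                    st.armed = True
        return reports
```

With `min_frames=3`, for example, a car seen in three consecutive frames produces a single report; a car that then sits in the driveway produces nothing more until the hold-off has expired and it has been absent for `rearm_gap` frames.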

One advantage of using CSSD rather than CYOLO as before is that, while I don’t get specific messages for things like a USPS van, it can detect a wider range of objects:


Currently the filter accepts only the COCO vehicle classes and the person class, rejecting all others, in the interest of reducing false detection messages.
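An allow-list of class names is enough for this kind of filtering. This sketch assumes detections have already been mapped from the model's numeric class IDs to COCO names; the eight non-person classes below are those in COCO's "vehicle" supercategory.

```python
from typing import FrozenSet, Iterable, List, Tuple

# COCO "vehicle" supercategory classes, plus person.
ALLOWED_LABELS: FrozenSet[str] = frozenset({
    "person",
    "bicycle", "car", "motorcycle", "airplane",
    "bus", "train", "truck", "boat",
})


def keep_allowed(
    detections: Iterable[Tuple[str, float]],
    allowed: FrozenSet[str] = ALLOWED_LABELS,
) -> List[Tuple[str, float]]:
    """Drop (label, confidence) detections whose label is not on the allow-list."""
    return [(label, conf) for label, conf in detections if label in allowed]
```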

I had expected to need a Raspberry Pi 4 (mine is on its way 🙂) to get decent performance, but clearly the Pi 3 is well able to cope with the help of the NCS 2.