CYOLO – a pure C++ implementation of a YOLOv3 SPE for rt-ai

The Python-based YOLOv3 SPE has been working for a while now but the performance was a little disappointing at 2 or 3 fps using 1280 x 720 frames on an i7 5820K CPU/GTX 1080ti GPU machine. I was interested to see how much effect the Python code was having on overall performance. To do this I implemented the C++ rt-ai SPE API and added the C version of the YOLOv3 demo code. The result is shown above and this version now runs at just over 14 fps (17 fps at 640 x 480) which is very usable.

While Python is very convenient, it is clearly (and unsurprisingly) more efficient to use C/C++ so I will probably do that in the future where possible. The main side-effect is that rtaiDesigner has to deploy the correct compiled SPE for the target node (typically x64 or ARM) and that any shared libraries that are not part of the standard install are included too. A Dockerized version would of course solve the dependency problem and just require a container for each target architecture.

rt-ai YOLOv2 SPE on a Raspberry Pi using the Movidius Neural Compute Stick

Fresh from success with YOLOv3 on the desktop, a question came up of whether this could be made to work on the Movidius Neural Compute Stick and therefore run on the Raspberry Pi.


The NCS is a neat little device and because it connects via USB, it is easy to develop on a desktop and then transfer everything needed to the Pi.

The app zoo, on the ncsdk2 branch, has a tiny_yolo_v2 implementation that I used as the basis for this. It only took about an hour to get this working on the desktop – integration with rt-ai was very easy. The Raspberry Pi end was not – all kinds of version number issues and things like that. However, even though not all of the tools would compile, I just moved the compiled graph from the desktop to the Pi and that worked fine.

This is the design. The main difference here from the usual test designs is that the MYOLO SPE is assigned to node pi34 (the Raspberry Pi) rather than the desktop (Default). Just assigning the MYOLO SPE to the Pi saved me from having to connect a Picam or uvc camera to the Pi and also allowed me to get a better feel for the pure performance of the Pi with the NCS.

As can be seen from the first screen capture it worked fine although, because it supports only a subset (20 of 91) of the usual COCO labels, it did not pick up the mouse or the keyboard. Performance-wise, it was running at about 1fps and 30% CPU. Just for reference, I was getting about 8fps on the i7 desktop.

Dockerized YOLOv3 rt-ai SPE = YAOD (yet another object detector)

I had intended to be doing something completely different today (working on auto-compiling highlight reels of interesting events generated from the prototype production rt-ai Edge object detection system) but managed to get sidetracked by reading about Darknet-based YOLOv3.  As Darknet itself is in C and compiles to a shared library this was a good candidate for a Dockerized stream processing element. I used a cuDNN image from NVIDIA as the base since it provides pretty much everything required – I just had to add in the rt-ai SPE library software and compile Darknet on top of that.

The results are pretty good. The preview above shows some detected objects. I discovered that it could detect toothbrushes which is why I am waving one around. It also did a good job of picking up the second mouse just by my left shoulder. 2fps with 1280 x 720 frame size is a little disappointing but this seems to be due to the Python parts of the code since the C demo provided with the library runs much faster. It is a little faster with preview turned off, however (which would be the production mode anyway).

Speaking of production, it does have a problem as it consumes just over 7GB of memory on my GTX 1080 ti GPU card. This means that one GPU card can’t run two instances simultaneously, unlike with the TensorFlow SSD detector. In fact, I can get two instances of that working on a GTX 1080 card with 8GB total memory.


Just for completeness, this is the design which looks just like the usual test designs. The Docker container is built and pushed to a private Docker registry automatically when the design is generated. The target node then just pulls the image from the registry when the design starts up.


This is the MediaView output showing the metadata. The metadata format is equivalent to that generated by the TensorFlow object detector so that they are completely interchangeable.

DeepLabv3+ Stream Processing Element (SPE) for rt-ai Edge

Integrating DeepLabv3+ with rt-ai Edge turned out to be pretty straightforward and follows from an existing TensorFlow-based Inception Stream Processing Element (SPE). The screen capture above shows an example of what it can do when given a video stream, where the DeepLab SPE has removed all pixels that aren’t part of recognized objects. This is why I am waving a bottle of beer about (and not because it is after 5pm). The PASCAL VOC dataset on which the model I am using has been trained can recognize a finite set of categories of objects. Waving a cow about didn’t seem practical hence the bottle. This is the original frame from the camera:

The DeepLab SPE also allows a specific category to be selected. In the case of the capture below, this was just the bottle:

On the right hand side of the media viewer screen you can see the metadata that has been generated by the DeepLab SPE. This is an example of how rt-ai Edge SPEs can be used to enhance the semantic content of data – video frames in this case.

It is pretty easy to configure the DeepLab SPE using rtaiDesigner:

This is the design screen showing the fairly trivial flow used for this test. Cam is a webcam capture SPE, Audio is an audio capture SPE. The DeepLab SPE is connected in the flow between the capture SPE and the media view SPE.

An interesting feature of rt-ai Edge is how SPEs can be configured. An SPE consists of some code (Python scripts in these cases) and a module spec (mspec) file. The mspec file contains information about subscriber and publisher ports as well as a section that is used to generate a configuration dialog. An example for the DeepLab SPE module dialog is shown above. This is the mspec file that generated it:

{
    "ModuleType" : "DeepLab",

    "ModuleDialog" : {
        "DialogName" : "DeepLab",
        "DialogDesc" : "Settings dialog for DeepLab semantic segmentation",

        "DialogData" : [
            {
                "VarName" : "OutputFormat",
                "VarDesc" : "Output frame format",
                "VarType" : "ConfigSelection",
                "VarValue" : "0",
                "VarStringArray" : [{ "VarEntry" : "Color map"},{"VarEntry" : "Masked image" },{"VarEntry" : "Single category masked image" }]
            },
            {
                "VarName" : "Category",
                "VarDesc" : "Single category selector",
                "VarType" : "ConfigSelection",
                "VarValue" : "15",
                "VarStringArray" : [
                    {"VarEntry" : "background"},
                    {"VarEntry" : "aeroplane"},
                    {"VarEntry" : "bicycle" },
                    {"VarEntry" : "bird" },
                    {"VarEntry" : "boat" },
                    {"VarEntry" : "bottle" },
                    {"VarEntry" : "bus" },
                    {"VarEntry" : "car" },
                    {"VarEntry" : "cat" },
                    {"VarEntry" : "chair" },
                    {"VarEntry" : "cow" },
                    {"VarEntry" : "diningtable" },
                    {"VarEntry" : "dog" },
                    {"VarEntry" : "horse" },
                    {"VarEntry" : "motorbike" },
                    {"VarEntry" : "person" },
                    {"VarEntry" : "pottedplant" },
                    {"VarEntry" : "sheep" },
                    {"VarEntry" : "sofa" },
                    {"VarEntry" : "train" },
                    {"VarEntry" : "tv" }
                ]
            },
            {
                "VarName" : "Preview",
                "VarDesc" : "Enable preview",
                "VarType" : "ConfigBool",
                "VarValue" : "false"
            }
        ]
    },
    
    "ModulePubSubs" : {
        "Pubs" : {
            "VideoOut" : "VideoMJPEG"
        },

        "Subs" : {
            "VideoIn" : "VideoMJPEG"
        }
    }
}

This makes it very easy to try out different settings. Use the module’s dialog to change something, regenerate the design using the Generate design button and then restart the network. Right now, for testing, rtaiDesigner generates start.sh and stop.sh scripts that can be used to quickly implement changes. Hopefully, in the future, configuration changes will be possible on the fly without having to restart the stream processing network.

Latest fun thing in the office: a Movidius Neural Compute Stick

A Movidius Neural Compute Stick just turned up in a delightfully retro style box. Won’t have time to do anything with it until the weekend unfortunately but very interested to see what it can do. It’s another enabler of the movement to add inference to low power mobile devices without relying on a cloud server.

Convolutional recurrent neural network for video prediction and unsupervised learning

Very interesting work here that uses recurrent neural network ideas to predict next frames in a video sequence. It’s amazing how many times LSTM pops up these days. Unsupervised learning is one of the most interesting areas of machine learning at the moment and the potential is seemingly unlimited. This is another example of using LSTM for understanding video representations using LSTM. It’s a fascinating area.