DeepLabv3+ Stream Processing Element (SPE) for rt-ai Edge

Integrating DeepLabv3+ with rt-ai Edge turned out to be pretty straightforward, building on an existing TensorFlow-based Inception Stream Processing Element (SPE). The screen capture above shows an example of what it can do when given a video stream: the DeepLab SPE has removed all pixels that aren’t part of recognized objects. This is why I am waving a bottle of beer about (and not because it is after 5pm). The model I am using was trained on the PASCAL VOC dataset, which covers only a finite set of object categories. Waving a cow about didn’t seem practical, hence the bottle. This is the original frame from the camera:

The DeepLab SPE also allows a specific category to be selected. In the case of the capture below, this was just the bottle:

On the right-hand side of the media viewer screen you can see the metadata that has been generated by the DeepLab SPE. This is an example of how rt-ai Edge SPEs can be used to enhance the semantic content of data – video frames in this case.
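The exact metadata format isn’t reproduced in this post, but conceptually each frame ends up carrying a record along these lines (a purely hypothetical illustration in Python, not the SPE’s actual schema):

# Purely hypothetical sketch of per-frame metadata an SPE could attach;
# the real DeepLab SPE's schema is not shown here.
frame_metadata = {
    "source": "Cam",
    "timestamp": 1523456789.123,
    "deeplab": {
        "outputFormat": "Masked image",
        "categoriesDetected": ["person", "bottle"]
    }
}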

It is pretty easy to configure the DeepLab SPE using rtaiDesigner:

This is the design screen showing the fairly trivial flow used for this test. Cam is a webcam capture SPE and Audio is an audio capture SPE. The DeepLab SPE is connected in the flow between the webcam capture SPE and the media view SPE.

An interesting feature of rt-ai Edge is how SPEs can be configured. An SPE consists of some code (Python scripts in these cases) and a module spec (mspec) file. The mspec file contains information about subscriber and publisher ports as well as a section that is used to generate a configuration dialog. An example for the DeepLab SPE module dialog is shown above. This is the mspec file that generated it:

{
    "ModuleType" : "DeepLab",

    "ModuleDialog" : {
        "DialogName" : "DeepLab",
        "DialogDesc" : "Settings dialog for DeepLab semantic segmentation",

        "DialogData" : [
            {
                "VarName" : "OutputFormat",
                "VarDesc" : "Output frame format",
                "VarType" : "ConfigSelection",
                "VarValue" : "0",
                "VarStringArray" : [{ "VarEntry" : "Color map"},{"VarEntry" : "Masked image" },{"VarEntry" : "Single category masked image" }]
            },
            {
                "VarName" : "Category",
                "VarDesc" : "Single category selector",
                "VarType" : "ConfigSelection",
                "VarValue" : "15",
                "VarStringArray" : [
                    {"VarEntry" : "background"},
                    {"VarEntry" : "aeroplane"},
                    {"VarEntry" : "bicycle" },
                    {"VarEntry" : "bird" },
                    {"VarEntry" : "boat" },
                    {"VarEntry" : "bottle" },
                    {"VarEntry" : "bus" },
                    {"VarEntry" : "car" },
                    {"VarEntry" : "cat" },
                    {"VarEntry" : "chair" },
                    {"VarEntry" : "cow" },
                    {"VarEntry" : "diningtable" },
                    {"VarEntry" : "dog" },
                    {"VarEntry" : "horse" },
                    {"VarEntry" : "motorbike" },
                    {"VarEntry" : "person" },
                    {"VarEntry" : "pottedplant" },
                    {"VarEntry" : "sheep" },
                    {"VarEntry" : "sofa" },
                    {"VarEntry" : "train" },
                    {"VarEntry" : "tv" }
                ]
            },
            {
                "VarName" : "Preview",
                "VarDesc" : "Enable preview",
                "VarType" : "ConfigBool",
                "VarValue" : "false"
            }
        ]
    },
    
    "ModulePubSubs" : {
        "Pubs" : {
            "VideoOut" : "VideoMJPEG"
        },

        "Subs" : {
            "VideoIn" : "VideoMJPEG"
        }
    }
}
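As a rough illustration (not the actual SPE code), the Python processing step of the DeepLab SPE might use these settings along the following lines, assuming the model returns a per-pixel class map (seg_map) the same size as the incoming frame:

import numpy as np

# Rough sketch of how the OutputFormat and Category settings could be applied.
# seg_map is assumed to be an HxW array of PASCAL VOC class indices produced
# by the DeepLab model for the current frame; frame is the HxWx3 image.
def apply_output_format(frame, seg_map, output_format, category):
    if output_format == 0:
        # "Color map": false-color rendering of the class indices
        palette = np.random.RandomState(0).randint(0, 255, size=(256, 3)).astype(np.uint8)
        return palette[seg_map]
    elif output_format == 1:
        # "Masked image": keep pixels belonging to any recognized object
        mask = (seg_map != 0)[:, :, np.newaxis]
        return frame * mask
    else:
        # "Single category masked image": keep only the selected class
        # (VarValue 15 corresponds to "person" in the dialog's category list)
        mask = (seg_map == category)[:, :, np.newaxis]
        return frame * mask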

This makes it very easy to try out different settings. Use the module’s dialog to change something, regenerate the design using the Generate design button and then restart the network. Right now, for testing, rtaiDesigner generates start.sh and stop.sh scripts that can be used to quickly implement changes. Hopefully, in the future, configuration changes will be possible on the fly without having to restart the stream processing network.

Semantic image segmentation with TensorFlow using DeepLab


I have been trying out a TensorFlow application called DeepLab that uses deep convolutional neural nets (DCNNs) along with some other techniques to segment images into meaningful objects and then label what they are. A script included in the DeepLab GitHub repo downloads the Pascal VOC 2012 dataset and uses it to train and evaluate the model. One of the results is shown above. It has managed to extract some pretty ugly furniture from a noisy background quite nicely. Here are a couple more examples:


The software has done a nice job of extracting the foreground objects in another very noisy scene.


The person in the background is picked up pretty nicely here – I didn’t even notice them at first.

Incidentally, to get local_test.sh to work on Ubuntu 16.04 I had to change its call to download_and_convert_voc2012.sh to use bash instead of sh, otherwise it generated an error. Also, I needed to install cuDNN 7.0.4 for CUDA 9.0 rather than cuDNN 7.1.1 in order to get the Jupyter notebook example operating.
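For reference, the core inference step in that notebook boils down to something like this (a sketch based on the demo; the frozen graph filename and tensor names are assumptions that depend on how the model was exported):

import numpy as np
import tensorflow as tf
from PIL import Image

# Load the exported (frozen) DeepLab graph
graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile('frozen_inference_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

# The exported models expect the longer image side resized to 513 pixels
image = Image.open('test.jpg').convert('RGB')
ratio = 513.0 / max(image.size)
resized = image.resize((int(ratio * image.size[0]), int(ratio * image.size[1])))

with tf.Session(graph=graph) as sess:
    seg_map = sess.run('SemanticPredictions:0',
                       feed_dict={'ImageTensor:0': [np.asarray(resized)]})[0]

print('VOC class indices present:', np.unique(seg_map))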

What I would like to do now is to create an rt-ai Edge Stream Processing Element (SPE) based on this code to act as a preprocessor stage in order to isolate and identify salient objects in a video stream in real time. One of my interests is understanding behaviors from video and this could be a valuable component in that pipeline by allowing later stages to focus on what’s important in each frame.