I had intended to be doing something completely different today (working on auto-compiling highlight reels of interesting events generated from the prototype production rt-ai Edge object detection system) but managed to get sidetracked by reading about Darknet-based YOLOv3. As Darknet itself is in C and compiles to a shared library this was a good candidate for a Dockerized stream processing element. I used a cuDNN image from NVIDIA as the base since it provides pretty much everything required – I just had to add in the rt-ai SPE library software and compile Darknet on top of that.
The results are pretty good. The preview above shows some detected objects. I discovered that it could detect toothbrushes which is why I am waving one around. It also did a good job of picking up the second mouse just by my left shoulder. 2fps with 1280 x 720 frame size is a little disappointing but this seems to be due to the Python parts of the code since the C demo provided with the library runs much faster. It is a little faster with preview turned off, however (which would be the production mode anyway).
Speaking of production, it does have a problem as it consumes just over 7GB of memory on my GTX 1080 ti GPU card. This means that one GPU card can’t run two instances simultaneously, unlike with the TensorFlow SSD detector. In fact, I can get two instances of that working on a GTX 1080 card with 8GB total memory.
Just for completeness, this is the design which looks just like the usual test designs. The Docker container is built and pushed to a private Docker registry automatically when the design is generated. The target node then just pulls the image from the registry when the design starts up.
This is the MediaView output showing the metadata. The metadata format is equivalent to that generated by the TensorFlow object detector so that they are completely interchangeable.
Docker containers are a great way of reducing the headaches caused by pre-requisites and software versions when deploying code in general and rt-ai SPEs in particular. So it made sense to add support for SPEs in Docker containers in addition to the existing bare metal SPEs. The screen capture above shows the test design in rtaiDesigner using the Docker containerized version of the existing TensorFlow object detector. It is essentially identical to the bare metal version, just with the object detection SPE replaced with the Dockerized version. The container was based on the TensorFlow GPU image.
SPE code is deployed to nodes as a package that includes start and stop scripts. Normally, the start script is something very simple: a single line kicking off a Python script for example. Docker SPEs use a slightly more complex start script that first tries to pull the required Docker image from a defined registry location and then invokes the container in the required manner (using nvidia-docker if necessary).
No changes were required to the SPE code itself in this case – just customization of the start and stop scripts and I added some files used to build the container and install it in the local registry so that the build and update process is very straightforward. Plus, as this test design shows, bare metal and containerized SPEs can be mixed without limitation as the stream interfaces are identical in all cases.
I’ve certainly been learning a fair bit about Docker lately. Didn’t realize that it is reasonably easy to containerize GUI nodes as well as console mode nodes so now rtnDocker contains scripts to build and run almost every rtndf and Manifold node. There are only a few that haven’t been successfully moved yet. imuview, which is an OpenGL node to view data from IMUs, doesn’t work for some reason. The audio capture node (audio) and the audio part of avview (the video and audio viewer node) also don’t work as there’s something wrong with mapping the audio devices. It’s still possibly to run these outside of a container so it isn’t the end of the world but it is definitely a TODO.
Settings files for relevant containerized nodes are persisted at the same locations as the un-containerized versions making it very easy to switch between the two.
rtnDocker has an all script that builds all of the containers locally. These include:
- manifoldcore. This is the base Manifold core built on Ubuntu 16.04.
- manifoldcoretf. This uses the TensorFlow container as the base instead of raw Ubuntu.
- manifoldcoretfgpu. This uses the TensorFlow GPU-enabled container as the base.
- manifoldnexus. This is the core node that constructs the Manifold.
- manifoldmanager. A management tool for Manifold nodes.
- rtndfcore. The core rtn data flow container built on manifoldcore.
- rtndfcoretf. The core rtn data flow container built on manifoldcoretf.
- rtndfcoretfgpu. The core rtn data flow container built on manifoldcoretfgpu.
- rtndfcoretfcv2. The core rtn data flow container built on rtndfcoretf and adding OpenCV V3.0.0.
- rtndfcoretfgpucv2. The core rtn data flow container built on rtndfcoretfgpu and adding OpenCV V3.0.0.
The last two are good bases to use for anything combining machine learning and image processing in an rtn data flow PPE. The OpenCV build instructions were based on the very helpful example here. For example, the recognize PPE node, an encapsulation of Inception-v3, is based on rtndfcoretfgpucv2. The easiest way to build these is to use the scripts in the rtnDocker repo.
I had obtained some very nice results with OpenFace in a previous project and thought it would be fun to wrap it into an rtndf pipeline processing element (PPE). It’s also a good test to see whether docker containers can be used with rtndf. Turns out they work just fine. OpenFace has some complex dependencies and it is much easier just to pull a docker container than build it locally. One approach would have been to build a new container based on the original bamos/openface but instead facerec uses a bit of a hack involving host directory mapping.
To make it easy to use, there’s a bash script in the rtndf/facerec directory called facerecstart that takes care of the docker command line (which is a bit messy). Of course, in order to recognize faces, the system needs to have been trained. rtndf/facerec includes a modified version of the OpenFace web demo that saves the data from the training in the correct form for facerec. There’s a bash script, trainstart, that starts it going and then a browser and webcam can be used to perform the training.
As with the recognize PPE, facerec can either process the whole frame or just segments that contain motion by using the output from the modet PPE. In fact both recognize and facerec can be used in the same pipeline to get combined recognition:
uvccam -> modet -> facerec -> recognize -> avview
This illustrates one of the nice features of the pipeline concept: metadata and annotation can be added progressively by multiple processing stages, adding significant value to the resulting stream.
The idea of joining together separate, lightweight processing elements to form complex pipelines is nothing new. DirectX and GStreamer have been doing this kind of thing for a long time. More recently, Apache NiFi has done a similar kind of thing but with Java classes. While Apache NiFi does have a lot of nice features, I really don’t want to live in Java hell.
I have been playing with MQTT for some time now and it is a very easy to use publish/subscribe system that’s used in all kinds of places. Seemed like it could be the glue for something…
So that’s really the background for rtnDataFlow or rtndf as it is now called. It currently uses MQTT as its pub/sub infrastructure but there’s nothing too specific there – MQTT could easily be swapped out for something else if required. The repo consists of a number of pipeline processing elements that can be used to do some (hopefully) useful things. The primary language is Python although there’s nothing stopping anything being used provided it has an MQTT client and handles the JSON messages correctly. It will even be able to include pipeline processing elements in Docker containers. This will make deployment of new, complex, pipeline processing elements very simple.
The pipeline processing elements are all joined up using topics. Pipeline processing elements can publish to one or more topics and/or subscribe to one or more topics. Because pub/sub systems are intrinsically multicasting, it’s very easy to process data in multiple ways in parallel (for redundancy, performance or functionality). MQTT also allows pipeline processing elements to be distributed on multiple systems, allowing load sharing and heterogeneous computing systems (where only some machines might be fitted with GPUs for example).
Obviously, tools are required to design the pipelines and also to manage them at runtime. The design aspect will come from an old code generation project. While that actually generates C and Python code from a design that the user inputs via a graphical interface, the rtnDataFlow version will just make sure all topic names and broker addresses line up correctly and then produce a pipeline configuration file. A special app, rtnFlowControl, will run on each system and will be responsible for implementing the pipeline design specified.
So what’s the point of all of this? I’m tired of writing (or reworking) code multiple times for slightly different applications. My goal is to keep the pipeline processing elements simple enough and tightly focused so that the specific application can be achieved by just wiring together pipeline processing elements. There’ll end up being quite a few of these of course and probably most applications will still need custom elements but it’s better than nothing. My initial use of rtnDataFlow will be to assist with experiments to see how machine learning tools can be used with IoT devices to do interesting things.