Optimizing inference engine utilization with multiplexed streams


One of the issues with the GPU-based CYOLO SPE (for example) is that it uses about 8GB of GPU memory, meaning that even on a GTX 1080 Ti it is only possible to run one instance of the CYOLO SPE per GPU. A way around this is to run multiple streams through a single SPE instance. The architecture of rt-ai Edge has always supported fan in (i.e. stream multiplexing) but not fan out (i.e. stream demultiplexing). The new FanOut module solves this problem.

The screen capture above shows the new FanOut SPE running with the Intel NCS 2-based CSSD SPE. Video streams from three cameras are multiplexed onto the CSSD SPE’s input pin. The multiplexed output is then passed to the FanOut SPE, which demultiplexes the composite stream into up to eight individual streams. The screen capture also shows the FanOut configuration dialog – you just enter the source SPE name for the stream to be associated with each output pin.
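Conceptually the demultiplexing is simple: each multiplexed message carries the name of the SPE that originated it, and FanOut just looks that name up in the pin mapping from the configuration dialog. Here is a minimal sketch of the idea – the metadata field, the send() callback and the class itself are illustrative placeholders, not the actual rt-ai Edge API:

class FanOutSketch:
    def __init__(self, pin_map):
        # pin_map maps a source SPE name to an output pin index,
        # e.g. {"Camera0": 0, "Camera1": 1, "Camera2": 2}
        self.pin_map = pin_map

    def process_message(self, metadata, frame, send):
        source = metadata.get("source")    # name of the originating SPE (assumed field name)
        pin = self.pin_map.get(source)
        if pin is not None:
            send(pin, metadata, frame)     # forward the message on the matching output pin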


Now that my second NCS 2 has arrived, I was able to run the triple NCS configuration shown above. The old NCS didn’t really contribute much in this case – the two NCS 2s achieved an aggregate throughput of around 26 frames per second. This is shared between the three input streams, of course.

The fan in/fan out multiplexing idea fits very well with the NCS 2 as you can just add more NCS 2s (or more likely, a special purpose multiple Myriad X board) to a node to increase aggregate throughput.
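To illustrate the scaling idea (this is not the actual rt-ai Edge scheduler), frames from a multiplexed stream could simply be round-robined across however many inference devices are attached to the node; aggregate throughput then grows roughly linearly with the number of devices:

import itertools

# Sketch only: distribute frames across several inference engine instances,
# e.g. one per NCS 2. The engine objects and their infer() method are
# stand-ins for whatever the real SPE uses to run the model on a device.
class RoundRobinPool:
    def __init__(self, engines):
        self._cycle = itertools.cycle(engines)

    def infer(self, frame):
        engine = next(self._cycle)    # pick the next device in turn
        return engine.infer(frame)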

SSD object detection using the Neural Compute Stick 2 now has its own rt-ai stream processing element


It turned out to be pretty easy to integrate the ssd_mobilenet_v2_coco model compiled for the Intel NCS 2 into rt-ai Edge. Since it doesn’t use the GPU, I was able to run this and the YOLOv3 SPE on the same machine, which is kind of amusing – one YOLOv3 instance tends to chew up most of the GPU memory, unfortunately, so the GPU can’t be shared. I would have liked to run YOLOv3 on the NCS 2 for a direct comparison but could not. The screen capture above shows the MediaView SPE output for both detectors running on the same 1280 x 720 video stream.


This is the design, and it shows the throughput of each detection SPE – 14 fps for the GTX 1080 Ti YOLO and 9 fps for the NCS 2-based SSD. Not exactly a fair comparison, but still interesting. It would be much better if I had the same model running on a GPU, of course. Right now, the GPU-based SPE that can run ssd_mobilenet_v2_coco (and similar models) is Python based and, not surprisingly, runs a fair bit slower than the compiled C++ versions I am using here.

Running YOLOv3 with OpenVINO on CPU and (not) NCS 2


Since OpenVINO is the software framework for the Neural Compute Stick 2, I thought it would be interesting to get the OpenVINO YOLOv3 example up and running. While the toolkit download does include a number of models, YOLOv3 isn’t one of them. Instead, the model has to be created from a TensorFlow version.

The instructions here describe how to do this. Steps 1 and 2 are fine, but the way the .pb file is generated is kind of awkward, so I created a simple new script to do it:

# -*- coding: utf-8 -*-

import numpy as np
import tensorflow as tf
from tensorflow.python.framework import graph_io

from yolo_v3 import yolo_v3, load_weights, detections_boxes, non_max_suppression

def load_coco_names(file_name):
    # Map class id -> label name, one label per line in coco.names
    names = {}
    with open(file_name) as f:
        for id, name in enumerate(f):
            names[id] = name.strip()
    return names

def main(argv):

    classes = load_coco_names("coco.names")

    # placeholder for detector inputs (416 x 416 RGB images)
    inputs = tf.placeholder(tf.float32, [None, 416, 416, 3])

    with tf.variable_scope('detector'):
        detections = yolo_v3(inputs, len(classes), data_format='NHWC')
        load_ops = load_weights(tf.global_variables(scope='detector'), "yolov3.weights")

    boxes = detections_boxes(detections)

    with tf.Session() as sess:
        # load the Darknet weights, freeze the graph and write it out as yolo_v3.pb
        sess.run(load_ops)
        frozen = tf.graph_util.convert_variables_to_constants(sess, sess.graph_def, ['concat_1'])
        graph_io.write_graph(frozen, './', 'yolo_v3.pb', as_text=False)

if __name__ == '__main__':
    tf.app.run()

This has the important filenames hardcoded – you just need to put yolov3.weights and coco.names in the tensorflow-yolo-v3 directory. Run the script above with:

python3 script.py

and the yolo_v3.pb file should be created. Copy this into the model_optimizer directory, set that as the current directory and run:

python3 mo_tf.py --input_model yolo_v3.pb --tensorflow_use_custom_operations_config ./extensions/front/tf/yolo_v3.json --input_shape [1,416,416,3]

The --input_shape parameter is needed because otherwise the conversion blows up after getting -1 for the mini-batch size. I just forced this to 1 and it was happy.

The result is in yolo_v3.xml and yolo_v3.bin. These can be used with the demo object_detection_demo_yolov3_async and an example output is shown in the screen capture above. Note that it is necessary to run the following:

~/intel/computer_vision_sdk/bin/setupvars.sh

in the same terminal session in which the demo will be run in order for CPU mode to work.
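For reference, the invocation is something like this (the exact options can vary between OpenVINO releases, so check the demo's -h output):

./object_detection_demo_yolov3_async -m yolo_v3.xml -i cam -d CPU

where -i cam selects a webcam rather than a video file.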

By default, the output just annotates the boxes with label numbers rather than readable labels. To get readable labels, copy coco.names to yolo_v3.labels and put it in the same directory as the xml file. One problem is that the label file reader doesn’t handle spaces in the labels. Rather than mess with the code, I just changed the spaces in the yolo_v3.labels file to underscores. Otherwise it thinks a mouse is a donut and a monitor is a dog, which is a little confusing.
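A one-off conversion is simple; for example (assuming coco.names is in the current directory):

# Convert coco.names to yolo_v3.labels, replacing spaces with underscores
# so that multi-word labels survive the demo's label file reader.
with open("coco.names") as fin, open("yolo_v3.labels", "w") as fout:
    for line in fin:
        fout.write(line.strip().replace(" ", "_") + "\n")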

However, what I really wanted to do was to run this on the NCS 2. The model as generated is FP32 and the NCS 2 wants FP16. Adding --data_type FP16 to the mo_tf.py command line fixes that, but unfortunately the toolkit then reports that the NCS 2 doesn’t support the Resample layer used by YOLOv3. If I had been smart I would have noticed that the usage info only mentions CPU and GPU :-(. Interestingly, the table of supported layers indicates that both Resample and Interp are supported on MYRIAD, so I do not know what is going on here.

I did try changing the offending tf.image.resize_nearest_neighbor call into a tf.image.resize_bilinear call (by editing yolo_v3.py in the tensorflow-yolo-v3 directory), which maps to Interp instead of Resample in the OpenVINO IR. This worked fine in CPU mode but still failed to run on the NCS 2, just with a different error.
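The edit itself amounts to a one-line change in the upsampling code, roughly as follows (the surrounding variable names will differ in the actual yolo_v3.py):

# Before (maps to the Resample layer in the OpenVINO IR):
# inputs = tf.image.resize_nearest_neighbor(inputs, (new_height, new_width))

# After (maps to the Interp layer instead):
inputs = tf.image.resize_bilinear(inputs, (new_height, new_width))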


Not sure whether that NCS 2 failure is a bug or intended. Anyway, that seems to be the end of the road for running YOLOv3 on the NCS 2, for the moment at least. However, there are a lot of things that do run on the NCS 2 very nicely. Still, YOLOv3 had started to become my standard way of checking inference things out, just like my strategy of evaluating restaurants by the quality of their Caesar salad – at least in the days when you could still get them!

The new ZED camera SPE and CYOLO SPE with support for depth cameras


The new SPE for the Stereolabs ZED depth camera is now working nicely, as is the new support for depth data in the CYOLO SPE. The extra depth information can be seen in the metadata display on the right of the screen capture – the annotation on the image itself still comes from the standard code but, since that is just for testing, it is ok.


This is the design used for testing. The ZED camera SPE has two outputs: one looks like a standard camera output, while the other carries the left and right images and the depth image. The CYOLO SPE can now accept either standard video messages or depth video messages on the appropriate input pin. The depth image adds about 3.7MB to each message, so it isn’t a trivial overhead, but the CYOLO module only ever outputs a standard video frame, so the large payload is confined to the single link in this design. Even running everything on a busy machine with 1280 x 720 frames, the whole design still runs at around 15 fps, which is not too bad.
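As a rough illustration of what the upgraded CYOLO does with the extra data (the actual message fields are rt-ai internals, so the names and units below are placeholder assumptions), the object depth can be estimated by sampling the depth image inside each detection box:

import numpy as np

# Sketch only: estimate an object's depth from a depth image and a YOLO
# bounding box. Assumes depth_image is a 2D float array in metres and
# box is (x, y, width, height) in pixel coordinates.
def object_depth(depth_image, box):
    x, y, w, h = box
    region = depth_image[y:y + h, x:x + w]
    valid = region[np.isfinite(region) & (region > 0)]   # ignore holes in the depth map
    if valid.size == 0:
        return None
    return float(np.median(valid))    # median is robust to background pixels in the box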


Old-school IoT sensing with absolutely no AI

Winter is coming, as one might say. Actually it is here, with freezing temperatures and inches of snow. Time to turn on the heater in the garage. However, if someone leaves a door open, the heater will just warm the atmosphere in general and burn through all of the propane. Yes, it would be absolutely trivial to put magnetic sensors on the doors that turn off the heater if a door is open, but that is no fun at all. This has actually been a long-running project with all kinds of interesting solutions, some even involving Apache NiFi and other big data things. One solution used Insteon and actually allowed the system to close doors automatically. However, various disasters could occur, so I turned that system off. In any case, the Insteon door sensors were not reliable.

The root of the problem is that the system really has to understand what is happening in the space, not just the states of various parts of it. Someone could be just about to back a car out when the system decides to shut a door. Not good. Seemingly, a perfect and autonomous solution is still a little way down the road. A short term hack seems in order, however!

Since I am working on rt-ai Edge right now, it was natural to think about how to use it to solve this problem. I have some ZeroSensors around, so I put one of them in the garage. I realized that, when the doors are shut, the garage is very dark and the temperature is stable at the thermostat setting (allowing for hysteresis). If a door is open, the garage gets illuminated one way or another (garage door motor lights, ambient light, outside lights at night etc), so the light level is a somewhat reliable, albeit indirect, indicator of door status. Also, the temperature will drop pretty quickly, indicating that either a door is open or the heater has failed.

Given how simple this is, I put together a quick filter SPE to process the light and temperature data from a ZeroSensor and send me an email if the doors are seemingly open for more than a preset time. The screen capture above shows what is going on. There are actually a couple of stream processing networks (SPNs) in action in this space, mainly running on the rtai0 node. One of them is the YOLO-based driveway vehicle detection system while the other is the garage environmental sensing network. A node can be running multiple SPNs at the same time – they are ships in the night and can be managed totally independently.
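The filter logic itself could hardly be simpler. A sketch of the idea is below – the thresholds, field names and alert callback are illustrative placeholders rather than the real SensorFilter configuration:

import time

# Sketch of the door-open heuristic: the garage is dark and at a stable
# temperature when shut, so sustained light or a falling temperature
# suggests an open door (or a failed heater).
class DoorOpenFilter:
    def __init__(self, light_threshold, temp_setpoint, temp_margin, hold_time, alert):
        self.light_threshold = light_threshold
        self.temp_setpoint = temp_setpoint
        self.temp_margin = temp_margin
        self.hold_time = hold_time      # seconds the condition must persist
        self.alert = alert              # e.g. a function that sends an email
        self.first_seen = None

    def update(self, light, temperature):
        suspicious = (light > self.light_threshold or
                      temperature < self.temp_setpoint - self.temp_margin)
        if not suspicious:
            self.first_seen = None
            return
        now = time.time()
        if self.first_seen is None:
            self.first_seen = now
        elif now - self.first_seen > self.hold_time:
            self.alert("Garage door appears to be open")
            self.first_seen = None      # reset so alerts are not repeated immediately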


This screen capture shows the configuration dialog for the SensorFilter SPE. I currently also have it set to confirm when the garage doors have been shut so that I know if someone else has taken care of the problem.

Another step would be to turn the heater off if a door is open and the temperature is dropping – right now, I am not planning to allow the door to be shut remotely for the reasons mentioned earlier. At least turning the heater off avoids wasting a load of energy. I can control the heater via Insteon. I would just need another SPE to provide the interface between the filtered alerts and my Insteon server.

Still, one day it would be nice to do this properly. This entails understanding the state of the space – things like whether there are any people in it. This is not completely trivial, as the system can’t always see someone inside a car. However, it can assume (until I get an autonomous vehicle) that if a car is moving or has moved, there must be someone driving and, until the system detects that person leaving the space, it must also assume that it should take no autonomous action. It needs to maintain state in order to be sure that there’s nobody in the space (or in a car just outside the space) before it can take control. I have a feeling that there may be quite a few corner cases with this, but it would be fun to try, even if it only simulates trying to close doors.

Stereolabs ZED depth camera with YOLO

The Stereolabs ZED camera is quite an effective way of generating depth-enhanced video streams, and it seemed like it was time to get one and integrate it with rt-ai Edge. I had worked with one of these before in a different context and knew that using the ZED was pretty straightforward.

The screen capture above shows the ZED YOLO C++ example code running. The mug in the shot was a bit too close to the monitor to get picked up, and my hand was probably too close in general, hence the strange 4.92m depth reading. However, it does seem to work pretty well. It even picked up the image of the monitor on the screen as a monitor.

Just as a note, I did have to modify the main.cpp code to run. At line 49, I had to add a std:: in front of an isfinite() call for some reason. Maybe something odd on my Ubuntu system. Also, to get the standard samples to build, I had to add libxmu-dev as another dependency.

Now comes the task of adding this to rt-ai Edge. I am going to split this into two parts: the first is a new camera SPE that works with the ZED and outputs the depth image in addition to the normal camera image. Then, the CYOLO SPE will be modified to accept optional depth information and perform the processing needed to generate the actual object depth value. This seems like a more general solution, as the ZED SPE then looks like a standard depth camera while the upgraded CYOLO will be able to work with any depth camera.

Getting email alerts from the YOLOv3-based driveway detection system

The YOLOv3-based driveway detection system is now running full-time to see how workable the system is in real life. The associated rtaiDesigner design looks like this:

It has a new SPE called SendEmail that, well, does exactly that. The YOLOFilter SPE has been modified so that it also attaches a frame from the video captured during the detection. The SendEmail SPE then creates an email with the text message generated by YOLOFilter and attaches the image. The screen capture at the top shows an example of the email that is sent.

SendEmail can queue up messages if they occur at more than a preset rate so that the total email rate is limited. After a timeout, the email sent contains the detections that had been queued up during the hold-off period.
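The hold-off behaviour is easy to sketch – the timing, queueing and message format below are illustrative, not the actual SendEmail implementation:

import time

# Sketch of rate-limited alert emails: send immediately if enough time has
# passed since the last email, otherwise queue detections and flush them
# in a single email once the hold-off period has expired.
class EmailLimiter:
    def __init__(self, min_interval, send_email):
        self.min_interval = min_interval    # minimum seconds between emails
        self.send_email = send_email        # callback that actually sends the email
        self.last_sent = 0.0
        self.queue = []

    def detection(self, text, image):
        now = time.time()
        if not self.queue and now - self.last_sent >= self.min_interval:
            self.send_email([(text, image)])
            self.last_sent = now
        else:
            self.queue.append((text, image))    # hold until the next flush

    def flush(self):
        # Called periodically; sends any queued detections once the hold-off expires.
        now = time.time()
        if self.queue and now - self.last_sent >= self.min_interval:
            self.send_email(self.queue)
            self.queue = []
            self.last_sent = now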

It is also possible to look at the historical data to see what actually transpired. The PutManifold SPE passes the video data and YOLO metadata to ManifoldStore for long-term storage. The rtaiView app can then be used to look back over the data. The screen capture above shows a frame from the same sequence displayed in rtaiView and the associated YOLO metadata. It’s all working quite well, actually.