As I had discovered, one Neural Compute Stick 2 (NCS 2) has pretty decent throughput. The question then is: what happens if you connect more than one of these to the same machine? I only have one NCS 2 and one of the older NCS devices to test this out but that combination worked ok with some tuning. OpenVINO manages allocation of requests to physical devices so there is no explicit way for this to be controlled via the API. However, it appears that multiple SPEs on the same node can be supported as then the NCSs are divided up between the SPEs. A reset error message is typically emitted but then everything seems to work fine.
To get the best performance, I ran in async mode using multiple ExecutableNetwork/InferRequest pairs, with the actual number being configurable from the rtaiDesigner GUI. In this case, 5 pairs gave the best results. The throughput is around 18 frames per second running ssd_mobilenet_v2_coco object detection.
Using one NCS at a time, the NCS 2 was able to process 12 frames per second (versus 9 frames per second in synchronous mode using the original SPE code) while the older NCS was able to process 6 frames per second, suggesting that both were being fully utilized.
Now I need to get a second NCS 2…
I had more luck running the ssd_mobilenet_v2_coco model from the TensorFlow model detection zoo on the NCS 2 than I did with YOLOv3. To convert from the .pb file to the OpenVINO-friendly files I used:
python3 mo_tf.py --input_model ssdmv2.pb --tensorflow_use_custom_operations_config ./extensions/front/tf/ssd_v2_support.json --tensorflow_object_detection_api_pipeline_config ssdmv2_pipeline.config --data_type FP16
In this case, I had renamed the frozen_instance_graph.pb from the download as ssdmv2.pb and renamed the pipeline.config file from the download as ssdmv2_pipeline.config. The screen capture above shows the object_detection_demo_ssd_async demo app running with the NCS 2. I didn’t sort out the labels for this test which is why it is just displaying numbers for the detected objects.
I also tried this using the CPU (using –data_type FP32) with this result:
It is worth noting that the video was running at 1920 x 1080 which is a significant challenge for just about anything. The CPU (an i7 5820K) is obviously a fair bit faster than the NCS 2 but a real advantage is the small physical footprint, low price, low power and CPU offload that the Myriad X VPU in the NCS 2 offers.
This iOS app is really step 1 on the road to integrating Core ML enabled iOS devices with rt-ai Edge. The screenshot shows the MobileNet SSD object detector running within the ARKit-enabled Unity app on an iPad Pro. If anyone wants to try this, code is here. I put this together pretty quickly so apologies if it is a bit rough but it is early days. Detection box registration isn’t perfect as you can see (especially for the mouse) but it is not too bad. This is probably a field of view mismatch somewhere and will need to be investigated.
Next, this code needs to be integrated with the Manifold C# Unity client. Following that, I will need to write the PutManifold SPE for rt-ai Edge. When this is done, the video and object detection data stream from the iOS device will appear within an rt-ai Edge stream processing network and look exactly the same as the stream from the CYOLO SPE.
The app is based on two repos that were absolutely invaluable in putting it together:
Many thanks to the authors of those repos.
It is hardly an original desire to want to know who or what is coming up the driveway. As a step along that road (as it were), I used my YOLO workflow to train YOLOv3 on a few things likely to be seen there. With my usual impatience, this test captured above was performed with an early set of weights (at around 1200 iterations) but actually seemed to work reasonably well and was easily able to differentiate between the different vehicle types and makes. Training is continuing again now but it is nice to know that it is going to work. I am training it to detect a range of vehicles, including UPS trucks and mail vans.
One thing I don’t know as yet is the situation with false positives – will random cars and trucks trigger one of the learned classes or not? Time will tell. If so, I’ll probably have to include some negative examples in the training set that includes examples of other types of vehicles that I don’t want to detect. Or, put all these other examples into a new general vehicle class. Not sure which is best at this point.
This is the fairly boring rt-ai Edge design that’s using the new model. It is basically passing the video frames through CYOLO and then pushing out to Manifold where it is being stored and can be viewed in real-time. This is running full time now so I will be able to look back and see how the detection performs in real life. In addition, selected and annotated frames from the stored data can be recycled to add to the training data in a future training cycle.
I could go crazy and use the license reading SPE to be much more specific about the individual vehicles. However, I still don’t have the right sort of cameras to make that work effectively.
Ok, so now that I have YOLO producing metadata indicating what is moving on the driveway, I then need to process that into useful information. That’s going to require a new SPE to process and filter the raw detections so that I can get real-time alerts for interesting events.
Following on from the previous post, I have now put together a pretty usable workflow for creating custom YOLOv3 models – the code and instructions are here. There are quite a few alternatives out there already but it was interesting putting this together from a learning point of view. The screen capture above was taken during some testing. I stopped the training early (which is why the probabilities are pretty low) so that I could test the weights with an rt-ai stream processing network design and then restarted the training. The tools automatically generate customized scripts to train and restart training, making this pretty painless.
There is a tremendous amount of valuable information here, including the code for the custom anchor generator that I have integrated into my workflow. I haven’t yet tried this enhanced version of Darknet yet but will do that soon. One thing I did learn from that repo is that there is an option to treat mirror image objects as distinct objects – no doubt that was what was hindering the accurate detection of the left and right motion controllers previously.
I have an application that requires a custom object detector for rt-ai and YOLOv3 seemed liked a good base from which to start. The challenge as always is to capture and prepare suitable training data. I followed the guide here which certainly saved a lot of work. For this test, I used about 50 photos each of the left and right controllers from a Windows MR headset. The result from the rt-ai SPE is shown in the capture above. I was interested to see how well it could determine between the left and right controllers as they are just mirror images of each other. It’s a bit random but not terrible. Certainly it is very good at detecting the presence or absence of controllers, even if it is not sure which one it is. No doubt adding more samples for training would improve this substantially.
The guide I followed to create the training data works but has a number of steps that need to be done correctly and in the right order. I am going to modify the Python code to consolidate this into a smaller number of (hopefully) idiot-proof steps and put the results up on GitHub in case anyone else finds it useful.
I decided that it would be fun to try out a Google AIY Vision Kit as a sort of warm-up for the potentially much more significant Edge TPU.
The Vision Kit is basically the same configuration as the ZeroSensor camera except with an extra board in the camera path that can perform inference on the captured images. The kit comes with some frozen graphs that can be used to detect a few things but I thought it would be interesting to try training a MobileNet SSD network with the Pascal VOC 2012 training data which can identify 20 different objects. The instructions for how to do this are here.
Once that was all running, the next step was to integrate it with rt-ai Edge. It’s pretty similar to the earlier full-blown TensorFlow version so it didn’t take too long to get working.
The design is much the same as usual except with the new VisionKit object detection SPE instead of TFObjectDetect or Deeplab. Note that the PiCam and VisionKit SPEs are running on the AIY Vision Kit, whereas the MediaView SPE is running on a desktop.
This is the output from the MediaView SPE. The metadata has been formatted to look exactly the same as the previous TensorFlow detector so that they can be used interchangeably in stream processing networks. I am getting about 2 fps with 640 x 360 images which is actually better than I expected.