The new ZED camera SPE and CYOLO SPE with support for depth cameras


The new SPE for the Stereolabs ZED depth camera is now working nicely, as is the new support for depth data in the CYOLO SPE. The extra depth information can be seen in the metadata display on the right of the screen capture – the annotation on the image itself still comes from the standard code but, since that is just for testing, it is fine.


This is the design used for testing. The ZED camera SPE has two outputs: one looks like a standard camera output while the other carries the left and right images along with the depth image. The CYOLO SPE can now accept either standard video messages or depth video messages via the appropriate input pin. The depth image adds about 3.7MB to each message, so it is not a trivial overhead. However, the CYOLO module only ever outputs a standard video frame, which means the large payload is confined to a single link in this design. Even running everything on a busy machine with 1280 x 720 frames, the whole design still runs at around 15fps, which is not too bad.
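
As a quick sanity check on that 3.7MB figure: a 1280 x 720 depth map at four bytes per pixel works out to almost exactly that size. A minimal sketch, assuming the depth image is stored as one 32-bit float per pixel:

    # Rough estimate of the per-frame depth payload, assuming a 1280 x 720
    # depth map with one 32-bit (4-byte) float value per pixel.
    width, height = 1280, 720
    bytes_per_pixel = 4  # assumption: float32 depth values

    depth_bytes = width * height * bytes_per_pixel
    print(f"{depth_bytes} bytes = {depth_bytes / 1e6:.1f} MB per frame")
    # prints: 3686400 bytes = 3.7 MB per frame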


Old-school IoT sensing with absolutely no AI

Winter is coming, as one might say. Actually it is here, with freezing temperatures and inches of snow. Time to turn on the heater in the garage. However, if someone leaves a door open, it will heat the atmosphere in general and burn through all of the propane. Yes, it would be absolutely trivial to put magnetic sensors on the doors that turn off the heater if a door is open, but that is no fun at all. This has actually been a long-running project with all kinds of interesting solutions, some even including Apache NiFi and other big data things. One solution used Insteon and actually allowed the system to automatically close doors. However, various disasters could occur, so I turned that system off. In any case, the Insteon door sensors were not reliable.

The root of the problem is that the system really has to understand what is happening in the space, not just the states of various parts of it. Someone could be just about to back a car out when the system decides to shut a door. Not good. Evidently, a perfect and fully autonomous solution is still a little way down the road. A short-term hack seems in order, however!

Since I am working on rt-ai Edge right now, it was natural to think about how to use it to solve this problem. I have some ZeroSensors around, so I put one of them in the garage. I realized that, when the doors are shut, the garage is very dark and the temperature is stable at the thermostat setting (allowing for hysteresis). If a door is open, the garage gets illuminated in one way or another (garage door motor lights, ambient light, outside lights at night, etc.), which makes the light level a somewhat reliable, albeit indirect, indicator of door status. In addition, the temperature will drop pretty quickly, indicating that either a door is open or the heater has failed.

Given how simple this is, I put together a quick filter SPE that processes the light and temperature data from a ZeroSensor and sends me an email if the doors appear to be open for more than a preset time. The screen capture above shows what is going on. There are actually a couple of stream processing networks (SPNs) in action in this space, mainly running on the rtai0 node. One of them is the YOLO-based driveway vehicle detection system while the other is the garage environmental sensing network. A node can run multiple SPNs at the same time – they are ships in the night and can be managed totally independently.
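
For the record, the core of the filter is nothing more sophisticated than a couple of thresholds and a hold-off timer. A minimal sketch of that logic, using illustrative threshold values and field names rather than the actual SensorFilter configuration:

    import time

    # Illustrative thresholds - the real values come from the SPE's configuration dialog
    LIGHT_THRESHOLD = 50.0     # lux; the garage is essentially dark when the doors are shut
    TEMP_SETPOINT = 15.0       # degrees C at the thermostat
    TEMP_HYSTERESIS = 2.0      # allow for normal thermostat cycling
    HOLD_OFF_SECONDS = 600     # how long the "open" condition must persist before emailing

    open_since = None          # timestamp when the doors first appeared to be open

    def process_sample(light, temperature, send_email):
        """Decide whether the doors look open and email once the condition persists."""
        global open_since
        looks_open = (light > LIGHT_THRESHOLD or
                      temperature < TEMP_SETPOINT - TEMP_HYSTERESIS)
        now = time.time()
        if looks_open:
            if open_since is None:
                open_since = now
            elif now - open_since > HOLD_OFF_SECONDS:
                send_email("Garage doors appear to be open")
                open_since = now   # restart the timer so alerts repeat at a limited rate
        else:
            open_since = None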


This screen capture shows the configuration dialog for the SensorFilter SPE. I currently also have it set to confirm when the garage doors have been shut so that I know if someone else has taken care of the problem.

Another step would be to turn the heater off if a door is open and the temperature is dropping – right now, I am not planning to allow the door to be shut remotely for the reasons mentioned earlier. At least turning the heater off avoids wasting a load of energy. I can control the heater via Insteon. I would just need another SPE to provide the interface between the filtered alerts and my Insteon server.

Still, one day it would be nice to do this properly. That entails understanding the state of the space – things like: are there any people in it? This is not completely trivial, as the system can’t always see someone inside a car. It can assume (until I get an autonomous vehicle) that if a car is moving or has moved, there must be someone driving, and, until the system detects that person leaving the space, it must take no autonomous action. The system therefore needs to maintain state in order to be sure that there is nobody in the space (or in a car just outside it) before it takes control. I have a feeling that there may be quite a few corner cases with this, but it would be fun to try, even if it only simulates trying to close doors.
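
If I ever get to it, the occupancy tracking could start out as a very small state machine along these lines – purely a sketch with hypothetical event names, not anything that exists today:

    from enum import Enum, auto

    class SpaceState(Enum):
        EMPTY = auto()       # believed unoccupied: autonomous action is allowed
        OCCUPIED = auto()    # a person was seen or a car has moved: take no action

    state = SpaceState.EMPTY

    def on_event(event):
        """Update occupancy state from detection events (event names are hypothetical)."""
        global state
        if event in ("person_detected", "car_moving"):
            state = SpaceState.OCCUPIED
        elif event == "person_left" and state is SpaceState.OCCUPIED:
            state = SpaceState.EMPTY

    def may_take_autonomous_action():
        return state is SpaceState.EMPTY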

Stereolabs ZED depth camera with YOLO

The Stereolabs ZED camera is quite an effective way of generating depth-enhanced video streams, so it seemed like time to get one and integrate it with rt-ai Edge. I have worked with one of these before in a different context, so I knew that using the ZED would be pretty straightforward.

The screen capture above shows the ZED YOLO C++ example code running. The mug in the shot was a bit too close to the monitor to get picked up, and my hand was probably too close in general, hence the strange 4.92m depth reading. However, it does seem to work pretty well. It even picked up the image of the monitor on the screen as a monitor.

Just as a note, I did have to modify the main.cpp code to get it to run. At line 49, I had to add a std:: in front of an isfinite() call for some reason – maybe something odd about my Ubuntu system. Also, to get the standard samples to build, I had to add libxmu-dev as another dependency.

Now comes the task of adding this to rt-ai Edge. I am going to split the work into two parts: the first is to produce a new camera SPE that works with the ZED and outputs the depth image in addition to the normal camera image. Then the CYOLO SPE will be modified to accept optional depth information and perform the processing to generate an actual object depth value. This seems like the more general solution, as the ZED SPE then looks like a standard depth camera while the upgraded CYOLO will be able to work with any depth camera.
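
One plausible way for the upgraded CYOLO to turn a depth image plus a detection box into a single object depth value is to take the median of the valid depth pixels inside the box. This is just a sketch of that idea, not the actual CYOLO implementation:

    import numpy as np

    def object_depth(depth_image, box):
        """Estimate the depth of a detected object.

        depth_image: HxW array of depth values in metres (NaN/inf where invalid)
        box: (x, y, w, h) detection box in pixel coordinates
        """
        x, y, w, h = box
        region = depth_image[y:y + h, x:x + w]
        valid = region[np.isfinite(region)]   # discard invalid depth pixels
        if valid.size == 0:
            return None                       # no usable depth inside the box
        return float(np.median(valid))        # median is robust to background pixels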

Integrating Core ML with Unity on iOS

The latest iPads and iPhones have some pretty serious edge neural network capabilities that are a natural fit with ARKit and Unity. AR and Unity go together quite nicely, as AR provides an excellent way of communicating back to the user the results of intelligently processing sensor data from the user, other users and static (infrastructure) sensors in a space. The screen capture above was obtained from code largely based on this repo, which integrates Core ML models with Unity. In this case, Inceptionv3 was used. While it isn’t perfect, it does ably demonstrate that this can be done. Getting the plugin to work was quite straightforward – you just have to include the mlmodel file in Xcode via the Files -> Add Files menu option rather than dragging the file into the project. The development cycle is pretty annoying, as the plugin won’t run in the Unity Editor and compiling (on my old Mac Mini) is painfully slow, but I guess a decent Mac would do a better job.

This all brings up the point that there seem to be different perceptions of what the edge actually is. rt-ai Edge can be perceived as a local aggregation and compute facility for inference-capable or conventional mobile and infrastructure devices (such as security cameras) – basically an edge compute facility supporting edge devices. A particular advantage of edge compute is that it is possible to integrate legacy devices (such as dumb cameras) into an AI-enhanced system by utilizing edge compute inference capabilities. In a sense, edge compute is a local mini-cloud, providing high capacity compute and inference a short distance in time away from sensors and actuators. This minimizes backhaul and latency, not to mention securing data in the local area rather than dispersing it in a cloud. It can also be very cost-effective when compared to the costs of running multiple cloud CPU instances 24/7.

Given the latest developments in tablets and smartphones, it is essential that rt-ai Edge be able to incorporate inference-capable devices into its stream processing networks. Inference-capable, per-user devices make scaling very straightforward, as capability increases in direct proportion to the number of users of an edge system. The normal rt-ai Edge deployment system can’t be used with mobile devices, which means that (at the very least) framework apps are needed to make use of the AI models within the devices themselves. However, with that proviso, it is certainly possible to incorporate smart edge devices into edge networks with rt-ai Edge.


Getting email alerts from the YOLOv3-based driveway detection system

The YOLOv3-based driveway detection system is now running full-time to see how workable the system is in real life. The associated rtaiDesigner design looks like this:

It has a new SPE called SendEmail that, well, does exactly that. The YOLOFilter SPE has been modified so that it also attaches a frame from the video captured during the detection. The SendEmail SPE then creates an email with the text message generated by YOLOFilter and attaches the image. The screen capture at the top shows an example of the email that is sent.
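
For anyone curious, the guts of something like SendEmail can be done with Python’s standard smtplib and email libraries in a few lines. A sketch, with placeholder server and address parameters rather than the SPE’s real configuration:

    import smtplib
    from email.message import EmailMessage

    def send_detection_email(text, jpeg_bytes, smtp_host, sender, recipient):
        """Send the YOLOFilter text with the captured frame attached."""
        msg = EmailMessage()
        msg["Subject"] = "rt-ai Edge detection alert"
        msg["From"] = sender
        msg["To"] = recipient
        msg.set_content(text)
        msg.add_attachment(jpeg_bytes, maintype="image",
                           subtype="jpeg", filename="detection.jpg")
        with smtplib.SMTP(smtp_host) as server:
            server.send_message(msg)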

SendEmail can queue up messages if they occur at more than a preset rate, so that the total email rate is limited. After a timeout, the email that is sent contains all of the detections queued up during the hold-off period.
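
The queueing behaviour can be captured with a simple list and a timestamp, roughly like the sketch below. The rate limit here is illustrative, and a real implementation would also flush the queue on a timer rather than only when the next detection arrives:

    import time

    MIN_INTERVAL = 300        # illustrative: at most one email every five minutes
    pending = []              # detections queued during the hold-off period
    last_sent = 0.0

    def handle_detection(text, image, send_email):
        """Send immediately if allowed, otherwise queue until the hold-off expires."""
        global last_sent
        now = time.time()
        pending.append((text, image))
        if now - last_sent >= MIN_INTERVAL:
            combined_text = "\n".join(t for t, _ in pending)
            images = [img for _, img in pending]
            send_email(combined_text, images)
            pending.clear()
            last_sent = now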

It is also possible to look at the historical data to see what actually transpired. The PutManifold SPE passes the video data and YOLO metadata to ManifoldStore for long-term storage. The rtaiView app can then be used to look back over the data. The screen capture above shows a frame from the same sequence displayed in rtaiView and the associated YOLO metadata. It’s all working quite well, actually.

Texting about what’s coming up the driveway with YOLOv3

Following on from the previous post, I have added a couple of new Stream Processing Elements (SPEs) to the vehicle detection stream processing network design:

YOLOFilter takes the raw detections from YOLOv3 and filters them on a configured confidence threshold and a minimum fraction of the frame area occupied by the detection box. If a detection passes those tests, it goes to a second stage that accumulates results and, if a sufficient number of detections occur within a set time, outputs a configurable text message and a JSON message downstream. By noting whether the detection box area is growing or shrinking, the filter can also determine whether the object is approaching or receding from the camera.
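
The essence of that two-stage filter fits in a few lines. A sketch of the idea, with illustrative thresholds standing in for the values that would come from the SPE’s configuration:

    from collections import deque
    import time

    MIN_CONFIDENCE = 0.6      # illustrative thresholds
    MIN_AREA_FRACTION = 0.02  # detection box must cover at least 2% of the frame
    MIN_DETECTIONS = 5        # detections needed within the window to raise an event
    WINDOW_SECONDS = 10

    recent = deque()          # (timestamp, box_area) for detections that passed stage one

    def process_detection(confidence, box_area, frame_area, emit_event):
        """Stage one: threshold filter. Stage two: accumulate and classify direction."""
        if confidence < MIN_CONFIDENCE or box_area / frame_area < MIN_AREA_FRACTION:
            return
        now = time.time()
        recent.append((now, box_area))
        while recent and now - recent[0][0] > WINDOW_SECONDS:
            recent.popleft()
        if len(recent) >= MIN_DETECTIONS:
            # Growing box area suggests approaching, shrinking suggests receding
            direction = "approaching" if recent[-1][1] > recent[0][1] else "receding"
            emit_event({"event": "vehicle detected", "direction": direction})
            recent.clear()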

In this design, the text message is passed to an SPE that sends it to my phone via Twilio. The JSON message is passed to a PutManifold SPE that makes it available to Manifold apps such as ManifoldStore for long-term storage. Since these messages are only generated when a significant event is detected, I will be able to use them with rtaiView to quickly skip to the next significant event in the streams in view. They will also make it trivial to generate a “highlight reel”: a short video consisting of the significant events detected during a specified time range.
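
Sending the text itself is the easy part – Twilio’s Python helper library does it in a couple of lines. A sketch, with the credentials and phone numbers obviously being placeholders:

    from twilio.rest import Client

    def send_sms(body, account_sid, auth_token, from_number, to_number):
        """Send a single SMS alert via the Twilio REST API."""
        client = Client(account_sid, auth_token)
        client.messages.create(body=body, from_=from_number, to=to_number)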

Next up is an SMTP email SPE so that it is possible to send alerts like these via email as well as SMS texts.

Detecting what’s coming up the driveway with YOLOv3

It is hardly an original desire to want to know who or what is coming up the driveway. As a step along that road (as it were), I used my YOLO workflow to train YOLOv3 on a few things likely to be seen there. With my usual impatience, the test captured above was performed with an early set of weights (at around 1200 iterations) but it actually seemed to work reasonably well and was easily able to differentiate between the different vehicle types and makes. Training is continuing now, but it is nice to know that it is going to work. I am training it to detect a range of vehicles, including UPS trucks and mail vans.

One thing I don’t know yet is the situation with false positives – will random cars and trucks trigger one of the learned classes or not? Time will tell. If so, I’ll probably have to add negative examples to the training set – images of other types of vehicles that I don’t want to detect. Alternatively, I could put all of these other examples into a new general vehicle class. I am not sure which is best at this point.

This is the fairly boring rt-ai Edge design that’s using the new model. It basically passes the video frames through CYOLO and then pushes the results out to Manifold, where they are stored and can be viewed in real time. This is running full-time now, so I will be able to look back and see how the detection performs in real life. In addition, selected and annotated frames from the stored data can be recycled into the training data in a future training cycle.

I could go crazy and use the license reading SPE to be much more specific about the individual vehicles. However, I still don’t have the right sort of cameras to make that work effectively.

Ok, so now that I have YOLO producing metadata indicating what is moving on the driveway, I then need to process that into useful information. That’s going to require a new SPE to process and filter the raw detections so that I can get real-time alerts for interesting events.