Integrating TensorFlow object detection into rt-ai Edge

I have been using DeepLabv3 for a while now for object detection but I thought it would be interesting to try some examples from the TensorFlow object detection repo. I now have an rt-ai Edge stream processing element that is based on the Jupyter notebook example in the repo. Presumably this will work with any of the models in the model zoo although I am just using the default one for now.

As you can see from the capture above (apart from the nasty looking grass on the left) it picks out the car happily, although not with a great confidence level. Maybe it doesn’t like the elevated camera position or the car is a bit too far away or a difficult pose – I will need to do some more experiments. Also I am only get 1 fps with 1280 x 720 frames from the camera which is a little disappointing, even though I did the usual optimizations to avoid reloading the graph for every frame and things like that.

The capture above shows the raw image along with the object recognition data in the form of metadata rather than drawn on the image. This is actually pretty useful for both real-time and offline processing (such as a machine learning run). Capturing the original image does have the advantage that alternate object detectors could be run at any time, at the expense of having to store more data. Real-time actions can be based on the metadata and the raw image just discarded.

Anyway, definitely a work in progress. It will be interesting to see how it compares with the DeepLabv3 version as the implementation gets more efficient. What’s nice is that it is trivial to swap out one object detector for another or run them in parallel in order to run tests. Just takes a few seconds with the rtaiDesigner GUI.

The new RTSP SPE: bringing H.264 video streams from ONVIF cameras into rt-ai Edge designs

Most IP cameras, including security and surveillance cameras, support RTSP H.264 streaming so it made sense to implement a compatible stream processing element (SPE) for rt-ai Edge. The design above is a simple test design. The video stream from the camera is converted into JPEG frames using GStreamer within the SPE and then passed to the DeepLabv3 SPE. The output from DeepLabv3 is then passed to a MediaView SPE for display.

I have a few ONVIF/RTSP cameras around the property and the screen capture above shows the results from one of these. There’s a car sitting in its field of view that’s picked out very nicely. I am using the DeepLabv3 SPE here in its masked image mode where the output frames just consist of recognized object images and nothing else. Just for reference, this is the original frame:

Clearly the segmented image only retains what it is important for later processing.

rtaiView: an rt-ai app for viewing real-time and historic sensor data

I am now pulling things together so that I can use the ZeroSensors to perform long-term data collection. Data generated by the rt-ai Edge design is passed into the Manifold and then captured by ManifoldStore, one of the standard Manifold nodes. Obviously it would be nice to know that meaningful data is being stored and that’s where rtaiView comes in. The screen capture above shows the real-time display when it has been configured to receive streams from the video and data components of the ZeroSensor streams. This is showing the streams from a couple of ZeroSensors but more can be added and the display adjusts accordingly.

This is the simple ZeroSpace design as seen in the rtaiDesigner editor window. The hardware setup consists of the ZeroSensors running the SensorZero synth stream processor element (SPE) and a server running the DeepLabv3 SPEs and the ManifoldZero synths. The ManifoldZero synths consist of a couple of PutManifold SPEs that take each stream from the ZeroSensor and map it to a Manifold stream.

ManifoldStore captures these streams and persists them to disk as can be seen from the screen capture above.

This allows rtaiView to display the real-time data coming from the ZeroSensors and historic data based on timecode.

The screen capture above shows rtaiView in historic (or DVR) mode. The control widget (at the top right) allows the user to scan through periods of time and visualize the data. The same timecode is used for all streams displayed, making it easy to correlate events between them.

rtaiView is a useful tool for checking that the rt-ai Edge design is operating correctly and that the data stored is useful. In these examples, I have set DeepLabv3 to color map recognized objects. However, this is not the desired mode as I just want to store images that have people detected in them and then have the images only contain the people. The ultimate goal is to use these image sequences along with other sensor data to detect anomalous behavior and also to predict actions so that the rt-ai Edge enabled sentient space can be proactive in taking actions.

ZeroSensor case design

It has taken a while to get to this point but, now the focus is back on rt-ai Edge, it is time to get the ZeroSensors sorted out properly. The design above is the prototype 3D printed case. It’s a free standing case, about 3 inches by 2.7 inches and 1.4 inches deep. The biggest problem with these things is getting thermal isolation so that the temperature reading is from the outside air rather than Raspberry Pi Zero heated air. The big baffle on the rear (on the right of the image above) is intended to keep the air separate in the two halves. The little slot is to allow four thin cables to run between the Pi and the sensor boards. Right now the back has no holes so that air flow is fully bottom to top convection on both the sensor side and the Pi side. However, this might need to be changed if the initial design doesn’t work. The plastic material will conduct heat so it may be necessary to add more thermal isolation using holes or slots in the back.

An rt-xr SpaceObjects tour de force

rt-xr SpaceObjects are now working very nicely. It’s easy to create, configure and delete SpaceObjects as needed using the menu switch which has been placed just above the light switch in my office model above.

The video below shows all of this in operation.

The typical process is to instantiate an object, place and size it and then attach it to a Manifold stream if it is a Proxy Object. Persistence, sharing and collaboration works for all relevant SpaceObjects across the supported platforms (Windows and macOS desktop, Windows MR, Android and iOS).

This is a good place to leave rt-xr for the moment while I wait for the arrival of some sort of AR headset in order to support local users of an rt-xr enhanced sentient space. Unfortunately, Magic Leap won’t deliver to my zip code (sigh) so that’s that for the moment. Lots of teasers about the HoloLens 2 right now and this might be the best way to go…eventually.

Now the focus moves back to rt-ai Edge. While this is working pretty well, it needs to have a few bugs fixed and also add some production modes (such as auto-starting SPNs when server nodes are started). Then begins the process of data collection for machine learning. ZeroSensors will collect data from each monitored room and this will be saved by ManifoldStore for later use. The idea is to classify normal and abnormal situations and also to be proactive in responding to the needs of occupants of the sentient space.

Latest rt-xr toy: shared virtual whiteboards

Since the sticky note idea now works, I thought that it would be fun to do a freehand version – a virtual whiteboard. It’s working pretty reasonably now. I placed a big whiteboard in my virtual office as you can see above to show how two or more occupants of the space can work together on a shared virtual whiteboard. The video below shows how this works.

The screen on the left is the desktop rt-xrViewer app, the screen on the left is the Mixed Reality Portal showing the Windows Mixed Reality rt-xrViewer app. The mouse is used to draw on the whiteboard in the desktop app (blue lines), motion controllers are used for the WMR app (red lines).

This also shows the new interaction rays. They sort of emanate from where the nose of the avatar should be.

They help give a sense of what the virtual occupants are doing. Otherwise, writing on the whiteboard seems a bit ghostly.

Whiteboards are actually proxy objects, driven from a special server that’s part of the SharingServer. The whiteboard itself is a completely dumb graphical asset. This makes it ideal for packaging as a Unity assetbundle and downloading at runtime rather than having to be built into the app. The required standard scripts included with rt-xrViewer are attached after a proxy object is instantiated.

This is the first time that proxy objects have supported interaction, opening the door to more interesting proxy objects in the future.

rt-xr SpaceObject sharing and persistence demo

SpaceObjects are dynamic objects that can be created, manipulated and deleted within the sentient space. The sticky note SpaceObject is the perfect vehicle for demonstrating these capabilities, as shown in the video below (which would have been even better if the camera had been exactly horizontal but, oh well). The monitor on the left is showing the rt-xrViewer app for Windows desktop, the one on the right is the Mixed Reality Portal showing the rt-xrViewer app for Windows Mixed Reality. I was wearing the WMR headset and using a motion controller to interact with the space. Right now you can create a sticky note, position it, add and edit the text and also delete it. In fact, any occupant of the space, physical or virtual, can edit the text if they want (obviously permissions for all of this is a TODO). Any number of sticky notes can be created and left around the space as a sort of virtual graffiti.

It’s a little tough to see but, as the text is being edited on the WMR app, the text is changing in real-time on the desktop app. Not totally necessary but kind of amusing to watch.

SpaceObject sharing is performed using the SpaceServer while the SharingServer provides the avatar pose sharing and audio sharing as before. Of course this all works on macOS, Android and iOS so that any reasonable device can participate. And of course AR and MR headset users can interact with SpaceObjects. The SpaceServer is able to make persistent all salient settings for each SpaceObject. All SpaceObjects persist position, rotation, and scale. In the case of the sticky note, it also includes the current text. Any occupant coming into an existing session will get the latest space state when they receive the space definition from the SpaceServer and from then on they will receive real-time updates of any changes.

These latest capabilities, coupled with the spatialized audio sharing, create a quite nice collaborative environment. Next up is the ability to download SpaceObjects on demand from object servers. Since SpaceObjects can also be proxy objects, this opens the door to all kinds of active bling to brighten up the space.