The new RTSP SPE: bringing H.264 video streams from ONVIF cameras into rt-ai Edge designs

Most IP cameras, including security and surveillance cameras, support RTSP H.264 streaming so it made sense to implement a compatible stream processing element (SPE) for rt-ai Edge. The design above is a simple test design. The video stream from the camera is converted into JPEG frames using GStreamer within the SPE and then passed to the DeepLabv3 SPE. The output from DeepLabv3 is then passed to a MediaView SPE for display.
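As a rough sketch of the conversion step described above, the following builds a GStreamer pipeline description that takes RTSP H.264 in and delivers JPEG frames to an appsink. The element chain is an assumption (the SPE's actual pipeline is not shown here), `avdec_h264` requires the gst-libav plugins, and the function name, camera URL and default values are purely illustrative.

```python
def build_rtsp_jpeg_pipeline(rtsp_url: str, latency_ms: int = 200,
                             jpeg_quality: int = 85) -> str:
    """Return a gst-launch style description: RTSP H.264 in, JPEG frames out.

    Assumed element chain, not necessarily what the RTSP SPE uses internally.
    """
    elements = [
        # Connect to the camera and receive the RTP stream
        f'rtspsrc location="{rtsp_url}" latency={latency_ms}',
        # Extract H.264 from the RTP packets and parse it
        "rtph264depay",
        "h264parse",
        # Decode to raw video (needs gst-libav) and convert the pixel format
        "avdec_h264",
        "videoconvert",
        # Re-encode each decoded frame as a JPEG image
        f"jpegenc quality={jpeg_quality}",
        # Hand frames to the application, dropping old ones if it falls behind
        "appsink name=sink emit-signals=true max-buffers=1 drop=true",
    ]
    return " ! ".join(elements)


if __name__ == "__main__":
    # Hypothetical camera address from the documentation IP range
    print(build_rtsp_jpeg_pipeline("rtsp://192.0.2.10:554/stream1"))
```

The resulting string could be passed to `Gst.parse_launch()` in an application with the GStreamer Python bindings installed. Keeping only one buffer in the appsink and dropping older ones favors low latency over delivering every frame, which seems a reasonable trade-off for real-time inference.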

I have a few ONVIF/RTSP cameras around the property and the screen capture above shows the results from one of these. There’s a car sitting in its field of view that’s picked out very nicely. I am using the DeepLabv3 SPE here in its masked image mode where the output frames just consist of recognized object images and nothing else.

rtaiView: an rt-ai app for viewing real-time and historic sensor data

I am now pulling things together so that I can use the ZeroSensors to perform long-term data collection. Data generated by the rt-ai Edge design is passed into the Manifold and then captured by ManifoldStore, one of the standard Manifold nodes. Obviously it would be nice to know that meaningful data is being stored and that’s where rtaiView comes in. The screen capture above shows the real-time display when it has been configured to receive streams from the video and data components of the ZeroSensor streams. This is showing the streams from a couple of ZeroSensors but more can be added and the display adjusts accordingly.

This is the simple ZeroSpace design as seen in the rtaiDesigner editor window. The hardware setup consists of the ZeroSensors running the SensorZero synth stream processing element (SPE) and a server running the DeepLabv3 SPEs and the ManifoldZero synths. The ManifoldZero synths consist of a couple of PutManifold SPEs that take each stream from the ZeroSensor and map it to a Manifold stream.

ManifoldStore captures these streams and persists them to disk as can be seen from the screen capture above.

This allows rtaiView to display the real-time data coming from the ZeroSensors and historic data based on timecode.

The screen capture above shows rtaiView in historic (or DVR) mode. The control widget (at the top right) allows the user to scan through periods of time and visualize the data. The same timecode is used for all streams displayed, making it easy to correlate events between them.
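To illustrate how a single shared timecode can drive several stored streams at once, here is a minimal sketch. It assumes each stream is held as a sorted list of timecodes (milliseconds) alongside its records; the function names and data layout are hypothetical and are not taken from ManifoldStore or rtaiView.

```python
from bisect import bisect_right


def record_at(timecodes, records, timecode):
    """Return the latest record at or before `timecode`, or None if there is none.

    `timecodes` must be sorted ascending and parallel to `records`.
    """
    index = bisect_right(timecodes, timecode)
    return records[index - 1] if index > 0 else None


def snapshot(streams, timecode):
    """Look up every stream at the same timecode so events can be correlated."""
    return {
        name: record_at(timecodes, records, timecode)
        for name, (timecodes, records) in streams.items()
    }


if __name__ == "__main__":
    # Hypothetical stored data: video frames at ~30 fps, sensor readings at 1 Hz
    streams = {
        "video": ([1000, 1033, 1066], ["frame0", "frame1", "frame2"]),
        "sensor": ([990, 1990], [{"temp": 21.5}, {"temp": 21.7}]),
    }
    print(snapshot(streams, 1040))
```

Choosing the latest record at or before the requested timecode means streams with different sample rates still line up sensibly when scanning through time.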

rtaiView is a useful tool for checking that the rt-ai Edge design is operating correctly and that the data stored is useful. In these examples, I have set DeepLabv3 to color map recognized objects. However, this is not the desired mode as I just want to store images that have people detected in them and then have the images only contain the people. The ultimate goal is to use these image sequences along with other sensor data to detect anomalous behavior and also to predict actions so that the rt-ai Edge enabled sentient space can be proactive in taking actions.
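The desired storage mode could be sketched roughly as follows, using plain Python lists to stand in for image data. This assumes the segmentation output is a per-pixel class map and that the person class has index 15, as it does in the PASCAL VOC label set commonly used with DeepLabv3; both the index and the function name are assumptions for illustration only.

```python
PERSON_CLASS = 15  # assumption: PASCAL VOC label index for "person"


def keep_only_people(frame, class_map, person_class=PERSON_CLASS, fill=(0, 0, 0)):
    """Return a copy of `frame` with non-person pixels blanked.

    `frame` is a list of rows of RGB tuples and `class_map` is a list of rows
    of class indices with the same shape. Returns None when no person pixels
    are present, meaning the frame should not be stored at all.
    """
    found = False
    masked = []
    for pixel_row, class_row in zip(frame, class_map):
        out_row = []
        for pixel, cls in zip(pixel_row, class_row):
            if cls == person_class:
                found = True
                out_row.append(pixel)
            else:
                out_row.append(fill)
        masked.append(out_row)
    return masked if found else None


if __name__ == "__main__":
    frame = [[(10, 10, 10), (20, 20, 20)], [(30, 30, 30), (40, 40, 40)]]
    class_map = [[0, 15], [7, 15]]  # background, person / car, person
    print(keep_only_people(frame, class_map))
```

A real implementation would use array operations rather than nested loops, but the logic of discarding frames without people and blanking everything else is the same.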

An rt-xr SpaceObjects tour de force

rt-xr SpaceObjects are now working very nicely. It’s easy to create, configure and delete SpaceObjects as needed using the menu switch which has been placed just above the light switch in my office model above.

The video below shows all of this in operation.

The typical process is to instantiate an object, place and size it and then attach it to a Manifold stream if it is a Proxy Object. Persistence, sharing and collaboration work for all relevant SpaceObjects across the supported platforms (Windows and macOS desktop, Windows MR, Android and iOS).

This is a good place to leave rt-xr for the moment while I wait for the arrival of some sort of AR headset in order to support local users of an rt-xr enhanced sentient space. Unfortunately, Magic Leap won’t deliver to my zip code (sigh) so that’s that for the moment. Lots of teasers about the HoloLens 2 right now and this might be the best way to go…eventually.

Now the focus moves back to rt-ai Edge. While this is working pretty well, a few bugs need fixing and some production modes need adding (such as auto-starting SPNs when server nodes are started). Then begins the process of data collection for machine learning. ZeroSensors will collect data from each monitored room and this will be saved by ManifoldStore for later use. The idea is to classify normal and abnormal situations and also to be proactive in responding to the needs of occupants of the sentient space.

rt-xr: VR, MR and AR visualization for augmented sentient spaces

It was becoming pretty clear that the Unity/XR parts of rt-ai Edge were taking on a life of their own so they have now been broken out into a new project called rt-xr. rt-ai Edge is an always on, real-time and long-lived stream processing system whereas rt-xr is ideal for ad-hoc networking where components come and go as required. In particular, the XR headsets of real and virtual occupants of a sentient space can come and go on a random basis – the sentient space is persistent and new users just get updated with the current state upon entering the space. In terms of sentient space implementation, rt-ai Edge provides the underlying sensing, intelligent processing and reaction processing (somewhat like an autonomic system) whereas rt-xr provides a more user-orientated system for visualizing and interacting with the space at the conscious level (to keep the analogy going) along with the necessary servers for sharing state, providing object repositories etc.

Functions include:

  • Visualization: A model of the sentient space is used to derive a virtual world for VR headset-wearing occupants of the space and augmentations for MR and AR headset-wearing occupants of the space. The structural model can be augmented with various assets, including proxy objects that provide a UI for remote services.
  • Interaction: MR/AR occupants physically within a space can interact with objects in the space while VR users can interact with virtual analogs within the same space for a telepresent experience.
  • Sharing: VR users in a space see avatars representing MR/AR users physically within the space while MR/AR users see avatars representing VR users within the space. Spatially located audio enhances the reality of the shared experience, allowing users to converse in a realistic manner.

rt-xr is based on the Manifold networking surface which greatly simplifies dynamic, ad-hoc architectures, supported by efficient multicast and point to point communication services and easy service discovery.

A key component of rt-xr is the rt-xr SpaceServer. This provides a repository for all augmentation objects and models within a sentient space. The root object is the space definition that models the physical space. This allows a virtual model to be generated for VR users while also locating augmentation objects for all users. When a user first enters a space, either physically or virtually, they receive the space definition file from the rt-xr SpaceServer. Depending on their mode, this is used to generate all the objects and models necessary for the experience. The space definition file can contain references to standard objects in the rt-xr viewer apps (such as video panels) or else references to proxy objects that can be downloaded from the rt-xr SpaceServer or any other server used as a proxy object repository.

The rt-xr SharingServer is responsible for distributing camera transforms and other user state data between occupants of a sentient space allowing animation of avatars representing virtual users in a space. It also provides support for the spatially located audio system.

The rt-xr Viewers are Unity apps that provide the necessary functionality to interact with the rest of the rt-xr system:

  • rt-xr Viewer3D is a Windows desktop viewer.
  • rt-xr ViewerMR is a UWP viewer for Windows Mixed Reality devices.
  • rt-xr ViewerAndroid is a viewer for Android devices.

Sentient space sharing avatars with Windows desktop, Windows Mixed Reality and Android apps

One of the goals of the rt-ai Edge system is that users can use whatever device they have available to interact with it and extract value from it. Unity is a tremendous help given that Unity apps can be run on pretty much everything. The main task was integration with Manifold so that all apps can receive and interact with everything else in the system. Manifold currently supports Windows, UWP, Linux, Android and macOS. iOS is a notable absentee and will hopefully be added at some point in the future. However, I perceive Android support as more significant as it also leads to multiple MR headset support.

The screen shot above and video below show three instances of the rt-ai viewer apps running on Windows desktop, Windows Mixed Reality and Android interacting in a shared sentient space. Ok, so the avatars are rubbish (I call them Sad Robots) but that’s just a detail and can be improved later. The wall panels are receiving sensor and video data from ZeroSensors via an rt-ai Edge stream processing network while the light switch is operated via a home automation server and Insteon.

Sharing is mediated by a SharingServer that is part of Manifold. The SharingServer uses Manifold multicast and end to end services to implement scalable sharing while minimizing the load on each individual device. Ultimately, the SharingServer will also download the space definition file when the user enters a sentient space and also provide details of virtual objects that may have been placed in the space by other users. This allows a new user with a standard app to enter a space and quickly create a view of the sentient space consistent with existing users.

While this is all kind of fun, the more interesting thing is when this is combined with a HoloLens or similar MR headset. The MR headset user in a space would see any VR users in the space represented by their avatars. Likewise, VR users in a space would see avatars representing MR users in the space. The idea is to get as close to a telepresent experience for VR users as possible without very complex setups. It would be much nicer to use Holoportation but that would require every room in the space to have a very complex and expensive setup which really isn’t the point. The idea is to make it very easy and low cost to implement an rt-ai Edge based sentient space.

Still lots to do of course. One big thing is audio. Another is representing interaction devices (pointers, motion controllers etc) to all users. Right now, each app just sends out the camera transform to the SharingServer which then distributes this to all other users. This will be extended to include PCM audio chunks and transforms for interaction devices so that everyone will be able to create a meaningful scene. Each user will receive the audio stream from every other user. The reason for this is that each individual audio stream can then be attached to the avatar for each user, giving a spatialized sound effect using Unity capabilities (that’s the hope anyway). Another very important thing is that the apps work differently depending on whether they are running on VR type devices or AR/MR type devices. In the latter case, the walls and related objects are not drawn and just the colliders are instantiated, although virtual objects and avatars will be visible. Obviously AR/MR users want to see the real walls, light switches etc, not the virtual representations. However, they will still be able to interact in exactly the same way as a VR user.
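The current sharing flow (each app sends its camera transform and the SharingServer passes it on to everyone else) can be sketched as below. The message layout, field names and helper functions are all hypothetical, and JSON is only an assumed encoding; the actual Manifold message format is not described here.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CameraTransform:
    """Hypothetical per-user state message: who, where, and facing which way."""
    user_id: str
    position: tuple   # (x, y, z) in space coordinates
    rotation: tuple   # quaternion (x, y, z, w)


def encode(state: CameraTransform) -> bytes:
    """Serialize a transform for sending to the sharing server."""
    return json.dumps(asdict(state)).encode("utf-8")


def decode(payload: bytes) -> CameraTransform:
    """Rebuild a transform from a received payload."""
    fields = json.loads(payload.decode("utf-8"))
    return CameraTransform(
        user_id=fields["user_id"],
        position=tuple(fields["position"]),
        rotation=tuple(fields["rotation"]),
    )


def recipients(sender_id, connected_users):
    """Everyone except the sender gets the update, so avatars can be animated."""
    return [user for user in connected_users if user != sender_id]


if __name__ == "__main__":
    update = CameraTransform("alice", (1.0, 1.6, -2.0), (0.0, 0.0, 0.0, 1.0))
    print(decode(encode(update)), recipients("alice", ["alice", "bob", "carol"]))
```

Extending this to audio and interaction devices would mean adding more fields or message types alongside the camera transform, with each receiver attaching them to the sender's avatar.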

Controlling the real world from the virtual world with Android

Since the ability to operate a real light switch from the VR world using Windows Mixed Reality (WMR) is now working, it was time to get the same thing working on the Android version of the Unity app – rtAndroidView. This uses the same rt-ai Edge stream processing network and Manifold network as the WMR and desktop versions but the extra trick was to get the interaction working.

The video shows me using the touch screen to navigate around the virtual model of my office and operate the light switch, showing that the Manifold HAServer interface is working, along with the normal video and ZeroSensor interfaces.

This is using the Android device as a VR device. In theory, it should be possible to use ARCore with an AR version of this app but the issue is locking the virtual space to the real space. That will take some experimentation I suspect.

Controlling the real world using Windows Mixed Reality, Manifold, rt-ai Edge and Insteon

Having now constructed a simple walk around model of my office and another room, it was time to start work on the interaction side of things. I have an Insteon switch controlling some of the lights in my office and this seemed like an obvious target. Manifold now has a home automation server app (HAServer) based on one from an earlier project. This allows individual Insteon devices to be addressed by user-friendly names using JSON over Manifold’s end to end datagram service. Light switches can now be specified in the Unity rtXRView space definition file and linked to the control interface of the HAServer.
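A sketch of what addressing a device by a user-friendly name with JSON might look like is shown below. The field names (`deviceName`, `level`), the device table and the Insteon address are all invented for illustration, and the 0 to 255 level range is an assumption; the real HAServer message format may differ.

```python
import json

# Hypothetical mapping from friendly names to Insteon device addresses
DEVICES = {"Office lights": "1A.2B.3C"}


def build_switch_command(device_name: str, level: int) -> bytes:
    """Build a JSON datagram asking for a device to be set to `level`.

    Assumes 0 means off and 255 means fully on.
    """
    if not 0 <= level <= 255:
        raise ValueError("level must be between 0 and 255")
    return json.dumps({"deviceName": device_name, "level": level}).encode("utf-8")


def resolve_command(payload: bytes, devices=DEVICES):
    """Server side: translate the friendly name into a device address."""
    command = json.loads(payload.decode("utf-8"))
    name = command["deviceName"]
    if name not in devices:
        raise KeyError(f"unknown device: {name}")
    return devices[name], command["level"]


if __name__ == "__main__":
    datagram = build_switch_command("Office lights", 255)
    print(resolve_command(datagram))
```

Keeping the name lookup on the server side means clients such as the Unity apps only ever need to know the friendly name given in the space definition file.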

The screen capture above and video below were made using a Samsung Odyssey headset and motion controllers. The light switch specification causes a virtual light switch to be placed, ideally exactly where the real light switch happens to be. Then, by pointing at the light switch with the motion controller and clicking, the light can be turned on and off. The virtual light switch is gray when the light is off and green when it is on. If the real switch is operated by some other means, the virtual light switch will reflect this as the HAServer broadcasts state change updates on a regular basis. It’s nice to see that the light sensor on the ZeroSensor responds appropriately to the light level too. Technically this light switch is a dimmer – setting an intermediate level is a TODO at this point.
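The state tracking described above can be sketched as a small class that applies broadcast updates and derives the switch color from the level. The update format (`deviceName`, `level`) and the class itself are hypothetical; only the gray/green behavior comes from the description.

```python
class VirtualLightSwitch:
    """Mirror of a real switch, updated from state change broadcasts."""

    def __init__(self, device_name: str):
        self.device_name = device_name
        self.level = 0  # assume off until the first update arrives

    def apply_update(self, update: dict) -> bool:
        """Apply a broadcast update; return True if it was for this switch."""
        if update.get("deviceName") != self.device_name:
            return False
        self.level = int(update["level"])
        return True

    @property
    def is_on(self) -> bool:
        # Any non-zero level counts as on (the real device is a dimmer)
        return self.level > 0

    @property
    def color(self) -> str:
        """Gray when the light is off, green when it is on."""
        return "green" if self.is_on else "gray"


if __name__ == "__main__":
    switch = VirtualLightSwitch("Office lights")
    switch.apply_update({"deviceName": "Office lights", "level": 255})
    print(switch.color)
```

Because the virtual switch only ever reflects broadcast state, it stays correct no matter how the real switch was operated.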

An interesting aspect of this is the extent to which a remote VR user can get a sense of telepresence in a space, even if it is just a virtual representation of the real space. To make that connection more concrete, the virtual light in Unity should reflect the ambient light level as measured by the ZeroSensor. That’s another TODO…

While this is kind of fun in the VR world, it could actually be interesting in the AR world. If the virtual light switch is placed correctly but is invisible (apart from a collider), a HoloLens user (for example) could look at a real light switch and click in order to change the state of the switch. Very handy for the terminally lazy! More useful than just this would be to annotate the switch with what it controls. For some reason, people in this house never seem to know which light switch controls what so this feature by itself would be quite handy.