Integrating SHAPE with rt-ai: adding AI to highly augmented spaces

A key feature of SHAPE is its ability to leverage the power of external servers in order to enhance the AR experience. The idea of combining relatively simple and cheap AR headsets with low latency communications links (such as 5G wireless) to edge servers is what is driving SHAPE’s architecture. Giving SHAPE access to rt-ai edge systems is a first example of this in action.

The screen capture above gives an idea of the current state of SHAPE development. This was taken using an iPad Pro running the iOS SHAPE app. The polygons with red edges are the planes that have been detected by ARKit. At the bottom right the monitor shows the same app running on a Mac (in the Unity editor in this case). The macOS version greatly speeds development of everything other than ARKit-related functionality – especially space synchronization functions (e.g. adding, moving, modifying or deleting object actions that need to be shared between all SHAPE users in the same space). The Unity iOS SHAPE app uses the ARFoundation API to, amongst other things,  load and save ARWorldMaps in order to synchronize spatial locations between SHAPE app instances. ARWorldMaps are persisted by the CoreUniverse components and cached for real-time use by EdgeSpace components, one EdgeSpace per physical “room”. SHAPE apps physically entering the room receive the latest map along with the space definition for that room. This includes the directory of augmentation objects with metadata that allows them all to be downloaded from asset servers (unless already cached) and then positioned correctly in the physical space and connected to the appropriate external function servers.

Augmentation objects can be moved around the space manually by touching the object with three or more fingers – sounds awful but it does work. It can then be dragged around the screen and the screen can be moved around to position the objects in space. Touching the object with two fingers brings up the object menu for that instance. This allows the object to be deleted, resized or rotated. It also allows the object to be stuck to a wall or stuck to the floor. in this context, a wall is an ARKit vertical plane, a floor is an ARKit horizontal plane so the object could easily be placed on a table if a suitable plane has been detected. If not, it can be placed manually. All of these object changes are sent to the room’s EdgeSpace (via EdgeAccess) and shared between other users in the space to keep everything synchronized. In addition, updates are sent to CoreUniverse for persistence. These become integrated into the persistent space definition for the room which EdgeSpace instances receive on a regular basis from CoreUniverse (primary and backup). Now this creates an interesting race condition since EdgeSpace is modifying its cached space definition in real-time and it may take a while for the CoreUniverse version to catch up. This problem is handled using timestamps attached to updates so that EdgeSpace can correctly integrate new information from CoreUniverse (such a new object instantiated by a space design tool) while ignoring stale updates for existing objects.

The box with big “M”s is the menu object. Each room has one and it can be placed anywhere convenient in the room. You can click on it (well touch it actually if using an iPad touch screen) and this pops up a menu that allows the user to add augmentation objects. Right now this is just working for the infamous analog clock but will eventually present a catalog of available models with thumbnails. The analog clocks are proxy objects and being driven by an external analog clock server. Obviously it is trivial to implement this purely in the Unity app but it is meant as a simple test of the proxy object concept. The next proxy object to be added will be the sticky note object from rt-xr and then probably the rt-xr shared whiteboard.

Getting back to rt-ai integration, the rt-ai design above shows the simple test design that receives captured frames from the iPad’s rear camera. The frame rate is limited to 5fps so as not to load the WiFi link too much. For simplicity and low latency motion jpegs are used for this but of course compressed video could be used (and probably will be in the future). The new rt-ai SPE called SHAPEConductor looks to the SHAPE system like a SHAPE function server while mapping received messages into and out of an rt-ai stream processing network. In this case, the video is simply being passed through DeepLab to perform semantic segmentation and then the results displayed:


Here it is picking up the monitor running the macOS SHAPE app. In practice, more complex processing would be performed and results returned to proxy objects via the SHAPEConductor module and the SHAPE network.

One interesting application for this is to use the captured frames to recognize the physical space and automatically load the correct saved ARWorldMap for that physical space into the SHAPE app and instantiate all the appropriate augmentation objects, correctly located. Another would be to perform semantic segmentation and return the results to the SHAPE app so that it can be married to depth data and allow real time occlusion to be performed. ARKit 3 will do this on-device for people but apparently not in general. Offloading the segmentation should allow for a lot more flexibility, albeit with increased latency, and work on lower capability devices.

The SHAPE rt-ai integration is very much a work in progress and it will be fun to see what can be achieved with this combination.

The new RTSP SPE: bringing H.264 video streams from ONVIF cameras into rt-ai Edge designs

Most IP cameras, including security and surveillance cameras, support RTSP H.264 streaming so it made sense to implement a compatible stream processing element (SPE) for rt-ai Edge. The design above is a simple test design. The video stream from the camera is converted into JPEG frames using GStreamer within the SPE and then passed to the DeepLabv3 SPE. The output from DeepLabv3 is then passed to a MediaView SPE for display.

I have a few ONVIF/RTSP cameras around the property and the screen capture above shows the results from one of these. There’s a car sitting in its field of view that’s picked out very nicely. I am using the DeepLabv3 SPE here in its masked image mode where the output frames just consist of recognized object images and nothing else. Just for reference, this is the original frame:

Clearly the segmented image only retains what it is important for later processing.

rtaiView: an rt-ai app for viewing real-time and historic sensor data

I am now pulling things together so that I can use the ZeroSensors to perform long-term data collection. Data generated by the rt-ai Edge design is passed into the Manifold and then captured by ManifoldStore, one of the standard Manifold nodes. Obviously it would be nice to know that meaningful data is being stored and that’s where rtaiView comes in. The screen capture above shows the real-time display when it has been configured to receive streams from the video and data components of the ZeroSensor streams. This is showing the streams from a couple of ZeroSensors but more can be added and the display adjusts accordingly.

This is the simple ZeroSpace design as seen in the rtaiDesigner editor window. The hardware setup consists of the ZeroSensors running the SensorZero synth stream processor element (SPE) and a server running the DeepLabv3 SPEs and the ManifoldZero synths. The ManifoldZero synths consist of a couple of PutManifold SPEs that take each stream from the ZeroSensor and map it to a Manifold stream.

ManifoldStore captures these streams and persists them to disk as can be seen from the screen capture above.

This allows rtaiView to display the real-time data coming from the ZeroSensors and historic data based on timecode.

The screen capture above shows rtaiView in historic (or DVR) mode. The control widget (at the top right) allows the user to scan through periods of time and visualize the data. The same timecode is used for all streams displayed, making it easy to correlate events between them.

rtaiView is a useful tool for checking that the rt-ai Edge design is operating correctly and that the data stored is useful. In these examples, I have set DeepLabv3 to color map recognized objects. However, this is not the desired mode as I just want to store images that have people detected in them and then have the images only contain the people. The ultimate goal is to use these image sequences along with other sensor data to detect anomalous behavior and also to predict actions so that the rt-ai Edge enabled sentient space can be proactive in taking actions.

Scaling embedded edge inference with rt-ai Edge synth modules

Now that edge devices with embedded inference support are starting to appear, there’s a need for scalable deployment of software and configuration data to these devices. rt-ai Edge can address this scaling requirement using synth modules. Synth modules are composite elements in a stream processing network (SPN) that combine simpler stream processing elements (SPEs) into more complex structures. The idea is that a synth module can be created that contains the SPEs required for a specific type of embedded edge inference device. This synth module can then be deployed, configured and managed for all instances of this type of edge inference device very easily using the rtaiDesigner tool.

The screen capture above is an example of the output from an SPN that includes two differently configured DeepLab v3+ instances along with associated video and audio capture SPEs. The top level SPN looks like this:

There are two synth modules in the design, both instances of the same underlying synth module:

This simple synth module consists of a video capture SPE, an audio capture SPE and the DeepLab v3+ SPE.

As with standard SPEs, synth modules can be allocated to any node in the rt-ai Edge network. The only limitation at present is that all SPEs in an instance of a synth module must run on the same node. This will be relaxed at later date when automatic SPE placement based on available resources is implemented. A synth module can be instanced multiple times on the same node or different nodes as required. In this example, two instances of the same synth module were placed on the Default node.

Individual instances of a synth module can be configured in the top level design:

In this case, Synth0 is being configured. Note the tabs in the dialog. There is one tab for each SPE in the underlying synth module. SPE dialogs are auto-generated from a JSON spec in the SPE design directory. This makes it very easy to construct a combined dialog when SPEs are used in a synth module. Any design can be turned into a synth module just by pressing the Generate synth module button. The synth module then becomes available in the Add module dialog just like any other SPE.

As designs are completely regenerated every time the Generate design button is pressed, internal changes can be made to the synth module at any time and they will be reflected in top level designs the next time that they are generated.

Right now, synth module designs cannot include synth modules, only standard SPEs. If multi-level synth modules were required, it would be a small extension of the current implementation. For now, the ability to reproduce and configure a standard SPN subnetwork multiple times is sufficient to scale most edge inference applications.

DeepLabv3+ Stream Processing Element (SPE) for rt-ai Edge

Integrating DeepLabv3+ with rt-ai Edge turned out to be pretty straightforward and follows from an existing TensorFlow-based Inception Stream Processing Element (SPE). The screen capture above shows an example of what it can do when given a video stream, where the DeepLab SPE has removed all pixels that aren’t part of recognized objects. This is why I am waving a bottle of beer about (and not because it is after 5pm). The PASCAL VOC dataset on which the model I am using has been trained can recognize a finite set of categories of objects. Waving a cow about didn’t seem practical hence the bottle. This is the original frame from the camera:

The DeepLab SPE also allows a specific category to be selected. In the case of the capture below, this was just the bottle:

On the right hand side of the media viewer screen you can see the metadata that has been generated by the DeepLab SPE. This is an example of how rt-ai Edge SPEs can be used to enhance the semantic content of data – video frames in this case.

It is pretty easy to configure the DeepLab SPE using rtaiDesigner:

This is the design screen showing the fairly trivial flow used for this test. Cam is a webcam capture SPE, Audio is an audio capture SPE. The DeepLab SPE is connected in the flow between the capture SPE and the media view SPE.

An interesting feature of rt-ai Edge is how SPEs can be configured. An SPE consists of some code (Python scripts in these cases) and a module spec (mspec) file. The mspec file contains information about subscriber and publisher ports as well as a section that is used to generate a configuration dialog. An example for the DeepLab SPE module dialog is shown above. This is the mspec file that generated it:

{
    "ModuleType" : "DeepLab",

    "ModuleDialog" : {
        "DialogName" : "DeepLab",
        "DialogDesc" : "Settings dialog for DeepLab semantic segmentation",

        "DialogData" : [
            {
                "VarName" : "OutputFormat",
                "VarDesc" : "Output frame format",
                "VarType" : "ConfigSelection",
                "VarValue" : "0",
                "VarStringArray" : [{ "VarEntry" : "Color map"},{"VarEntry" : "Masked image" },{"VarEntry" : "Single category masked image" }]
            },
            {
                "VarName" : "Category",
                "VarDesc" : "Single category selector",
                "VarType" : "ConfigSelection",
                "VarValue" : "15",
                "VarStringArray" : [
                    {"VarEntry" : "background"},
                    {"VarEntry" : "aeroplane"},
                    {"VarEntry" : "bicycle" },
                    {"VarEntry" : "bird" },
                    {"VarEntry" : "boat" },
                    {"VarEntry" : "bottle" },
                    {"VarEntry" : "bus" },
                    {"VarEntry" : "car" },
                    {"VarEntry" : "cat" },
                    {"VarEntry" : "chair" },
                    {"VarEntry" : "cow" },
                    {"VarEntry" : "diningtable" },
                    {"VarEntry" : "dog" },
                    {"VarEntry" : "horse" },
                    {"VarEntry" : "motorbike" },
                    {"VarEntry" : "person" },
                    {"VarEntry" : "pottedplant" },
                    {"VarEntry" : "sheep" },
                    {"VarEntry" : "sofa" },
                    {"VarEntry" : "train" },
                    {"VarEntry" : "tv" }
                ]
            },
            {
                "VarName" : "Preview",
                "VarDesc" : "Enable preview",
                "VarType" : "ConfigBool",
                "VarValue" : "false"
            }
        ]
    },
    
    "ModulePubSubs" : {
        "Pubs" : {
            "VideoOut" : "VideoMJPEG"
        },

        "Subs" : {
            "VideoIn" : "VideoMJPEG"
        }
    }
}

This makes it very easy to try out different settings. Use the module’s dialog to change something, regenerate the design using the Generate design button and then restart the network. Right now, for testing, rtaiDesigner generates start.sh and stop.sh scripts that can be used to quickly implement changes. Hopefully, in the future, configuration changes will be possible on the fly without having to restart the stream processing network.