An rt-xr SpaceObjects tour de force

rt-xr SpaceObjects are now working very nicely. It’s easy to create, configure and delete SpaceObjects as needed using the menu switch which has been placed just above the light switch in my office model above.

The video below shows all of this in operation.

The typical process is to instantiate an object, place and size it and then attach it to a Manifold stream if it is a Proxy Object. Persistence, sharing and collaboration works for all relevant SpaceObjects across the supported platforms (Windows and macOS desktop, Windows MR, Android and iOS).

This is a good place to leave rt-xr for the moment while I wait for the arrival of some sort of AR headset in order to support local users of an rt-xr enhanced sentient space. Unfortunately, Magic Leap won’t deliver to my zip code (sigh) so that’s that for the moment. Lots of teasers about the HoloLens 2 right now and this might be the best way to go…eventually.

Now the focus moves back to rt-ai Edge. While this is working pretty well, it needs to have a few bugs fixed and also add some production modes (such as auto-starting SPNs when server nodes are started). Then begins the process of data collection for machine learning. ZeroSensors will collect data from each monitored room and this will be saved by ManifoldStore for later use. The idea is to classify normal and abnormal situations and also to be proactive in responding to the needs of occupants of the sentient space.

The ZeroSensor – a sentient space point of presence

One application for rt-ai Edge is ubiquitous sensing leading to sentient spaces – spaces that can interact with people moving through and provide useful functionality, whether learned or programmed. A step on the road to that is the ZeroSensor, four prototypes of which are shown in the photo. Each ZeroSensor consists of a Raspberry Pi Zero W, a Pi camera module v2, an Adafruit BME 680 breakout and an Adafruit TSL2561 breakout. The combination gives a video stream and a sensor stream with light, temperature, pressure, humidity and air quality values. The video stream can be used to derive motion sensing and identification while the other sensors provide a general idea of conditions in the space. Notably missing is audio. Microphone support would be useful for general sensing and I might add that in real devices. A 3D printable case design is underway in order to allow wide-scale deployment.

Voice-based interaction is a powerful way for users to interact with sentient spaces. However, it is assumed that people who want to interact are using an AR headset of some sort which itself provides the audio I/O capabilities. Gesture input would be possible via the ZeroSensor’s camera. For privacy reasons video would not be viewed directly or stored but just used as a source of activity data and interaction.

This is the simple rt-ai design used to test the ZeroSensors. The ZeroSynth modules are rt-ai Edge synth modules that contain SPEs that interface with the ZeroSensor’s hardware and generate a video stream and a sensor data stream. An instance of a video viewer and sensor viewer are connected to each ZeroSynth module.

This is the result of running the ZeroSensor test design, showing a video and sensor window for each ZeroSensor. The cameras are staring at the ceiling because the four sensors were on a table. When the correct case is available, they will be deployed in the corners of rooms in the space.

Scaling embedded edge inference with rt-ai Edge synth modules

Now that edge devices with embedded inference support are starting to appear, there’s a need for scalable deployment of software and configuration data to these devices. rt-ai Edge can address this scaling requirement using synth modules. Synth modules are composite elements in a stream processing network (SPN) that combine simpler stream processing elements (SPEs) into more complex structures. The idea is that a synth module can be created that contains the SPEs required for a specific type of embedded edge inference device. This synth module can then be deployed, configured and managed for all instances of this type of edge inference device very easily using the rtaiDesigner tool.

The screen capture above is an example of the output from an SPN that includes two differently configured DeepLab v3+ instances along with associated video and audio capture SPEs. The top level SPN looks like this:

There are two synth modules in the design, both instances of the same underlying synth module:

This simple synth module consists of a video capture SPE, an audio capture SPE and the DeepLab v3+ SPE.

As with standard SPEs, synth modules can be allocated to any node in the rt-ai Edge network. The only limitation at present is that all SPEs in an instance of a synth module must run on the same node. This will be relaxed at later date when automatic SPE placement based on available resources is implemented. A synth module can be instanced multiple times on the same node or different nodes as required. In this example, two instances of the same synth module were placed on the Default node.

Individual instances of a synth module can be configured in the top level design:

In this case, Synth0 is being configured. Note the tabs in the dialog. There is one tab for each SPE in the underlying synth module. SPE dialogs are auto-generated from a JSON spec in the SPE design directory. This makes it very easy to construct a combined dialog when SPEs are used in a synth module. Any design can be turned into a synth module just by pressing the Generate synth module button. The synth module then becomes available in the Add module dialog just like any other SPE.

As designs are completely regenerated every time the Generate design button is pressed, internal changes can be made to the synth module at any time and they will be reflected in top level designs the next time that they are generated.

Right now, synth module designs cannot include synth modules, only standard SPEs. If multi-level synth modules were required, it would be a small extension of the current implementation. For now, the ability to reproduce and configure a standard SPN subnetwork multiple times is sufficient to scale most edge inference applications.

Real time edge inference monitoring with rt-ai Edge

rt-ai Edge is progressing nicely and now supports multi-node operation (i.e. multiple networked servers participating in a processing network) along with real-time monitoring. The screen capture shows a simple processing network where the video feed from a camera is passed through a DeepLab-v3+ stream processing element (SPE) and then on to two separate media viewers. At the top of each SPE block in the Designer window is some text like Cam(Default). Here, Cam is the name given to the SPE while Default is the name of the node (server) on which the SPE is running. In this design there are two nodes, Default and rtai0.

The code underlying the common SPE API communicates with the Designer window and supplies the stats about bytes and messages in and out. Soon, this path will also allow SPE-specific real-time parameter tweaking from the Designer window.

To add a node to the system, it just needs to have all of the prerequisites installed and run a special NodeManager SPE. This also communicates with the Designer and supports SPE deployment and runtime control, activated when the user presses the Deploy design button. Moving an SPE between nodes is just a case of reassigning it, generating the design and then deploying the design again.

The green outlines around each SPE indicate the state of the SPE and the node on which it is running. When it is all green, as in the first screen capture, this indicates that both SPE and node are running. For the second screen capture, I manually terminated the View2 SPE on rtai0. The inner part of the outline has now gone red. This indicates that the node is up but the SPE is down. If the outline is all red, it means that the node is down and not communicating with the Designer.

It’s interesting to note that DeepLab-v3+ is processing around 5 frames per second using a GTX-1080 GPU. The input rate from the camera is 30 frames per second. The processor drops frames while it is still processing an earlier frame, ensuring that queues do not build up and latency is kept to a minimum.

DeepLabv3+ Stream Processing Element (SPE) for rt-ai Edge

Integrating DeepLabv3+ with rt-ai Edge turned out to be pretty straightforward and follows from an existing TensorFlow-based Inception Stream Processing Element (SPE). The screen capture above shows an example of what it can do when given a video stream, where the DeepLab SPE has removed all pixels that aren’t part of recognized objects. This is why I am waving a bottle of beer about (and not because it is after 5pm). The PASCAL VOC dataset on which the model I am using has been trained can recognize a finite set of categories of objects. Waving a cow about didn’t seem practical hence the bottle. This is the original frame from the camera:

The DeepLab SPE also allows a specific category to be selected. In the case of the capture below, this was just the bottle:

On the right hand side of the media viewer screen you can see the metadata that has been generated by the DeepLab SPE. This is an example of how rt-ai Edge SPEs can be used to enhance the semantic content of data – video frames in this case.

It is pretty easy to configure the DeepLab SPE using rtaiDesigner:

This is the design screen showing the fairly trivial flow used for this test. Cam is a webcam capture SPE, Audio is an audio capture SPE. The DeepLab SPE is connected in the flow between the capture SPE and the media view SPE.

An interesting feature of rt-ai Edge is how SPEs can be configured. An SPE consists of some code (Python scripts in these cases) and a module spec (mspec) file. The mspec file contains information about subscriber and publisher ports as well as a section that is used to generate a configuration dialog. An example for the DeepLab SPE module dialog is shown above. This is the mspec file that generated it:

{
    "ModuleType" : "DeepLab",

    "ModuleDialog" : {
        "DialogName" : "DeepLab",
        "DialogDesc" : "Settings dialog for DeepLab semantic segmentation",

        "DialogData" : [
            {
                "VarName" : "OutputFormat",
                "VarDesc" : "Output frame format",
                "VarType" : "ConfigSelection",
                "VarValue" : "0",
                "VarStringArray" : [{ "VarEntry" : "Color map"},{"VarEntry" : "Masked image" },{"VarEntry" : "Single category masked image" }]
            },
            {
                "VarName" : "Category",
                "VarDesc" : "Single category selector",
                "VarType" : "ConfigSelection",
                "VarValue" : "15",
                "VarStringArray" : [
                    {"VarEntry" : "background"},
                    {"VarEntry" : "aeroplane"},
                    {"VarEntry" : "bicycle" },
                    {"VarEntry" : "bird" },
                    {"VarEntry" : "boat" },
                    {"VarEntry" : "bottle" },
                    {"VarEntry" : "bus" },
                    {"VarEntry" : "car" },
                    {"VarEntry" : "cat" },
                    {"VarEntry" : "chair" },
                    {"VarEntry" : "cow" },
                    {"VarEntry" : "diningtable" },
                    {"VarEntry" : "dog" },
                    {"VarEntry" : "horse" },
                    {"VarEntry" : "motorbike" },
                    {"VarEntry" : "person" },
                    {"VarEntry" : "pottedplant" },
                    {"VarEntry" : "sheep" },
                    {"VarEntry" : "sofa" },
                    {"VarEntry" : "train" },
                    {"VarEntry" : "tv" }
                ]
            },
            {
                "VarName" : "Preview",
                "VarDesc" : "Enable preview",
                "VarType" : "ConfigBool",
                "VarValue" : "false"
            }
        ]
    },
    
    "ModulePubSubs" : {
        "Pubs" : {
            "VideoOut" : "VideoMJPEG"
        },

        "Subs" : {
            "VideoIn" : "VideoMJPEG"
        }
    }
}

This makes it very easy to try out different settings. Use the module’s dialog to change something, regenerate the design using the Generate design button and then restart the network. Right now, for testing, rtaiDesigner generates start.sh and stop.sh scripts that can be used to quickly implement changes. Hopefully, in the future, configuration changes will be possible on the fly without having to restart the stream processing network.

rt-ai: real time stream processing and inference at the edge enables intelligent IoT

The “rt” part of rt-ai doesn’t just stand for “richardstech” for a change, it also stands for “real-time”. Real-time inference at the edge will allow decision making in the local loop with low latency and no dependence on the cloud. rt-ai includes a flexible and intuitive infrastructure for joining together stream processing pipelines in distributed, restricted processing power environments. It is very easy for anyone to add new pipeline elements that fully integrate with rt-ai pipelines. This leverages some of the concepts originally prototyped in rtndf while other parts of the rt-ai infrastructure have been in 24/7 use for several years, proving their intrinsic reliability.

Edge processing and control is essential if there is to be scalable use of intelligent IoT. I believe that dumb IoT, where everything has to be sent to a cloud service for processing, is a broken and unscalable model. The bandwidth requirements alone of sending all the data back to a central point will rapidly become unworkable. Latency guarantees are difficult to impossible in this model. Two advantages of rt-ai (keeping raw data at the edge where it belongs and only upstreaming salient information to the cloud along with minimizing required CPU cycles in power constrained environments) are the keys to scalable intelligent IoT.

DroNet – flying a drone using data from cars and bikes

Fascinating video about a system that teaches a drone to fly around urban environments using data from cars and bikes as training data. There’s a paper here and code here. It’s a great example of leveraging CNNs in embedded environments. I believe that moving AI and ML to the edge and ultimately into devices such as IoT sensors is going to be very important. Having dumb sensor networks and edge devices just means that an enormous amount of worthless data has to be transferred into the cloud for processing. Instead, if the edge devices can perform extensive semantic mining of the raw data, only highly salient information needs to be communicated back to the core, massively reducing bandwidth requirements and also allowing low latency decision making at the edge.

Take as a trivial example a system of cameras that read vehicle license plates. One solution would be to send the raw video back to somewhere for license number extraction. Alternately, if the cameras themselves could extract the data, then only the recognized numbers and letters need to be transferred, along with possibly an image of the plate. That’s a massive bandwidth saving over sending constant compressed video. Even more interesting would be edge systems that can perform unsupervised learning to optimize performance, all moving towards eliminating noise and recognizing what’s important without extensive human oversight.