I have an application that requires a custom object detector for rt-ai and YOLOv3 seemed like a good base from which to start. The challenge, as always, is to capture and prepare suitable training data. I followed the guide here which certainly saved a lot of work. For this test, I used about 50 photos each of the left and right controllers from a Windows MR headset. The result from the rt-ai SPE is shown in the capture above. I was interested to see how well it could distinguish between the left and right controllers as they are just mirror images of each other. It’s a bit random but not terrible. Certainly it is very good at detecting the presence or absence of controllers, even if it is not sure which one it is. No doubt adding more samples for training would improve this substantially.
The guide I followed to create the training data works but has a number of steps that need to be done correctly and in the right order. I am going to modify the Python code to consolidate this into a smaller number of (hopefully) idiot-proof steps and put the results up on GitHub in case anyone else finds it useful.
One application for rt-ai Edge is ubiquitous sensing leading to sentient spaces – spaces that can interact with people moving through and provide useful functionality, whether learned or programmed. A step on the road to that is the ZeroSensor, four prototypes of which are shown in the photo. Each ZeroSensor consists of a Raspberry Pi Zero W, a Pi camera module v2, an Adafruit BME680 breakout and an Adafruit TSL2561 breakout. The combination gives a video stream and a sensor stream with light, temperature, pressure, humidity and air quality values. The video stream can be used to derive motion sensing and identification while the other sensors provide a general idea of conditions in the space. Notably missing is audio. Microphone support would be useful for general sensing and I might add that in real devices. A 3D printable case design is underway to allow wide-scale deployment.
Voice-based interaction is a powerful way for users to interact with sentient spaces. However, it is assumed that people who want to interact are using an AR headset of some sort which itself provides the audio I/O capabilities. Gesture input would be possible via the ZeroSensor’s camera. For privacy reasons video would not be viewed directly or stored but just used as a source of activity data and interaction.
This is the simple rt-ai design used to test the ZeroSensors. The ZeroSynth modules are rt-ai Edge synth modules that contain SPEs that interface with the ZeroSensor’s hardware and generate a video stream and a sensor data stream. A video viewer and a sensor viewer are connected to each ZeroSynth module.
This is the result of running the ZeroSensor test design, showing a video and sensor window for each ZeroSensor. The cameras are staring at the ceiling because the four sensors were on a table. When the correct case is available, they will be deployed in the corners of rooms in the space.
Now that edge devices with embedded inference support are starting to appear, there’s a need for scalable deployment of software and configuration data to these devices. rt-ai Edge can address this scaling requirement using synth modules. Synth modules are composite elements in a stream processing network (SPN) that combine simpler stream processing elements (SPEs) into more complex structures. The idea is that a synth module can be created that contains the SPEs required for a specific type of embedded edge inference device. This synth module can then be deployed, configured and managed for all instances of this type of edge inference device very easily using the rtaiDesigner tool.
The screen capture above is an example of the output from an SPN that includes two differently configured DeepLab v3+ instances along with associated video and audio capture SPEs. The top level SPN looks like this:
There are two synth modules in the design, both instances of the same underlying synth module:
This simple synth module consists of a video capture SPE, an audio capture SPE and the DeepLab v3+ SPE.
As with standard SPEs, synth modules can be allocated to any node in the rt-ai Edge network. The only limitation at present is that all SPEs in an instance of a synth module must run on the same node. This will be relaxed at a later date when automatic SPE placement based on available resources is implemented. A synth module can be instanced multiple times on the same node or different nodes as required. In this example, two instances of the same synth module were placed on the Default node.
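To make the template idea concrete, here is a minimal Python sketch of how a synth module might expand into per-instance SPEs. It is not the rt-ai Edge implementation: the class names, the dictionary layout and the fps values are all assumptions made for illustration. The one rule taken from the text is that every SPE in an instance shares the same node.

```python
from dataclasses import dataclass, field


@dataclass
class SPE:
    """One stream processing element in the template (hypothetical shape)."""
    name: str
    config: dict = field(default_factory=dict)


@dataclass
class SynthModule:
    """A reusable template that groups several SPEs."""
    name: str
    spes: list

    def instantiate(self, instance_name, node, overrides=None):
        """Expand the template into concrete SPEs for one instance."""
        overrides = overrides or {}
        expanded = []
        for spe in self.spes:
            # Per-instance settings win over the template defaults.
            config = {**spe.config, **overrides.get(spe.name, {})}
            expanded.append({
                "name": f"{instance_name}.{spe.name}",
                "node": node,  # all SPEs of one instance share a node
                "config": config,
            })
        return expanded


# Illustrative template mirroring the video/audio/DeepLab example.
template = SynthModule("DeepLabSynth", [
    SPE("VideoCapture", {"fps": 30}),  # fps value is made up
    SPE("AudioCapture"),
    SPE("DeepLab"),
])

synth0 = template.instantiate("Synth0", "Default", {"VideoCapture": {"fps": 15}})
synth1 = template.instantiate("Synth1", "Default")
```

Because each instance is expanded from the template every time, a change to the template shows up in all instances on the next expansion.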
Individual instances of a synth module can be configured in the top level design:
In this case, Synth0 is being configured. Note the tabs in the dialog. There is one tab for each SPE in the underlying synth module. SPE dialogs are auto-generated from a JSON spec in the SPE design directory. This makes it very easy to construct a combined dialog when SPEs are used in a synth module. Any design can be turned into a synth module just by pressing the Generate synth module button. The synth module then becomes available in the Add module dialog just like any other SPE.
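As a rough illustration of why per-SPE JSON specs make a combined dialog easy, the sketch below builds one tab per SPE from its spec. The spec format shown is invented for this example (the real rt-ai Edge spec layout is not reproduced here), as are the field names and defaults.

```python
import json

# Hypothetical specs: field names, types and defaults are assumptions.
SPE_SPECS = {
    "VideoCapture": '{"fields": [{"name": "width", "type": "int", "default": 1280}]}',
    "DeepLab": '{"fields": [{"name": "model", "type": "str", "default": "default"}]}',
}


def build_synth_dialog(spe_names, specs=SPE_SPECS):
    """Return one tab per SPE, pre-filled with that SPE's default values."""
    tabs = []
    for name in spe_names:
        spec = json.loads(specs[name])
        tabs.append({
            "tab": name,
            "values": {f["name"]: f["default"] for f in spec["fields"]},
        })
    return tabs


dialog = build_synth_dialog(["VideoCapture", "DeepLab"])
```

Since each tab is derived only from its own SPE's spec, no synth-specific dialog code is needed.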
As designs are completely regenerated every time the Generate design button is pressed, internal changes can be made to the synth module at any time and they will be reflected in top level designs the next time that they are generated.
Right now, synth module designs cannot include synth modules, only standard SPEs. If multi-level synth modules were required, it would be a small extension of the current implementation. For now, the ability to reproduce and configure a standard SPN subnetwork multiple times is sufficient to scale most edge inference applications.
The MQTT-based heart of rt-ai Edge is ideal for constructing stream processing networks (SPNs) that are intended to run continuously. rt-ai Edge tools (such as rtaiDesigner) make it easy to modify and re-deploy SPNs across multiple nodes during the design phase but, once in full-time operation, these SPNs just run by themselves. An existing stream processing element (SPE), PutNiFi, allows data from an rt-ai Edge network to be stored and processed by big data tools, such as Elasticsearch. However, these types of big data tools aren’t always appropriate, especially if low latency access is required, as Java garbage collection can cause random delays.
For many applications, much simpler but reliably low latency storage is desirable. The Manifold system already has a storage app, ManifoldStore, that is optimized for timestamp-based searches of historical data. A new SPE called PutManifold allows data from an SPN to flow into a Manifold networking surface. The SPN screen capture above shows two instances of the PutManifold SPE used to transfer audio and video data from the SPN. ManifoldStore grabs passing data and stores it using timestamp as the key. Manifold applications can then access historical data flows using streamId/timestamp pairs. It is particularly simple to coordinate access across multiple data streams. This is very useful when trying to correlate events across multiple data sources at a particular point or window in time.
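The streamId/timestamp access pattern can be sketched with a small in-memory store. This is only an illustration of timestamp-keyed lookup using binary search. It is not how ManifoldStore is implemented, and the class and method names are hypothetical.

```python
import bisect


class TimestampStore:
    """Toy store: per-stream records kept sorted by timestamp."""

    def __init__(self):
        self._timestamps = {}  # stream_id -> sorted list of timestamps
        self._records = {}     # stream_id -> records in the same order

    def put(self, stream_id, timestamp, record):
        ts = self._timestamps.setdefault(stream_id, [])
        recs = self._records.setdefault(stream_id, [])
        idx = bisect.bisect_right(ts, timestamp)
        ts.insert(idx, timestamp)
        recs.insert(idx, record)

    def get_at(self, stream_id, timestamp):
        """Latest record at or before the timestamp, or None."""
        ts = self._timestamps.get(stream_id, [])
        idx = bisect.bisect_right(ts, timestamp)
        return self._records[stream_id][idx - 1] if idx else None

    def get_window(self, stream_id, start, end):
        """All records with start <= timestamp <= end."""
        ts = self._timestamps.get(stream_id, [])
        lo = bisect.bisect_left(ts, start)
        hi = bisect.bisect_right(ts, end)
        return self._records.get(stream_id, [])[lo:hi]


store = TimestampStore()
for t, frame in [(1.0, "v1"), (2.0, "v2"), (3.0, "v3")]:
    store.put("video", t, frame)
for t, chunk in [(1.5, "a1"), (2.4, "a2")]:
    store.put("audio", t, chunk)

# Correlate both streams at the same point in time.
snapshot = {sid: store.get_at(sid, 2.5) for sid in ("video", "audio")}
```

Using the same timestamp for every stream is what makes cross-stream correlation a one-line lookup.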
ManifoldStore is intrinsically schemaless in that it can store anything that consists of a JSON part and a binary data part, as used in rt-ai Edge. A new application called rtaiView is a universal viewer that allows multiple streams of all types to be displayed in a traditional split-screen monitoring format. It uses ManifoldStore for its underlying storage and provides a window into the operation of the SPN.
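A message with a JSON part and a binary part can be carried as a single byte string. The framing below (a 4-byte big-endian length prefix, then the JSON, then the raw payload) is purely an assumption for illustration, not the actual rt-ai Edge or Manifold wire format, and the metadata keys are made up.

```python
import json
import struct


def pack_message(meta, payload=b""):
    """Serialize JSON metadata plus an opaque binary payload."""
    header = json.dumps(meta).encode("utf-8")
    return struct.pack(">I", len(header)) + header + payload


def unpack_message(blob):
    """Inverse of pack_message: returns (meta, payload)."""
    (header_len,) = struct.unpack_from(">I", blob, 0)
    header = blob[4:4 + header_len]
    return json.loads(header.decode("utf-8")), blob[4 + header_len:]


# Stand-in bytes where a JPEG frame would go.
blob = pack_message({"streamId": "cam0", "timestamp": 12.5}, b"\xff\xd8fake-jpeg")
meta, payload = unpack_message(blob)
```

A store only needs to understand the JSON part well enough to find the key fields. The binary part can stay opaque, which is what makes a schemaless store workable.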
Manifold is designed to be very flexible with various features that reduce configuration for ad-hoc uses. This makes it very easy to perform offline processing of stored data as and when required which is ideal for offline machine learning applications.
The main reason for rt-ai Edge’s existence is to reduce large volumes of raw data into much smaller amounts of data with high semantic content. Sometimes this can be acted upon in the local loop (i.e. within the edge space) when that makes sense or low latency is critical. Even when the data is acted upon locally, it may still be useful to store the information extracted from the raw streams for later offline processing such as machine learning. Since Apache NiFi has all the required interfaces, it makes sense that rt-ai Edge can pass data into Apache NiFi, using it as a gateway to big data type applications.
For this simple example, I am storing recovered license plate data in Elasticsearch. The screen capture above shows the rt-ai Edge stream processing network (SPN) with the new PutNiFi stream processing element (SPE). PutNiFi transfers any rt-ai message desired into an Apache NiFi instance using MQTT for transport.
This screen capture shows the very simple Apache NiFi design. The ConsumeMQTT processor is used to collect messages from the PutNiFi SPE and then passes these to Elasticsearch for storage. Obviously a lot more could be going on here if required.
I came across OpenALPR a little while ago when thinking about the general problem of enhancing the value of video feeds. It has an easy to use Python binding so it didn’t take very long to create an rt-ai Edge stream processing element (SPE). Actually, the OpenALPR part of it is one line of code – it takes a jpeg from the video stream and adds any recognized plate info as metadata to the output message. The trivial stream processing network in the screen capture above shows its operation as an inline semantic enhancer of a video stream. The OpenALPR SPE only outputs a video frame if it either already contains metadata or else the OpenALPR SPE has added metadata. In this way, multiple recognizers can be applied to the same frame using a pipeline of SPEs.
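The output rule described above can be sketched as follows. To keep the example self-contained, the recognizer is passed in as a plain function rather than calling the OpenALPR binding, and the message layout (a dict with "jpeg" and "metadata" keys) is an assumption, not the real rt-ai message format.

```python
def alpr_spe(message, recognize_plates):
    """Add plate metadata; forward only frames that carry some metadata.

    recognize_plates is a stand-in for the real recognizer: it takes
    JPEG bytes and returns a list of plate strings.
    """
    metadata = dict(message.get("metadata", {}))
    plates = recognize_plates(message["jpeg"])
    if plates:
        metadata["plates"] = plates
    if not metadata:
        return None  # nothing recognized here or upstream: drop the frame
    return {**message, "metadata": metadata}


def fake_recognizer(jpeg):
    # Pretend any frame containing b"car" shows one plate (test stub).
    return ["ABC123"] if b"car" in jpeg else []


with_plate = alpr_spe({"jpeg": b"car-image"}, fake_recognizer)
empty = alpr_spe({"jpeg": b"ceiling"}, fake_recognizer)
upstream = alpr_spe({"jpeg": b"ceiling", "metadata": {"faces": 1}}, fake_recognizer)
```

Passing through frames that already carry metadata is what lets several recognizers be chained: a later stage never discards what an earlier stage found.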
While I can now see a few private houses starting to sprout specialized license plate reading cameras (which are optimized for this purpose, especially for night operation), I don’t have anything set up as yet so I had to make do with printing car images and waving them in front of a webcam. Seemed to work fine but it would be nice to have a proper setup.
Recognized license plate metadata then becomes another feature that can be used for machine learning and inference within the edge environment – another step on the path to sentient spaces perhaps.
rt-ai Edge is progressing nicely and now supports multi-node operation (i.e. multiple networked servers participating in a processing network) along with real-time monitoring. The screen capture shows a simple processing network where the video feed from a camera is passed through a DeepLab-v3+ stream processing element (SPE) and then on to two separate media viewers. At the top of each SPE block in the Designer window is some text like Cam(Default). Here, Cam is the name given to the SPE while Default is the name of the node (server) on which the SPE is running. In this design there are two nodes, Default and rtai0.
The code underlying the common SPE API communicates with the Designer window and supplies the stats about bytes and messages in and out. Soon, this path will also allow SPE-specific real-time parameter tweaking from the Designer window.
To add a node to the system, it just needs to have all of the prerequisites installed and run a special NodeManager SPE. This also communicates with the Designer and supports SPE deployment and runtime control, activated when the user presses the Deploy design button. Moving an SPE between nodes is just a case of reassigning it, generating the design and then deploying the design again.
The green outlines around each SPE indicate the state of the SPE and the node on which it is running. When it is all green, as in the first screen capture, this indicates that both SPE and node are running. For the second screen capture, I manually terminated the View2 SPE on rtai0. The inner part of the outline has now gone red. This indicates that the node is up but the SPE is down. If the outline is all red, it means that the node is down and not communicating with the Designer.
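The outline rules reduce to a small mapping from node and SPE state to a pair of colors. The function below is just a restatement of the rules above in Python. Its name and return shape are made up for illustration.

```python
def outline_colors(node_up, spe_up):
    """Return (outer, inner) outline colors for an SPE block."""
    if not node_up:
        # Node not communicating: the SPE state is unknown, so all red.
        return ("red", "red")
    return ("green", "green" if spe_up else "red")


states = {
    "running": outline_colors(node_up=True, spe_up=True),
    "spe_down": outline_colors(node_up=True, spe_up=False),
    "node_down": outline_colors(node_up=False, spe_up=False),
}
```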
It’s interesting to note that DeepLab-v3+ is processing around 5 frames per second using a GTX-1080 GPU. The input rate from the camera is 30 frames per second. The processor drops frames while it is still processing an earlier frame, ensuring that queues do not build up and latency is kept to a minimum.
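One common way to get this behavior is a single-slot mailbox between the capture and inference stages. The sketch below is an assumption about how such dropping could be done, not the rt-ai Edge code. This variant keeps the most recent frame and discards older unread ones. Simply ignoring new frames while busy would be an equally valid choice.

```python
import threading


class LatestFrameSlot:
    """Single-slot mailbox: a new frame overwrites an unread one."""

    def __init__(self):
        self._lock = threading.Lock()
        self._frame = None
        self.dropped = 0

    def put(self, frame):
        """Called by the producer (camera) for every incoming frame."""
        with self._lock:
            if self._frame is not None:
                self.dropped += 1  # previous frame was never consumed
            self._frame = frame

    def take(self):
        """Called by the consumer when it is ready for more work."""
        with self._lock:
            frame, self._frame = self._frame, None
            return frame


slot = LatestFrameSlot()
for i in range(6):      # six frames arrive while the consumer is busy
    slot.put(f"frame{i}")
latest = slot.take()    # consumer becomes free and takes the newest frame
nothing = slot.take()   # slot is empty until the next frame arrives
```

With at most one frame ever waiting, latency is bounded by one processing interval regardless of how much faster the camera is than the processor.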