Sentient space sharing avatars with Windows desktop, Windows Mixed Reality and Android apps

One of the goals of the rt-ai Edge system is that users of the system can use whatever device they have available to interact and extract value from it. Unity is a tremendous help given that Unity apps can be run on pretty much everything. The main task was integration with Manifold so that all apps can receive and interact with everything else in the system. Manifold currently supports Windows, UWP, Linux, Android and macOS. iOS is a notable absentee and will hopefully be added at some point in the future. However, I perceive Android support as more significant as it also leads to multiple MR headset support.

The screen shot above and video below show three instances of the rt-ai viewer apps running on Windows desktop, Windows Mixed Reality and Android interacting in a shared sentient space. Ok, so the avatars are rubbish (I call them Sad Robots) but that’s just a detail and can be improved later. The wall panels are receiving sensor and video data from ZeroSensors via an rt-ai Edge stream processing network while the light switch is operated via a home automation server and Insteon.

Sharing is mediated by a SharingServer that is part of Manifold. The SharingServer uses Manifold multicast and end to end services to implement scalable sharing while minimizing the load on each individual device. Ultimately, the SharingServer will also download the space definition file when the user enters a sentient space and also provide details of virtual objects that may have been placed in the space by other users. This allows a new user with a standard app to enter a space and quickly create a view of the sentient space consistent with existing users.

While this is all kind of fun, the more interesting thing is when this is combined with a HoloLens or similar MR headset. The MR headset user in a space would see any VR users in the space represented by their avatars. Likewise, VR users in a space would see avatars representing MR users in the space. The idea is to get as close to a telepresent experience for VR users as possible without very complex setups. It would be much nicer to use Holoportation but that would require every room in the space has a very complex and expensive setup which really isn’t the point. The idea is to make it very easy and low cost to implement an rt-ai Edge based sentient space.

Still lots to do of course. One big thing is audio. Another is representing interaction devices (pointers, motion controllers etc) to all users. Right now, each app just sends out the camera transform to the SharingServer which then distributes this to all other users. This will be extended to include PCM audio chunks and transforms for interaction devices so that everyone will be able to create a meaningful scene. Each user will receive the audio stream from every other user. The reason for this is that then each individual audio stream can be attached to the avatar for each user giving a spatialized sound effect using Unity capabilities (that’s the hope anyway). Another very important thing is that the apps work differently if they are running on VR type devices or AR/MR type devices. In the latter case, the walls and related objects are not drawn and just the colliders instantiated although virtual objects and avatars will be visible. Obviously AR/MR users want to see the real walls, light switches etc, not the virtual representations. However, they will still be able to interact in exactly the same way as a VR user.

Using blockchain technology to create verifiable sensor records and detect fakes

These days, machine learning techniques have led to the ability to create very realistic but fake video and audio that can be tough to distinguish from the real thing. The video above shows a very interesting example of this capability. The problem with this technology is that it will become impossible to determine if anything is genuine at all. What’s needed is some verification that a video of someone (for example) really is that person. Blockchain technology would seem to provide a solution for this.

Many years ago I was working on a digital watermarking-based system for detecting tampering in video records. Essentially, this embedded error-correcting codes in each frame that could be used to determine if any region of a frame had been modified after the digital watermark had been added. Cameras would add the digital watermark at source, limiting the opportunity for modification prior to watermarking.

One problem with this is that it worked on a frame by frame basis but didn’t ensure the integrity of an entire sequence. In theory this could be done with temporally distributed watermarks but blockchain technology provides a very nice alternative.

A simple strategy would be to have the sensor (camera, microphone, motion detector, whatever) create a hash for each unit of data (video frame, chunk of audio etc) and add this to a blockchain. Then a review app could create new hashes from the sensor data itself (stored elsewhere) and compare them to those in the blockchain. It could also determine that the account owner or device is who or what it is supposed to be in order to avoid spoofing. It’s easy to envisage an Etherium smart contract being the basis of such a system.

One issue with this is the potential rate at which hashes need to be added to the blockchain. This rate could be reduce by collecting more data (e.g. accumulating one second’s worth of data to generate one hash) or creating a hash of hashes at an appropriate rate. The only downside to this is losing temporal resolution of where changes have been made.

It’s worth considering the effects of lossy compression. Obviously if a stream is uncompressed or only uses lossless compression, watermarking and hash generation can be done at a very early stage. Watermarking of video is designed to withstand compression so that can still be done at a very early stage, even with lossy compression. The hash has to be be bit-accurate with the stream as stored on the video storage medium though so the hash must be computed after lossy compression.

It seems as though this blockchain concept could definitely be made to work and possibly combined with the digital watermarking technique in the case of video to provide temporal and spatial resolution of tampering. I am sure that variations of this concept are out there already or being developed and maybe, one day, it will be possible for anybody to check if a video of a well-known person is real or fake.

Using Windows Mixed Reality to visualize sentient spaces with rtXRView

The Windows Mixed Reality version of 3DView is now working nicely. Had a few problems with my Windows development PC which is a few years old and didn’t have adequate USB ports. In the end this PCI-e USB 3.1 card solved that problem otherwise a complete upgrade might have been required. A different USB 3.0 card did not work however.

Hopefully this is the last time that I see the displays all lined up like that. The space modeling software is coming along and soon it will be possible to model a space with a (relatively) simple procedural definition file. Potentially this could be texture mapped from a 3D scan of rooms but the simplified models generated procedurally with simple textures might well be good enough. Then it will be possible to position versions of these displays (and lots of other things) in the correct rooms.

XRView is intended to be runnable both on Windows MR headsets (I am using the Samsung Odyssey as it has a good display and built-in audio) and HoloLens. Now clearly VR modes and AR modes have to be completely different. In VR, you navigate and interact with the motion controllers and see the modeled space whereas in AR you navigate by walking around, interact using the clicker and don’t see the modeled space directly. However, the modeled space will still be there and will be used instead of the spatially mapped surfaces that the HoloLens might normally use. This means that objects placed in the model by a VR user will appear to AR users correctly positioned and vice versa. One key advantage of using the modeled space rather than the dynamically mapped space generated by the HoloLens itself is that it is easy to add context to the surfaces using the procedural model language. Another is the ability to interwork with non-HoloLens AR headsets that can share the HoloLens spatial map data. The procedural model becomes a platform-independent spatial mapping that “just” leaves the problem of spatial synchronization to the individual headsets.

I am sure that there will be some fun challenges in getting spatial synchronization but that’s something for later.

The ZeroSensor – a sentient space point of presence

One application for rt-ai Edge is ubiquitous sensing leading to sentient spaces – spaces that can interact with people moving through and provide useful functionality, whether learned or programmed. A step on the road to that is the ZeroSensor, four prototypes of which are shown in the photo. Each ZeroSensor consists of a Raspberry Pi Zero W, a Pi camera module v2, an Adafruit BME 680 breakout and an Adafruit TSL2561 breakout. The combination gives a video stream and a sensor stream with light, temperature, pressure, humidity and air quality values. The video stream can be used to derive motion sensing and identification while the other sensors provide a general idea of conditions in the space. Notably missing is audio. Microphone support would be useful for general sensing and I might add that in real devices. A 3D printable case design is underway in order to allow wide-scale deployment.

Voice-based interaction is a powerful way for users to interact with sentient spaces. However, it is assumed that people who want to interact are using an AR headset of some sort which itself provides the audio I/O capabilities. Gesture input would be possible via the ZeroSensor’s camera. For privacy reasons video would not be viewed directly or stored but just used as a source of activity data and interaction.

This is the simple rt-ai design used to test the ZeroSensors. The ZeroSynth modules are rt-ai Edge synth modules that contain SPEs that interface with the ZeroSensor’s hardware and generate a video stream and a sensor data stream. An instance of a video viewer and sensor viewer are connected to each ZeroSynth module.

This is the result of running the ZeroSensor test design, showing a video and sensor window for each ZeroSensor. The cameras are staring at the ceiling because the four sensors were on a table. When the correct case is available, they will be deployed in the corners of rooms in the space.

How rt-ai Edge will enable Sentient Spaces

The idea of creating spaces that understand the needs of the people moving within them – Sentient Spaces – has been a long term personal goal. Our ability today to create sensor data (video, audio, environmental etc) is incredible. Our ability to make practical use of this enormous body of data is minimal. The question is: how can ubiquitous sensing in a space be harnessed to make the space more functional for people within it?

rt-ai Edge could be the basis of an answer to this question. It is designed to receive large volumes of multi-sensor data, extract meaningful information and then take control actions as necessary. This closes the local loop without requiring external cloud server interaction. This is important because creating a space with ubiquitous sensing raises all kinds of privacy issues. Because rt-ai Edge keeps all raw data (such as video and audio) within the space, privacy is much less of a concern.

I believe that a key to making a space sentient is to harness artificial intelligence concepts such as online learning of event sequences and anomaly detection. It is not practical for anyone to sit down and program a system to correctly recognize normal behavior in a space and what actions might be helpful as a result. Instead, the system needs to learn what is normal and develop strategies that might be helpful. Reinforcement via user feedback can be used to refine responses.

A trivial example would be someone moving through a dark space at night. It might be helpful to provide light at a suitable intensity to safely help a person navigate the space. The system could deduce this by having experienced other people moving though the space, turning on and off lights as they go. Meanwhile, face recognition could be employed to see if the person is known to the space and, if not, an assessment could be made if an alert needs to be generated. Finally, a video record could be put together of the person moving through the space, using assembled clips from all relevant cameras, and stored (on-site) for a time in case it is useful.

Well that’s a trivial example to describe but not at all trivial to implement. However, my goal is to see if AI techniques can be used to approach this level of functionality. In practical terms, this means developing a series of rt-ai modules using TensorFlow to perform feature extraction, anomaly detection and sequence prediction that are then glued together with sensor and control modules to perform a complete system requiring minimal supervised training to perform useful functions.

Raspberry Pi Sense HAT and other sensors added to rtndf so that it’s a bit more IoT-like

sensorviewrtndf now has Python PPEs that support streaming data from a variety of environmental sensors. The sensehat PPE streams data from all of the sensors on the Raspberry Pi Sense HAT. The sensors PPE streams data from a variety of common environmental sensors:

  • ADX345 accelerometer
  • BMP180 pressure/temperature sensor
  • HTU21D humidity sensor
  • MCP9808 temperature sensor
  • TMP102 temperature sensor
  • TSL2561 light sensor

The specific sensors in use can be enabled by selectively commenting out lines in the sensors Python script.

sensorview is another new PPE that can display the sensor streams generated by sensehat and sensors. The screenshot shows the data from a sensehat for example.

imu and imuview – adding IMU sensing to rtndf data flow pipelines

imuviewUp to now the only data sources in rtndf were video and audio. imu is a new Python PPE that can be used to stream IMU data (fused pose, sensor readings etc) into an rtndf data flow pipeline. Another new PPE is imuview, this time a C++ PPE, that can display the resulting stream. The screen capture above shows the data being streamed from a Raspberry Pi SenseHat which is a full 11-dof sensor.

One of the nice things about using a pub/sub system like MQTT is that it is possible to hook into any of the pipeline links to see what data is flowing. To this end, a future PPE will be a generic viewer. The user just gives it the topic and it determines the type of data and displays it appropriately. A very handy debugging tool!