Scaling dynamic sentient spaces to multiple locations

One of the fundamental concepts of the rt-xr and rt-ai Edge projects is that it should be possible to experience a remote sentient space in a telepresent way. The diagram above shows the idea. The main sentient space houses a ManifoldNexus instance that supplies service discovery, subscription and message passing functions to all of the other components. Not shown is the rt-ai Edge component that deals with real-time intelligent processing, both reactive and proactive, of real-world sensor data and controls. However, rt-ai Edge interconnects with ManifoldNexus, making data and control flows available in the Manifold world.

Co-located with ManifoldNexus are the various servers that implement the visualization part of the sentient space. The SpaceServer allows occupants of the space to download a space definition file that is used to construct a model of the space. For VR users, this is a virtual model of the space that can be used remotely. For AR and MR users, only augmentations and interaction elements are instantiated so that the real space can be seen normally. The SpaceServer also houses downloadable asset bundles that contain augmentations that occupants have placed around the space. This is why it is referred to as a dynamic sentient space – as an occupant either physically or virtually enters the space, the relevant space model and augmentations are downloaded. Any changes that occupants make get merged back to the space definition and model repository to ensure that all occupants are synced with the space correctly. The SharingServer provides real-time transfer of pose and audio data. The Home Automation server provides a way for the space model to be linked with networked controls that physically exist in the space.
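
To make the download/merge cycle concrete, here is a minimal sketch of the viewer-side flow, written in Python for brevity. The endpoint paths, field names and message shapes are assumptions for illustration only, not the actual SpaceServer API.

```python
# Hypothetical sketch of the viewer-side flow against a SpaceServer.
# Endpoint paths, field names and the merge format are assumptions,
# not the actual SpaceServer protocol.
import json
import urllib.request

SPACE_SERVER = "http://spaceserver.local:8080"   # assumed address

def fetch_space_definition(space_name):
    """Download the space definition used to build the local model."""
    with urllib.request.urlopen(f"{SPACE_SERVER}/spaces/{space_name}") as resp:
        definition = json.load(resp)
    # VR users instantiate the full virtual model; AR/MR users only the
    # augmentations and interaction elements listed in the definition.
    return definition

def merge_augmentation(space_name, augmentation):
    """Push a newly placed augmentation back so other occupants stay in sync."""
    body = json.dumps(augmentation).encode()
    req = urllib.request.Request(
        f"{SPACE_SERVER}/spaces/{space_name}/augmentations",
        data=body, headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status == 200

if __name__ == "__main__":
    space = fetch_space_definition("home")
    merge_augmentation("home", {
        "asset_bundle": "sticky_note",     # downloadable asset to instantiate
        "pose": {"position": [1.0, 1.5, 0.2], "rotation": [0, 0, 0, 1]},
    })
```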

When everything is on a single LAN, things just work. New occupants of a space auto-discover the sentient spaces available on that LAN and, via a GUI in the generic viewer app, select the appropriate one. Normally there would be just one space, but the system allows for multiple spaces on a single LAN if required. The issue then is how to connect VR users at remote locations. As shown in the diagram, ManifoldNexus has the ability to use secure tunnels between regions. This does require a port forwarding entry on one of the gateway routers but otherwise needs no configuration beyond the security settings. There can be several remote spaces if necessary, and a single tunnel can support more than one sentient space. Once the Manifold infrastructure is established, integration is complete: auto-discovery and message switching behave for remote occupants in exactly the same way as they do for local occupants. What is also nice is that multicast services can be replicated for remote users on the remote LAN, so data never has to be sent more than once over the tunnel itself. This optimization is implemented automatically within ManifoldNexus.
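
The multicast replication idea is easy to see in outline: the ManifoldNexus at the remote end receives one copy of each message over the tunnel and fans it out to however many local subscribers want it. The sketch below shows just that principle in Python; it is not the ManifoldNexus implementation, and all the names are illustrative.

```python
# Illustrative sketch of multicast replication at the remote end of a tunnel:
# a message arriving over the tunnel is fanned out to all local subscribers,
# so each message only crosses the tunnel once. Not the ManifoldNexus code,
# just the principle.
class RemoteNexusEndpoint:
    def __init__(self):
        self.local_subscribers = {}   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.local_subscribers.setdefault(topic, []).append(callback)

    def on_tunnel_message(self, topic, payload):
        # One copy arrives over the tunnel; replicate locally as needed.
        for deliver in self.local_subscribers.get(topic, []):
            deliver(payload)

# Example: three remote occupants subscribed to the same video stream still
# cost only one copy of each frame on the tunnel itself.
nexus = RemoteNexusEndpoint()
for name in ("viewer1", "viewer2", "viewer3"):
    nexus.subscribe("zerosensor/video", lambda data, n=name: print(n, len(data)))
nexus.on_tunnel_message("zerosensor/video", b"\xff\xd8...jpeg bytes...")
```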

The dynamic sentient space concept (where a standard viewer is customized for each space by the servers) is now basically working on the five platforms (Windows desktop, macOS, Windows Mixed Reality, Android and iOS). Persistent ad-hoc augmentations using downloadable assets are the next step in this process. I will probably start with the virtual sticky note: an occupant can leave a persistent message for other occupants. This requires a lot of the general functionality of persistent dynamic augmentations and is actually kind of useful for a change!

rt-xr visualization with spatialized sound

An important goal of the rt-xr project is to allow MR and AR headset wearing physical occupants of a sentient space to interact as naturally as possible with virtual users in the same space. A component of this is spatialized sound, where a sound or someone’s voice appears to originate from where it should in the scene. Unity has a variety of tools for achieving this, depending on the platform.

I have standardized on 16-bit, single-channel PCM at 16000 samples per second for audio within rt-xr in order to keep the implementation simple (no need for codecs) while still keeping the required bit rate down. The problem is that the SharingServer has to send all audio feeds to all users – each user needs all the other users' feeds in order to spatialize them correctly. If spatialized sound weren't required, the SharingServer could just mix them all together on some basis. Another option is for the SharingServer to forward only the dominant speaker, but this assumes that only intermittent speech needs to be supported, and it leads to the "half-duplex" effect where the loudest speaker blocks everyone else. Mixing them all is a lot more democratic.
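
For reference, one feed at 16-bit mono and 16000 samples per second is 16000 samples/s × 16 bits = 256 kbps uncompressed, so the all-feeds-to-all-users approach grows linearly with the number of occupants. The sketch below shows the simple mix-down alternative mentioned above, assuming feeds arrive as aligned, equal-length int16 buffers (an idealization of what the SharingServer actually has to deal with).

```python
# Sketch of the non-spatialized alternative: mix all incoming feeds into one.
# Assumes 16-bit mono PCM at 16000 samples/s with aligned, equal-length buffers.
import numpy as np

SAMPLE_RATE = 16000
BITS = 16
print("per-feed bit rate:", SAMPLE_RATE * BITS, "bits/s")   # 256000

def mix_feeds(feeds):
    """Sum a list of np.int16 buffers into one, clipping to the int16 range."""
    acc = np.zeros(len(feeds[0]), dtype=np.int32)
    for pcm in feeds:
        acc += pcm.astype(np.int32)
    return np.clip(acc, -32768, 32767).astype(np.int16)
```

With spatialization in play, though, the mix-down isn't an option and every feed has to go out to every user.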

Another question is how to deal with occupants in different rooms within the same sentient space. Some things (such as video) are turned off to reduce bit rate if the user isn't in the same room as the video panel. However, it makes sense that you can hear users in other rooms at an appropriate level. Unity's AudioSource has rolloff settings for ensuring that sound levels drop off appropriately with distance.
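
Roughly speaking, a logarithmic rolloff behaves like an inverse-distance curve held flat inside a minimum distance and floored beyond a maximum distance, which is the kind of thing the AudioSource settings control (the exact curve Unity applies is configurable per AudioSource). A toy version of that curve, just to show the shape:

```python
# Toy inverse-distance attenuation curve, roughly the shape of a logarithmic
# rolloff: full volume inside min_distance, 1/distance falloff after that,
# and no further attenuation beyond max_distance.
def rolloff_gain(distance, min_distance=1.0, max_distance=25.0):
    if distance <= min_distance:
        return 1.0
    if distance >= max_distance:
        return min_distance / max_distance   # stop attenuating, don't cut off
    return min_distance / distance

for d in (0.5, 2.0, 5.0, 10.0):
    print(d, round(rolloff_gain(d), 3))
```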

Spatialized sound currently works on Windows desktop and Windows MR. The desktop version uses the Oculus spatializer, as this can support 16000 samples per second. The Windows MR version uses the Microsoft HRTF spatializer, which unfortunately requires 48000 samples per second, so I have to upsample to use it. This does degrade the quality a bit – better upsampling is a todo.
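
The conversion itself is a 1:3 rate change from 16000 to 48000 samples per second. A minimal linear-interpolation upsampler looks something like the sketch below (the real code path may differ); a proper polyphase/FIR resampler would be the obvious way to improve on it.

```python
# Minimal 16 kHz -> 48 kHz upsampler using linear interpolation.
# A proper polyphase/FIR resampler would sound better; this is the
# quick-and-dirty flavor of conversion, not necessarily the code used
# in the Windows MR path.
import numpy as np

def upsample_16k_to_48k(pcm16):
    x = pcm16.astype(np.float32)
    n = len(x)
    # Positions of the 3*n output samples on the input timeline.
    src_pos = np.arange(3 * n) / 3.0
    out = np.interp(src_pos, np.arange(n), x)
    return np.clip(out, -32768, 32767).astype(np.int16)
```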

Right now, the SharingServer just broadcasts a standard feed containing all audio sources, and individual users filter it in two ways. First, they discard their own audio feed. Second, if the user is a physical occupant of the space, feeds from other physical occupants are omitted so that just the VR user feeds remain. Whether it would be better to send customized feeds to each user is an interesting question – this could certainly be done if necessary. For example, a simple optimization would be to have two feeds: one for AR and MR users that contains only VR user audio, and the current complete feed for VR users. This would cut down the bit rate to AR and MR users, whose headsets would no longer have to deal with unnecessary data. In fact, this idea sounds so good that I think I am going to implement it!
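
The filtering rule itself fits in a few lines. In the sketch below the message fields (sender_id, sender_is_physical) are placeholders for illustration rather than the actual SharingServer message format.

```python
# Sketch of the per-user filtering applied to the broadcast audio feed.
# Field names ("sender_id", "sender_is_physical") are assumptions about
# the message format, not the actual SharingServer schema.
def should_play(msg, my_id, i_am_physical):
    if msg["sender_id"] == my_id:
        return False          # never play back your own audio
    if i_am_physical and msg["sender_is_physical"]:
        return False          # physical occupants hear each other directly
    return True               # everything else gets spatialized and played
```

The two-feed optimization described above would essentially move the second check from the client to the SharingServer.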

Next up is getting something to work on Android. I am using native audio capture code on the two Windows platforms, and something equivalent is needed for Android. There is a Unity technique using the Microphone class that, coupled with a custom audio filter, might work. If not, I might have to brush up on JNI. Spatialized sound is probably going to be difficult in terms of panning; volume rolloff with distance should work, however.

3DView: visualizing environmental data for sentient spaces

The 3DView app I mentioned in a previous post is moving forward nicely. The screen capture shows the app capturing real-time data from four ZeroSensors, with the data coming from an rt-ai Edge stream processing network via Manifold. The app creates a video window and sensor display panel for each physical device and then updates the data whenever new messages are received from the ZeroSensor.

This is the rt-ai Edge part of the design. All the blocks are synth modules, which speeds up design replication. The four ZeroManifoldSynth modules each contain two PutManifold stream processing elements (SPEs) that inject the video and sensor streams into the Manifold. The ZeroSynth modules contain the video and sensor capture SPEs. The ZeroManifoldSynth modules all run on the default node, while the ZeroSynth modules run directly on the ZeroSensors themselves. As always with rt-ai Edge, deployment of new designs or design changes is a one-click action, making this kind of distributed system development much more pleasant.
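
In outline, each PutManifold SPE just takes whatever arrives on its rt-ai Edge input and republishes it as a Manifold message on a topic. The sketch below shows the shape of that bridge; the class and method names are invented for illustration and are not the real rt-ai Edge or Manifold APIs.

```python
# Illustrative shape of a PutManifold-style bridge SPE: whatever arrives on
# the rt-ai Edge input gets republished on a Manifold topic. All names here
# (edge_input, manifold_publisher, topic strings) are invented for
# illustration only.
class PutManifoldBridge:
    def __init__(self, edge_input, manifold_publisher, topic):
        self.edge_input = edge_input        # e.g. video or sensor stream from a ZeroSynth
        self.publisher = manifold_publisher # connection to ManifoldNexus
        self.topic = topic                  # e.g. "zerosensor1/video"

    def run(self):
        for message in self.edge_input:     # iterate over messages from the SPE input
            self.publisher.publish(self.topic, message)
```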

The Unity graphics elements are basic, as I take the standard programmer's view: they can always be upgraded later by somebody with artistic talent, but the key is the underlying functionality. The next step is to hang these displays (and other much more interesting elements) on the walls of a 3D model of the sentient space. Ultimately the idea is that people can walk through the sentient space using AR headsets and see the elements persistently positioned in it. In addition, users of the sentient space will be able to instantiate and position elements themselves and also interact with them.

Even more interesting than that is the ability for the sentient space to autonomously instantiate elements in the space based on perceived user actions. This is really the goal of the sentient space concept – to have the sentient space work with the occupants in a natural way (apart from needing an AR headset of course!).

For the moment, I am going to develop this in VR rather than AR. The HoloLens is the only available AR device that can support the level of persistence required but I’d rather wait for the rumored HoloLens 2 or the Magic Leap One (assuming it has the required multi-room persistence capability).

On the road to sentient spaces: using Unity to visualize rt-ai Edge streams via Manifold

It’s only a step on the road to the ultimate goal of AR headset support for sentient spaces but it is a start at least. As mentioned in an earlier post, passing data from rt-ai Edge into Manifold allows any number of ad-hoc uses of real time and historic data. One of the intended uses is to support a number of AR headset-wearing occupants in a sentient space – the rt-ai Edge to Manifold connection makes this relatively straightforward. Almost every AR headset supports Unity so it seemed like a natural step to develop a Manifold connection for Unity apps. The result, an app called 3DView, is shown in the screen capture above. The simple scene consists of a couple of video walls displaying MJPEG video feeds captured from the rt-ai Edge network.
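
The per-frame work on the video walls is straightforward: each video message is assumed here to carry one complete JPEG frame, which gets decoded and pushed to a texture (in Unity that last step is Texture2D.LoadImage). The equivalent decode step sketched in Python, with the message layout being an assumption:

```python
# Per-frame step illustrated in Python: take a JPEG frame from a Manifold
# video message and turn it into pixel data for display. In 3DView the same
# job is done in C# with Texture2D.LoadImage; the message layout here
# (a raw JPEG payload) is an assumption.
import io
from PIL import Image   # pip install pillow

def decode_video_frame(jpeg_bytes):
    frame = Image.open(io.BytesIO(jpeg_bytes)).convert("RGB")
    return frame.size, frame.tobytes()   # (width, height), raw RGB pixels
```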

The test design used to generate the data (shown above) is trivial but demonstrates that any rt-ai Edge stream can be piped out into the Manifold, allowing access by appropriate Manifold apps. Although not yet fully implemented, Manifold apps will be able to feed data back into the rt-ai Edge design via a new SPE to be called GetManifold.

Next step for the 3DView Unity app is to provide visualization for all ZeroSensor streams, located within a 3D model of the sentient space at their correct physical positions. Right now I am using a SpaceMouse to navigate within the space, but ultimately this should work with any VR headset with an appropriate controller. AR headsets will use their spatial mapping capability to overlay visualizations on the real space, so they won't need a separate controller for navigation.

Google’s WorldSense

Lenovo just announced the Mirage Solo VR headset with Google's WorldSense inside-out tracking capability. The result is an untethered VR headset which presumably has spatial mapping capabilities, allowing spatial maps to be saved and shared. If so, this would be a massive advance over ARKit- and ARCore-based AR, which makes persistence and collaboration all but impossible (the post here goes into a lot of detail about the various issues related to persistence and collaboration with current technology). The lack of a tether also gives it an edge over Microsoft's (so-called) Mixed Reality headsets.

Google's previous Tango system (that's a Lenovo Phab 2 Pro running it above) had much more interesting capabilities than ARCore but has fallen by the wayside. In particular, Tango had an area learning capability that is missing from ARCore. I am very much hoping that something like this will exist in WorldSense, so that virtual objects can be placed persistently in spaces and spatial maps can be shared, letting multiple headsets see exactly the same virtual objects in exactly the same place in the real space. Of course this isn't all that helpful with a VR headset – but maybe someone will manage a pass-through or see-through mixed reality headset using WorldSense that enables persistent spatial augmentation at a reasonable cost for ubiquitous use. If it were also able to perform real-time occlusion (where virtual objects can be occluded by real objects), that would be even better!

An interesting complement to this is the Lenovo Mirage stereo camera. This is capable of taking 180 degree videos and stills suitable for use with stereoscopic 3D displays, such as the Mirage headset. It suddenly occurred to me that this might be a way of hacking a pass-through AR capability for the Mirage before someone does it for real :-). This is kind of what Stereolabs are doing for existing VR headsets with their ZED mini, except that the ZED mini is a tethered solution. The nice thing would be to do this in an untethered way.

The disaggregated smartphone and the road to ubiquitous AR

Nearly five years ago I posted this entry on a blog I was running at the time:

Breaking Apart The Smartphone

…it’s not difficult to imagine a time when the smartphone is split into three pieces – the processor and cellular interface, the display device and the input device. One might imagine that the current smartphone suppliers will end up producing the heart of the system – the display-less main CPU, the cellular interface, Bluetooth, WiFi, NFC etc. However, it will open up new opportunities for suppliers of display and input devices. It’s pretty safe to assume that Google Glass won’t be the only show in town and users will be able to pick and choose between different display devices – and possibly different display devices for different applications. Likewise input devices will vary depending on the environment and style of use.

Maybe we’ll look back on the current generation of smartphones as being inhibited by their touchscreens rather than enabled by them…

I was only thinking vaguely of AR at the time but now it seems even more relevant. A key enabling technology is a low power wireless connection between the processing core and the display. With this implemented in the smartphone core, things change tremendously.

For example, I have a smartphone that is pocketable in size, an iPad for things where a bigger screen is required, a smartwatch for when I am too lazy to get the phone out of my pocket etc. I only have a SIM for the smartphone because even having one cellular contract is annoying, let alone one per device. How would having a wireless display capability change this?

For a start, I would have only one smartphone core for everything, and it would hold the one and only SIM card. When I want a tablet-type presentation, I could use a suitably sized wireless display. This would be light, cheap and somewhat expendable, unlike the smartphone itself. In this concept the smartphone core can always be kept somewhere safe – expensive screen replacements would be a thing of the past, especially if the core doesn't even have a screen. I like to ride a bike around, and it would be nice to have easy access to the smartphone while doing so, in all weathers. You can get bike bags to put a smartphone in, but they are pretty lame and actually quite annoying in general. Instead, I could have a cheap waterproof display mounted on the bike without any need for waterproof bags.

Since the display is remote, why not have a TV sized screen that connects in the same way? Everything streamable could be accessed by the smartphone and displayed on the TV without a need for any other random boxes.

Finally, AR. Right now AR headsets kind of suck in one way or another. I am intrigued by the idea that, one day, people will wear AR-type devices most of the time, and by what that means for, well, everything. The only way this is going to happen in the near future is if the headset itself is kept as small and light as possible and just acts as a display and a set of sensors (inside-out tracking, IMU, depth etc). Just like the other displays, it connects to a smartphone core via a wireless link (I believe that any sort of tethered AR headset is unacceptable in general). The smartphone core does all of the clever stuff, including rendering, and then the output of the GPU is sent up to the headset for display. An AR headset like this could be relatively cheap, waterproof, dustproof and potentially worn all day.

What does a world with ubiquitous AR actually look like? Who knows? But if people start to assume that everyone has AR headsets then “real world” augmentation (decoration, signage etc) will give way to much more flexible and powerful virtual augmentations – anyone not using an AR headset might see a very bland and uninformative world indeed. On the other hand, people using AR headsets might well see some sort of utopian version of reality that has been finely tuned to their tastes. It’s definitely Black Mirror-ish but not all technology has to have horrendous outcomes.