rt-xr visualization with spatialized sound

An important goal of the rt-xr project is to allow physical occupants of a sentient space who are wearing MR and AR headsets to interact as naturally as possible with virtual users in the same space. A component of this is spatialized sound, where a sound or someone’s voice appears to originate from the right place in the scene. Unity has a variety of tools for achieving this, depending on the platform.

I have standardized on 16-bit, single channel PCM at 16000 samples per second for audio within rt-xr in order to keep implementation simple (no need for codecs) while still keeping the required bit rate down. The problem is that the SharingServer has to send all audio feeds to all users – each user needs all the other users’ feeds so that they can spatialize them correctly. If spatialized sound weren’t required, the SharingServer could just mix them all together on some basis. Another solution is for the SharingServer to just forward the dominant speaker, but this assumes that only intermittent speakers are supported. Plus it leads to the “half-duplex” effect where the loudest speaker blocks everyone else. Mixing them all is a lot more democratic.
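To put rough numbers on the trade-off, here is a small calculation for the 16-bit, mono, 16000 samples per second format. It is a sketch that counts payload only (no packet or protocol overhead) and assumes every feed goes to every recipient by unicast; the function names are illustrative, and multicast on the SharingServer side would reduce the server egress figure.

```python
# Uncompressed PCM bit-rate arithmetic for the rt-xr audio format:
# 16-bit samples, a single channel, 16000 samples per second.
SAMPLE_RATE_HZ = 16_000
BITS_PER_SAMPLE = 16
CHANNELS = 1


def feed_bitrate_bps(rate=SAMPLE_RATE_HZ, bits=BITS_PER_SAMPLE, channels=CHANNELS):
    """Payload bit rate of one PCM feed (no packet or protocol overhead)."""
    return rate * bits * channels


def per_user_downstream_bps(num_users):
    """What each user receives when it gets every other user's feed unmixed."""
    return max(num_users - 1, 0) * feed_bitrate_bps()


def server_egress_bps(num_users):
    """Total sent by the server if each user's feeds go out by unicast.

    Grows as N * (N - 1) feeds, versus N feeds for a single server-side mix.
    """
    return num_users * per_user_downstream_bps(num_users)


if __name__ == "__main__":
    for n in (2, 5, 10):
        print(n, per_user_downstream_bps(n), server_egress_bps(n))
```

One feed is 256 kbit/s, so with five users each one receives about 1 Mbit/s of audio. That is manageable, but the quadratic growth at the server is the cost of leaving spatialization to the clients.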

Another question is how to deal with occupants in different rooms within the same sentient space. Some things (such as video) are turned off to reduce bit rate if the user isn’t in the same room as the video panel. However, it makes sense to be able to hear users in other rooms at an appropriate level. Unity’s AudioSource has rolloff settings that make sound levels drop off appropriately with distance.
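As an illustration of distance rolloff, the sketch below uses an inverse-distance clamped model of the kind OpenAL offers. It is not a reproduction of Unity’s exact AudioSource curves, and the default reference and maximum distances are assumptions chosen for a house-sized space.

```python
def inverse_distance_gain(distance, ref_distance=1.0, max_distance=50.0, rolloff=1.0):
    """Gain in [0, 1] for a source `distance` metres from the listener.

    Inverse-distance clamped model: full volume inside ref_distance,
    falling off as ref / (ref + rolloff * (d - ref)) beyond it, and held
    constant past max_distance so distant rooms stay faintly audible.
    """
    if ref_distance <= 0:
        raise ValueError("ref_distance must be positive")
    # Clamp the distance into [ref_distance, max_distance] before applying the curve.
    d = min(max(distance, ref_distance), max_distance)
    return ref_distance / (ref_distance + rolloff * (d - ref_distance))


if __name__ == "__main__":
    for d in (0.5, 1, 2, 4, 8, 100):
        print(d, round(inverse_distance_gain(d), 3))
```

Raising `rolloff` makes voices from other rooms fade faster; the clamp at `max_distance` is what keeps them audible at a low level rather than silent.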

Spatialized sound currently works on Windows desktop and Windows MR. The desktop version uses the Oculus spatializer, as this can support 16000 samples per second. The Windows MR version uses the Microsoft HRTF spatializer, which unfortunately requires 48000 samples per second, so I have to upsample to use it. This does mess up the quality a bit – better upsampling is a todo.
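Going from 16000 to 48000 samples per second is an integer factor of 3. A minimal sketch of chunk-by-chunk linear interpolation follows; it assumes int16 PCM in the host’s native byte order and carries the last sample of each chunk forward so that chunk boundaries don’t produce steps. The function names are hypothetical, not the rt-xr code.

```python
import array


def upsample_linear(samples, factor=3, prev=None):
    """Upsample PCM samples by an integer factor using linear interpolation.

    `prev` is the last sample of the previous chunk (None for the first
    chunk). Each input sample is reached by interpolating from `prev`, so
    output lags input by one sample period but is continuous across chunks.
    Returns (output_samples, new_prev).
    """
    if not samples:
        return [], prev
    if prev is None:
        prev = samples[0]
    out = []
    for s in samples:
        for k in range(factor):
            # Interpolated values lie between two int16 samples, so they stay in range.
            out.append(round(prev + (s - prev) * k / factor))
        prev = s
    return out, prev


def upsample_pcm_bytes(data, factor=3, prev=None):
    """Same as upsample_linear, but on raw int16 PCM bytes (native byte order)."""
    samples = array.array("h", data).tolist()
    out, prev = upsample_linear(samples, factor, prev)
    return array.array("h", out).tobytes(), prev
```

Linear interpolation leaves spectral images above 8 kHz unfiltered, which is probably part of the quality loss; a polyphase low-pass resampler such as SciPy’s `resample_poly(x, 3, 1)` is the usual next step.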

Right now, the SharingServer just broadcasts a standard feed with all audio sources. Individual users filter these in two ways. First, they discard their own audio feed. Second, if the user is a physical occupant of the space, feeds from other physical occupants are omitted, leaving just the VR user feeds. Whether it would be better to send customized feeds to each user is an interesting question – this could certainly be done if necessary. For example, a simple optimization would be to have two feeds: one for AR and MR users that only contains VR user audio, and the current complete feed for VR users. This has the great benefit of cutting down the bit rate to AR and MR users, whose headsets may benefit from not having to deal with unnecessary data. In fact, this idea sounds so good that I think I am going to implement it!
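The two client-side filtering rules, and the proposed two-feed split on the server, can be sketched as below. The `AudioFeed` type and its fields are assumptions made for illustration; the real feed format isn’t described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioFeed:
    user_id: str
    is_physical: bool  # True for AR/MR users physically present in the space
    pcm: bytes = b""


def feeds_to_play(feeds, my_id, i_am_physical):
    """Client-side filter applied to the broadcast feed."""
    return [
        f for f in feeds
        if f.user_id != my_id                        # rule 1: never play our own voice
        and not (i_am_physical and f.is_physical)    # rule 2: occupants hear each other directly
    ]


def build_server_feeds(feeds):
    """Proposed optimization: a VR-only feed for AR/MR users, everything for VR users."""
    return {
        "ar_mr": [f for f in feeds if not f.is_physical],
        "vr": list(feeds),
    }
```

With the split in place the client filter still works unchanged: an AR/MR user simply finds nothing left to drop in the VR-only feed.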

Next up is getting something to work on Android. I am using native audio capture code on the two Windows platforms and something equivalent is needed for Android. There is a Unity technique using the Microphone class that, coupled with a custom audio filter, might work. If not, I might have to brush up on JNI. Spatialized sound is probably going to be difficult on Android, at least in terms of panning. Volume rolloff with distance should work, however.
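If no spatializer plugin turns out to be usable on Android, a possible fallback is plain constant-power stereo panning based on where the speaking avatar is relative to the listener. This is a sketch of that swapped-in technique, not HRTF spatialization, and it assumes Unity-style axes (+z forward, +x right) projected onto the horizontal plane.

```python
import math


def azimuth(listener_xz, forward_xz, source_xz):
    """Horizontal angle of a source relative to where the listener faces.

    Uses Unity-style axes seen from above: +z forward, +x right, so a
    positive result means the source is to the listener's right.
    """
    dx = source_xz[0] - listener_xz[0]
    dz = source_xz[1] - listener_xz[1]
    return math.atan2(dx, dz) - math.atan2(forward_xz[0], forward_xz[1])


def constant_power_pan(azimuth_rad):
    """Left/right gains with left**2 + right**2 == 1 for any azimuth.

    sin() maps the azimuth to -1 (hard left) .. +1 (hard right); a source
    directly behind pans to centre, since stereo panning has no front/back cue.
    """
    x = math.sin(azimuth_rad)
    theta = (x + 1.0) * math.pi / 4.0  # 0 (hard left) .. pi/2 (hard right)
    return math.cos(theta), math.sin(theta)
```

Multiplying these gains by a distance rolloff factor would give panning plus volume falloff, which may be good enough on a phone even without true HRTF processing.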

Sentient space sharing avatars with Windows desktop, Windows Mixed Reality and Android apps


One of the goals of the rt-ai Edge system is that users can use whatever device they have available to interact with it and extract value from it. Unity is a tremendous help given that Unity apps can be run on pretty much everything. The main task was integration with Manifold so that all apps can receive and interact with everything else in the system. Manifold currently supports Windows, UWP, Linux, Android and macOS. iOS is a notable absentee and will hopefully be added at some point in the future. However, I see Android support as more significant, since it also opens the way to supporting multiple MR headsets.

The screen shot above and video below show three instances of the rt-ai viewer apps running on Windows desktop, Windows Mixed Reality and Android interacting in a shared sentient space. Ok, so the avatars are rubbish (I call them Sad Robots) but that’s just a detail and can be improved later. The wall panels are receiving sensor and video data from ZeroSensors via an rt-ai Edge stream processing network while the light switch is operated via a home automation server and Insteon.

Sharing is mediated by a SharingServer that is part of Manifold. The SharingServer uses Manifold multicast and end to end services to implement scalable sharing while minimizing the load on each individual device. Ultimately, the SharingServer will also provide the space definition file for download when the user enters a sentient space, along with details of virtual objects that may have been placed in the space by other users. This allows a new user with a standard app to enter a space and quickly create a view of the sentient space consistent with that of existing users.

While this is all kind of fun, the more interesting thing is when this is combined with a HoloLens or similar MR headset. The MR headset user in a space would see any VR users in the space represented by their avatars. Likewise, VR users in a space would see avatars representing MR users in the space. The idea is to get as close to a telepresent experience for VR users as possible without very complex setups. It would be much nicer to use Holoportation, but that would require every room in the space to have a very complex and expensive setup, which really isn’t the point. The idea is to make it very easy and low cost to implement an rt-ai Edge based sentient space.

Still lots to do, of course. One big thing is audio. Another is representing interaction devices (pointers, motion controllers etc.) to all users. Right now, each app just sends out the camera transform to the SharingServer, which then distributes this to all other users. This will be extended to include PCM audio chunks and transforms for interaction devices so that everyone will be able to create a meaningful scene. Each user will receive the audio stream from every other user. The reason for this is that each individual audio stream can then be attached to the avatar for that user, giving a spatialized sound effect using Unity capabilities (that’s the hope anyway).

Another very important thing is that the apps work differently depending on whether they are running on VR type devices or AR/MR type devices. In the latter case, the walls and related objects are not drawn and only their colliders are instantiated, although virtual objects and avatars will be visible. Obviously AR/MR users want to see the real walls, light switches etc., not the virtual representations. However, they will still be able to interact in exactly the same way as a VR user.
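On the receiving side, distributing camera transforms comes down to keeping the latest pose for each remote user and driving that user’s avatar from it. The sketch below assumes a hypothetical JSON message with a per-sender sequence number (so that duplicated or reordered datagrams can’t move an avatar backwards); the actual SharingServer message format isn’t shown here.

```python
import json


def encode_pose(user_id, seq, position, rotation):
    """Serialize a camera pose: position (x, y, z) and rotation quaternion (x, y, z, w)."""
    return json.dumps({"user": user_id, "seq": seq,
                       "pos": list(position), "rot": list(rotation)})


class AvatarTable:
    """Latest known pose for each remote user, used to place their avatars."""

    def __init__(self, my_id):
        self.my_id = my_id
        self.poses = {}  # user_id -> (seq, position, rotation)

    def apply(self, raw):
        """Apply one relayed pose message; return True if an avatar moved."""
        msg = json.loads(raw)
        user = msg["user"]
        if user == self.my_id:
            return False  # our own pose echoed back: no avatar for ourselves
        current = self.poses.get(user)
        if current is not None and msg["seq"] <= current[0]:
            return False  # duplicate or out-of-order datagram: keep the newer pose
        self.poses[user] = (msg["seq"], tuple(msg["pos"]), tuple(msg["rot"]))
        return True
```

The same table could later carry interaction device transforms, and the audio stream for each user would be attached to whichever avatar the table positions.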

Controlling the real world using Windows Mixed Reality, Manifold, rt-ai Edge and Insteon

Having now constructed a simple walk-around model of my office and another room, it was time to start work on the interaction side of things. I have an Insteon switch controlling some of the lights in my office and this seemed like an obvious target. Manifold now has a home automation server app (HAServer) based on one from an earlier project. This allows individual Insteon devices to be addressed by user-friendly names using JSON over Manifold’s end to end datagram service. Light switches can now be specified in the Unity rtXRView space definition file and linked to the control interface of the HAServer.
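The point of friendly names is that the viewer never needs to know Insteon addresses. The sketch below shows that split; the JSON field names, the name table and the address are all hypothetical, since the real HAServer message format and configuration aren’t given here.

```python
import json

# Hypothetical name -> Insteon address table; the real HAServer keeps its own
# configuration, whose format is not shown here.
DEVICES = {"office lights": "1A.2B.3C"}


def make_switch_command(name, on):
    """JSON command a viewer might send to HAServer (field names are illustrative)."""
    return json.dumps({"device": name, "state": "on" if on else "off"})


def resolve_command(raw, devices=DEVICES):
    """Server side: map a friendly device name to an address and a target state."""
    msg = json.loads(raw)
    name = msg.get("device")
    if name not in devices:
        raise KeyError(f"unknown device: {name!r}")
    state = msg.get("state")
    if state not in ("on", "off"):
        raise ValueError(f"unsupported state: {state!r}")
    return devices[name], state == "on"
```

Keeping the address table on the server means the space definition file only has to name the switch, and rewiring a device never requires touching the Unity side.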

The screen capture above and video below were made using a Samsung Odyssey headset and motion controllers. The light switch specification causes a virtual light switch to be placed, ideally exactly where the real light switch happens to be. Then, by pointing at the light switch with the motion controller and clicking, the light can be turned on and off. The virtual light switch is gray when the light is off and green when it is on. If the real switch is operated by some other means, the virtual light switch will reflect this as the HAServer broadcasts state change updates on a regular basis. It’s nice to see that the light sensor on the ZeroSensor responds appropriately to the light level too. Technically this light switch is a dimmer – setting an intermediate level is a TODO at this point.

An interesting aspect of this is the extent to which a remote VR user can get a sense of telepresence in a space, even if it is just a virtual representation of the real space. To make that connection more concrete, the virtual light in Unity should reflect the ambient light level as measured by the ZeroSensor. That’s another TODO…
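For the ambient light TODO, one plausible mapping is logarithmic, since perceived brightness is roughly logarithmic in luminance. This sketch assumes the ZeroSensor reading is in lux; the dark and bright thresholds and the maximum Unity light intensity are placeholder values that would need tuning.

```python
import math


def light_intensity_from_lux(lux, lux_dark=1.0, lux_bright=1000.0, max_intensity=1.5):
    """Map a measured light level to a virtual light intensity.

    Below lux_dark the virtual light is off, above lux_bright it is at
    max_intensity, and in between intensity rises with log10(lux).
    """
    if lux <= lux_dark:
        return 0.0
    if lux >= lux_bright:
        return max_intensity
    t = math.log10(lux / lux_dark) / math.log10(lux_bright / lux_dark)
    return t * max_intensity
```

A linear mapping would make the virtual room look almost identical across most indoor lighting levels, which is why the logarithm seems the more natural starting point.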

While this is kind of fun in the VR world, it could actually be interesting in the AR world. If the virtual light switch is placed correctly but is invisible (apart from a collider), a HoloLens user (for example) could look at a real light switch and click in order to change the state of the switch. Very handy for the terminally lazy! Even more useful would be to annotate the switch with what it controls. For some reason, people in this house never seem to know which light switch controls what, so this feature by itself would be quite handy.

A virtual walk through a sentient space with rt3DView

The screen capture above and video below are from a walk-through of a procedurally generated sentient space model with video and IoT data displays (derived from ZeroSensor data, rt-ai Edge and Manifold). This was made using rt3DView, and the Unity video recording itself was made with the aid of a very nice Unity Asset Store asset.

The idea of this model is that it reflects the major features of the real sentient space so that users of VR and AR can interact correctly. For example, an AR headset wearer in one of the rooms would also see the displays on the equivalent physical wall. This model is pretty basic but obviously a lot more bling could be added to get further along the road to realism. Plus I made no attempt to sort out the exterior for this test.

Now that the basics are working and the XR world is fully coupled to the rt-ai Edge design that is the real world element of the sentient space, the focus will move to more interaction. Instantiating new objects, positioning objects, real-time sharing of camera poses leading to avatars… The list is endless.


Using Windows Mixed Reality to visualize sentient spaces with rtXRView

The Windows Mixed Reality version of 3DView is now working nicely. I had a few problems with my Windows development PC, which is a few years old and didn’t have adequate USB ports. In the end, this PCI-e USB 3.1 card solved the problem; otherwise a complete upgrade might have been required. A different USB 3.0 card did not work, however.

Hopefully this is the last time that I see the displays all lined up like that. The space modeling software is coming along and soon it will be possible to model a space with a (relatively) simple procedural definition file. Potentially this could be texture mapped from a 3D scan of rooms but the simplified models generated procedurally with simple textures might well be good enough. Then it will be possible to position versions of these displays (and lots of other things) in the correct rooms.
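To give a flavor of what a procedural definition might boil down to, here is a sketch that turns a room’s floor outline (corner points in metres on the x/z plane) and a ceiling height into wall segments, plus a helper for positioning something such as a display along a wall. The input format and the function names are hypothetical; walls are treated as zero-thickness lines between consecutive corners.

```python
import math


def walls_from_room(corners, height):
    """Build wall descriptions from a closed floor outline.

    corners: list of (x, z) points in order around the room.
    Each wall gets its endpoints, length, centre and yaw (degrees about +y,
    0 meaning the wall runs along +z), which is enough to place a scaled quad.
    """
    walls = []
    n = len(corners)
    for i in range(n):
        (x0, z0), (x1, z1) = corners[i], corners[(i + 1) % n]  # wrap to close the room
        walls.append({
            "start": (x0, z0),
            "end": (x1, z1),
            "length": math.hypot(x1 - x0, z1 - z0),
            "center": ((x0 + x1) / 2, height / 2, (z0 + z1) / 2),
            "yaw_deg": math.degrees(math.atan2(x1 - x0, z1 - z0)),
        })
    return walls


def point_on_wall(wall, fraction, elevation):
    """Point a given fraction along a wall at a given height, e.g. for a display."""
    (x0, z0), (x1, z1) = wall["start"], wall["end"]
    return (x0 + (x1 - x0) * fraction, elevation, z0 + (z1 - z0) * fraction)
```

A simple 4 m by 3 m office with a 2.5 m ceiling would be `walls_from_room([(0, 0), (4, 0), (4, 3), (0, 3)], 2.5)`, and a display could then be placed halfway along the first wall at eye level.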

XRView is intended to run both on Windows MR headsets (I am using the Samsung Odyssey, as it has a good display and built-in audio) and on HoloLens. Clearly the VR and AR modes have to be completely different. In VR, you navigate and interact with the motion controllers and see the modeled space, whereas in AR you navigate by walking around, interact using the clicker and don’t see the modeled space directly. However, the modeled space will still be there and will be used instead of the spatially mapped surfaces that the HoloLens would normally use. This means that objects placed in the model by a VR user will appear correctly positioned to AR users, and vice versa. One key advantage of using the modeled space rather than the dynamically mapped space generated by the HoloLens itself is that it is easy to add context to the surfaces using the procedural model language. Another is that non-HoloLens AR headsets can share the same spatial map as the HoloLens. The procedural model becomes a platform-independent spatial mapping that “just” leaves the problem of spatial synchronization to the individual headsets.
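Once a headset knows where the model’s origin sits in its own tracking space, converting between model and headset coordinates is a rigid transform. The sketch assumes both frames are gravity-aligned (so only a yaw about +y plus a translation is needed) and uses Unity’s axis conventions; how each headset actually obtains `anchor_pos` and `anchor_yaw_rad` is exactly the spatial synchronization problem left for later.

```python
import math


def model_to_headset(point, anchor_pos, anchor_yaw_rad):
    """Model coordinates -> headset tracking coordinates.

    anchor_pos / anchor_yaw_rad: where the model origin lies in the headset's
    frame, and its rotation about +y (Unity convention: +z turns toward +x).
    """
    x, y, z = point
    c, s = math.cos(anchor_yaw_rad), math.sin(anchor_yaw_rad)
    # Rotate about +y, then translate by the anchor position.
    return (c * x + s * z + anchor_pos[0],
            y + anchor_pos[1],
            -s * x + c * z + anchor_pos[2])


def headset_to_model(point, anchor_pos, anchor_yaw_rad):
    """Inverse transform: headset tracking coordinates -> model coordinates."""
    x = point[0] - anchor_pos[0]
    y = point[1] - anchor_pos[1]
    z = point[2] - anchor_pos[2]
    c, s = math.cos(anchor_yaw_rad), math.sin(anchor_yaw_rad)
    # Undo the rotation (rotate by -yaw) after removing the translation.
    return (c * x - s * z, y, s * x + c * z)
```

An object placed by a VR user is stored in model coordinates; each AR headset maps it into its own frame with `model_to_headset`, and anything placed in AR goes back through `headset_to_model` before being shared.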

I am sure that there will be some fun challenges in getting spatial synchronization but that’s something for later.

Using Unity and Manifold with Android devices to visualize sentient spaces

This may not look impressive to you (or my wife, as it turns out) but it has a lot of promise for the future. Following on from 3DView, there’s now an Android version called (shockingly) AndroidView that is essentially the same thing, running on an Android phone in this case. The screen capture above shows the current basic setup displaying sensor data. Since Unity is basically portable across platforms, the main challenge was integrating with Manifold to get the sensor data being generated by ZeroSensors in an rt-ai Edge stream processing network.

I did actually have a Java implementation of a Manifold client from previous work – the challenge was integrating it with Unity. This meant building the client into an aar file and then using Unity’s AndroidJavaObject to pass data across the interface. Now that I understand how it works, it really is quite powerful, and I was able to do everything needed for this application.

There are going to be more versions of the viewer. For example, in the works is rtXRView which is designed to run on Windows MR headsets. The way I like to structure this is to have separate Unity projects for each target and then move common stuff via Unity’s package system. With a bit of discipline, this works quite well. The individual projects can then have any special libraries (such as MixedRealityToolkit), special cameras, input processing etc without getting too cute.

Once the basic platform work is done, it’s back to sorting out modeling of the sentient space and positioning of virtual objects within that space. Multi-user collaboration and persistent sentient space configuration is going to require a new Manifold app to be called SpaceServer. Manifold is ideal for coordinating real-time changes using its natural multicast capability. For Unity reasons, I may integrate a webserver into SpaceServer so that assets can be dynamically loaded using standard Unity functions. This supports the idea that a new user walking into a sentient space is able to download all necessary assets and configurations using a standard app. Still, that’s all a bit in the future.

Google’s WorldSense

Lenovo just announced the Mirage Solo VR headset with Google’s WorldSense inside-out tracking capability. The result is an untethered VR headset which presumably has spatial mapping capabilities, allowing spatial maps to be saved and shared. If so, this would be a massive advance over ARKit- and ARCore-based AR, which makes persistence and collaboration all but impossible (the post here goes into a lot of detail about the various issues related to persistence and collaboration with current technology). The lack of a tether also gives it an edge over Microsoft’s (so-called) Mixed Reality headsets.

Google’s previous Tango system (that’s a Lenovo Phab 2 Pro running it above) did have much more interesting capabilities than ARCore but has fallen by the wayside. In particular, Tango had an area learning capability that is missing from ARCore. I am very much hoping that something like this will exist in WorldSense, so that virtual objects can be placed persistently in spaces and spatial maps can be shared, allowing multiple headsets to see exactly the same virtual objects in exactly the same place in the real space. Of course this isn’t all that helpful when used with a VR headset – but maybe someone will build a pass-through or see-through mixed reality headset using WorldSense, enabling persistent spatial augmentation with a headset at a cost reasonable enough for ubiquitous use. If it were also able to perform real-time occlusion (where virtual objects can be occluded by real objects), that would be even better!

An interesting complement to this is the Lenovo Mirage stereo camera. This is capable of taking 180 degree videos and stills suitable for use with stereoscopic 3D displays, such as the Mirage headset. It suddenly occurred to me that this might be a way of hacking a pass-through AR capability for Mirage before someone does it for real :-). This is kind of what Stereolabs are doing for existing VR headsets with their ZED mini, except that theirs is a tethered solution. The nice thing would be to do this in an untethered way.