Registering virtual and real worlds with rt-xr, ARKit and Unity

One of the goals for rt-xr is to allow augmented reality users within a space to collaborate with virtual reality users physically outside of the space, with the VR users getting a telepresent sense of being physically within the same space. To this end, VR users see a complete model of the space (my office in this case) including augmentations while physically present AR users just see the augmentations. Some examples of augmentations are virtual whiteboards and virtual sticky notes. Both AR and VR users see avatars representing the position and pose of other users in the space.

Achieving this for AR users requires that their coordinate system corresponds with that of the virtual models of the room. For iOS, ARKit goes a long way to achieving this, so the rt-xr iOS app has been extended to include ARKit and to work in AR mode. The screen capture above shows how the coordinate systems are synced. A known location in physical space (in this case, the center of the circular control of the fan controller) is selected by touching the iPad screen on the exact center of the control. This fixes the positional offset between the two coordinate systems. To avoid needing multiple control points, the app is currently started in the correct pose so that the yaw rotation is zero relative to the model origin. It is pretty quick and easy to do. The video below shows the process and the result.
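With the zero-yaw assumption, the registration itself reduces to recovering a single translation: the difference between the control point's known position in the model and the ARKit world-space position returned by the tap. The app itself is built in Unity, so the Python sketch below is purely illustrative and the control point coordinates are made up.

```python
import numpy as np

# Known position of the registration point (center of the fan controller's
# circular control) in the virtual model's coordinate system, in meters.
# These values are hypothetical.
MODEL_CONTROL_POINT = np.array([2.41, 1.12, -0.87])

def compute_registration_offset(arkit_hit_position):
    """Return the translation that maps ARKit world coordinates onto model
    coordinates, assuming the app was started with zero yaw relative to the
    model origin so no rotation needs to be recovered."""
    return MODEL_CONTROL_POINT - np.asarray(arkit_hit_position)

def arkit_to_model(arkit_position, offset):
    """Transform any subsequent ARKit world-space position into model space."""
    return np.asarray(arkit_position) + offset

# Example: the tap on the iPad screen hit-tests to this ARKit world position.
hit = [0.35, 0.02, -1.20]
offset = compute_registration_offset(hit)
print(arkit_to_model([1.0, 0.5, -2.0], offset))
```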

After starting the app in the correct orientation, the user is then free to move around before tapping on the control point. Once that is done, the rt-xr part of the app starts up and loads the virtual model of the room. For this test, the complete model is being shown (i.e. as for VR users rather than AR users) although in real use only the augmentations would be visible – the idea here was to see how well the windows lined up. The results are not too bad all things considered, although moving or rotating too fast can cause some drift. However, collaborating using augmentations can tolerate some offset, so this should not be a major problem.

There are just a couple of augmentations in this test. One is the menu switch (the glowing M), which is used to instantiate and control augmentations. There is also a video screen showing the snowy scene from the driveway camera, with the feed generated by an rt-ai design.

The next step is to test out VR and AR collaboration properly by displaying the correct AR scene on the iOS app. Since VR collaboration has worked for some time, extending it to AR users should not be too hard.

Scaling dynamic sentient spaces to multiple locations

One of the fundamental concepts of the rt-xr and rt-ai Edge projects is that it should be possible to experience a remote sentient space in a telepresent way. The diagram above shows the idea. The main sentient space houses a ManifoldNexus instance that supplies service discovery, subscription and message passing functions to all of the other components. Not shown is the rt-ai Edge component that deals with real-time intelligent processing, both reactive and proactive, of real-world sensor data and controls. However, rt-ai Edge interconnects with ManifoldNexus, making data and control flows available in the Manifold world.
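To give a flavor of the discovery side of this, the pattern is that services announce themselves on the LAN and viewers subscribe to the ones they want. The sketch below is purely hypothetical – it is not the actual ManifoldNexus protocol or wire format, and the multicast group, port and message layout are invented for illustration.

```python
import json
import socket
import time

# Hypothetical discovery group/port - not ManifoldNexus's actual values.
DISCOVERY_GROUP, DISCOVERY_PORT = "239.255.43.21", 5228

def announce_service(name, stream_port):
    """Periodically multicast a service announcement so that viewer apps on
    the LAN can auto-discover the sentient space services that are available."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    message = json.dumps({"service": name, "port": stream_port}).encode()
    while True:
        sock.sendto(message, (DISCOVERY_GROUP, DISCOVERY_PORT))
        time.sleep(2.0)  # re-announce so late joiners still discover the service

if __name__ == "__main__":
    announce_service("SpaceServer", 1895)  # hypothetical port number
```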

Co-located with ManifoldNexus are the various servers that implement the visualization part of the sentient space. The SpaceServer allows occupants of the space to download a space definition file that is used to construct a model of the space. For VR users, this is a virtual model of the space that can be used remotely. For AR and MR users, only augmentations and interaction elements are instantiated so that the real space can be seen normally. The SpaceServer also houses downloadable asset bundles that contain augmentations that occupants have placed around the space. This is why it is referred to as a dynamic sentient space – as an occupant either physically or virtually enters the space, the relevant space model and augmentations are downloaded. Any changes that occupants make get merged back to the space definition and model repository to ensure that all occupants are synced with the space correctly. The SharingServer provides real-time transfer of pose and audio data. The Home Automation server provides a way for the space model to be linked with networked controls that physically exist in the space.
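The actual SpaceServer protocol and space definition format are not documented here, but conceptually the flow looks something like the following sketch: the viewer fetches the space definition, instantiates either the full room model or just the augmentations depending on mode, and merges any new augmentations back to the repository. The REST-style interface, hostname and JSON field names are all assumptions made for illustration.

```python
import json
import urllib.request

SPACE_SERVER = "http://spaceserver.local:8000"  # hypothetical address

def load_space(space_name, vr_mode):
    """Download the space definition and decide what to instantiate.
    VR users get the full room model plus augmentations; AR/MR users
    get only the augmentations and interaction elements."""
    with urllib.request.urlopen(f"{SPACE_SERVER}/spaces/{space_name}") as resp:
        space = json.load(resp)
    objects = list(space["augmentations"])
    if vr_mode:
        objects += space["room_model"]  # walls, windows, furniture, etc.
    return objects

def save_augmentation(space_name, augmentation):
    """Merge a newly placed augmentation back into the space definition
    repository so that all occupants stay in sync with the space."""
    data = json.dumps(augmentation).encode()
    req = urllib.request.Request(
        f"{SPACE_SERVER}/spaces/{space_name}/augmentations",
        data=data, headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)
```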

When everything is on a single LAN, things just work. New occupants of a space auto-discover the sentient spaces available on that LAN and, via a GUI on the generic viewer app, can select the appropriate space. Normally there would be just one space, but the system allows for multiple spaces on a single LAN if required. The issue then is how to connect VR users at remote locations. As shown in the diagram, ManifoldNexus has the ability to use secure tunnels between regions. This does require that one of the gateway routers has a port forwarding entry configured but otherwise requires no configuration beyond the security settings. There can be several remote spaces if necessary, and a tunnel can support more than one sentient space. Once the Manifold infrastructure is established, integration is total in that auto-discovery and message switching behave for remote occupants in exactly the same way as for local occupants. What is also nice is that multicast services can be replicated for remote users on the remote LAN, so data never has to be sent more than once over the tunnel itself. This optimization is implemented automatically within ManifoldNexus.
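That multicast optimization is worth spelling out: rather than sending one copy of a stream per remote subscriber, each message crosses the tunnel once and the far end re-multicasts it on its own LAN. The sketch below only illustrates that idea – the framing, group address and port are hypothetical and this is not ManifoldNexus's actual implementation.

```python
import socket
import struct

# Hypothetical group/port for a replicated multicast service on the remote LAN.
GROUP, PORT = "239.255.76.1", 5229

def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("tunnel closed")
        buf += chunk
    return buf

def replicate_from_tunnel(tunnel_sock):
    """Read length-prefixed messages arriving once over the tunnel and
    re-multicast each of them on the remote-site LAN, so the tunnel never
    carries more than one copy of any message."""
    out = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    out.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    while True:
        length = struct.unpack("!I", recv_exact(tunnel_sock, 4))[0]
        payload = recv_exact(tunnel_sock, length)
        out.sendto(payload, (GROUP, PORT))
```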

Dynamic sentient spaces (where a standard viewer is customized for each space by the servers) are now basically working on the five platforms (Windows desktop, macOS, Windows Mixed Reality, Android and iOS). Persistent ad-hoc augmentations using downloadable assets are the next step in this process. I will probably start with the virtual sticky note – this is where an occupant can leave a persistent message for other occupants. This requires a lot of the general functionality of persistent dynamic augmentations and is actually kind of useful for a change!
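As a concrete illustration, a persistent sticky note really only needs a pose in the space model, the message text and a reference to the asset bundle that renders it. This is just a hypothetical record layout, not the actual rt-xr format.

```python
from dataclasses import dataclass, asdict
import json, time, uuid

@dataclass
class StickyNote:
    """Hypothetical persistent augmentation record for a virtual sticky note."""
    note_id: str        # unique id so later edits or deletes can be merged
    position: tuple     # (x, y, z) in space model coordinates, meters
    rotation: tuple     # orientation quaternion (x, y, z, w)
    text: str           # the message left for other occupants
    author: str
    created: float      # epoch seconds
    asset_bundle: str   # downloadable asset that renders the note

note = StickyNote(str(uuid.uuid4()), (1.2, 1.5, -0.4), (0, 0, 0, 1),
                  "Back at 3pm", "occupant1", time.time(), "stickynote.bundle")
print(json.dumps(asdict(note)))  # what would be merged back to the SpaceServer
```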

Latest fun thing in the office: a Garmin VIRB 360 camera

360 degree video is all the rage right now so I cannot be left behind! One of the things I like about the Garmin VIRB 360 is the in-camera stitching and very high resolution. It is also incredibly small. Judging by my photo, though, keeping the dust off will be a challenge :-).

Typically, I forgot to order a micro-HDMI cable, so I can't test live capture to a PC yet, but I can record videos to the SD card. Great fun!

The cable will turn up tomorrow with any luck. I am eager to see how usable the HDMI output is for live 360 video.

Telepresent Enhanced Reality (TER)

Following on from an earlier post on Enhanced Reality, it occurred to me that separating the stereo cameras (and microphones) from the ER headset creates a new way of achieving telepresent remote participation – Telepresent Enhanced Reality or TER. I was actually trying out a simpler version a while back when I had a camera on a pan/tilt platform slaved to an Oculus DK2 VR headset. A real TER setup would require stereo cameras and multiple microphones on a pan/tilt/roll mount. The user would have a VR headset and the pose of the pan/tilt/roll mount would mirror movements of the user’s head.
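Slaving the mount to the headset is mostly a matter of converting the headset's orientation quaternion into pan (yaw), tilt (pitch) and roll angles for the servos. A rough sketch of that conversion is below; the axis conventions vary between headset SDKs, so the mapping is an assumption, and the servo driver object is hypothetical.

```python
import math

def quaternion_to_pan_tilt_roll(x, y, z, w):
    """Convert an orientation quaternion into pan (yaw), tilt (pitch) and
    roll angles in degrees, using the standard yaw-pitch-roll extraction.
    The axis mapping is an assumption to be adjusted per headset SDK."""
    roll = math.atan2(2 * (w * x + y * z), 1 - 2 * (x * x + y * y))
    s = max(-1.0, min(1.0, 2 * (w * y - z * x)))  # clamp to avoid domain errors
    tilt = math.asin(s)
    pan = math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))
    return tuple(math.degrees(a) for a in (pan, tilt, roll))

def update_mount(servo, headset_quat):
    """Mirror the user's head movements on the remote pan/tilt/roll mount.
    'servo' is a hypothetical driver object with a move_to() method."""
    pan, tilt, roll = quaternion_to_pan_tilt_roll(*headset_quat)
    servo.move_to(pan=pan, tilt=tilt, roll=roll)
```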

An interesting use would be for conferences where some of the participants are in a conventional conference room but wearing AR/MR/ER headsets (e.g. HoloLens). Each position in the room reserved for a remote participant would have a stereo camera/microphone unit. The local participants would obviously be able to see each other, but instead of the camera/microphone hardware they would see avatars representing the remote users. These avatars could be as sophisticated or as simple as desired. Remote participants would see (via the stereo cameras) the conference room and the local participants, and would also see the remote participant avatars, which replace the physical camera/microphone hardware at those locations. Alternatively, these could be suitably equipped telepresence robots (or even cameras mounted on small drones), which would also allow movement around the room. Really, anything that has the essential hardware (stereo cameras, microphones, pan/tilt/roll capability) could be used.

Given that everyone has AR/MR capability in this setup, something like a conventional projected presentation could still be done except that the whole thing would be virtual – a virtual screen would be placed on a suitable wall and everyone could look at it. Interaction could be with simulated laser pointers and the like. Equally, every position could have its own simulated monitor that displays the presentation. Virtual objects visible to everyone could be placed on the table (or somewhere in the room) for discussion, annotation or modification.

Obviously everyone could be remote and use a VR headset and everything could then be virtual with no need for hardware. However, the scheme described preserves some of the advantages of real meetings while at the same time allowing remote participants to feel like they are really there too.