Registering virtual and real worlds with rt-xr, ARKit and Unity


One of the goals for rt-xr is to allow augmented reality users within a space to collaborate with virtual reality users physically outside of the space, with the VR users getting a telepresent sense of being physically within the same space. To this end, VR users see a complete model of the space (my office in this case) including augmentations while physically present AR users just see the augmentations. Some examples of augmentations are virtual whiteboards and virtual sticky notes. Both AR and VR users see avatars representing the position and pose of other users in the space.

Achieving this for AR users requires that their coordinate system corresponds with that of the virtual models of the room. For iOS, ARKit goes a long way to achieving this so the rt-xr app for iOS has been extended to include ARKit and work in AR mode. The screen capture above shows how coordinate systems are synced. A known location in physical space (in this case, the center of the circular control of the fan controller) is selected by touching the iPad screen on the exact center of the control. This identifies position. To avoid multiple control points, the app is currently started in the correct pose so that the yaw rotation is zero relative to the model origin. It is pretty quick and easy to do. The video below shows the process and the result.
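
As a rough sketch of the arithmetic behind this single-point registration (in Python for brevity, not the app's actual Unity code): with yaw already aligned at startup, the tapped point only has to supply a translation. The type, function names and coordinates below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec3:
    x: float
    y: float
    z: float

    def __add__(self, other: "Vec3") -> "Vec3":
        return Vec3(self.x + other.x, self.y + other.y, self.z + other.z)

    def __sub__(self, other: "Vec3") -> "Vec3":
        return Vec3(self.x - other.x, self.y - other.y, self.z - other.z)


def registration_offset(model_point: Vec3, ar_point: Vec3) -> Vec3:
    """Translation that maps AR session coordinates onto model coordinates.

    Assumes the app was started with zero yaw relative to the model origin,
    so no rotation term is needed.
    """
    return model_point - ar_point


def ar_to_model(ar_position: Vec3, offset: Vec3) -> Vec3:
    """Convert a position reported by the AR session into model space."""
    return ar_position + offset


# Hypothetical numbers: where the control point sits in the room model,
# and where the AR session reported the tap on that same control.
CONTROL_POINT_MODEL = Vec3(2.0, 1.25, -0.5)
tapped_in_ar = Vec3(0.5, 0.25, 1.0)
offset = registration_offset(CONTROL_POINT_MODEL, tapped_in_ar)
```

Every later AR pose would pass through `ar_to_model` before being compared with the room model. A second control point would be needed to recover yaw as well, which is what starting in a known orientation avoids.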

After starting the app in the correct orientation, the user is then free to move to click on the control point. Once that’s done, the rt-xr part of the app starts up and loads the virtual model of the room. For this test, the complete model is being shown (i.e. as for VR users rather than AR users) although in real life only the augmentations would be visible – the idea here was to see how the windows lined up. The results are not too bad all things considered although moving or rotating too fast can cause some drift. However, collaborating using augmentations can tolerate some offset so this should not be a major problem.

There are just a couple of augmentations in this test. One is the menu switch (the glowing M) which is used to instantiate and control augmentations. There is also a video screen showing the snowy scene from the driveway camera, the feed being generated by an rt-ai design.

The next step is to test out VR and AR collaboration properly by displaying the correct AR scene on the iOS app. Since VR collaboration has worked for some time, extending it to AR users should not be too hard.

Integrating Core ML with Unity on iOS

The latest iPads and iPhones have some pretty serious edge neural network capabilities that are a natural fit with ARKit and Unity. AR and Unity go together quite nicely as AR provides an excellent way of communicating back to the user the results of intelligently processing sensor data from the user, other users and static (infrastructure) sensors in a space. The screen capture above was obtained from code largely based on this repo which integrates Core ML models with Unity. In this case, Inceptionv3 was used. While it isn’t perfect, it does ably demonstrate that this can be done. Getting the plugin to work was quite straightforward – you just have to include the mlmodel file in Xcode via the Files -> Add Files menu option rather than dragging the file into the project. The development cycle is pretty annoying as the plugin won’t run in the Unity Editor and compiling (on my old Mac Mini) is painfully slow, but I guess a decent Mac would do a better job.

This all brings up the point that there seem to be different perceptions of what the edge actually is. rt-ai Edge can be perceived as a local aggregation and compute facility for inference-capable or conventional mobile and infrastructure devices (such as security cameras) – basically an edge compute facility supporting edge devices. A particular advantage of edge compute is that it is possible to integrate legacy devices (such as dumb cameras) into an AI-enhanced system by utilizing edge compute inference capabilities. In a sense, edge compute is a local mini-cloud, providing high capacity compute and inference a short distance in time away from sensors and actuators. This minimizes backhaul and latency, not to mention securing data in the local area rather than dispersing it in a cloud. It can also be very cost-effective when compared to the costs of running multiple cloud CPU instances 24/7.

Given the latest developments in tablets and smart phones, it is essential that rt-ai Edge be able to incorporate inference-capable devices into its stream processing networks. Inference-capable, per-user devices make scaling very straightforward as capability increases in direct proportion to the number of users of an edge system. The normal rt-ai Edge deployment system can’t be used with mobile devices, so (at the very least) framework apps are required to make use of AI models within the devices themselves. However, with that proviso, it is certainly possible to incorporate smart edge devices into edge networks with rt-ai Edge.

 

An rt-xr SpaceObjects tour de force

rt-xr SpaceObjects are now working very nicely. It’s easy to create, configure and delete SpaceObjects as needed using the menu switch which has been placed just above the light switch in my office model above.

The video below shows all of this in operation.

The typical process is to instantiate an object, place and size it and then attach it to a Manifold stream if it is a Proxy Object. Persistence, sharing and collaboration work for all relevant SpaceObjects across the supported platforms (Windows and macOS desktop, Windows MR, Android and iOS).

This is a good place to leave rt-xr for the moment while I wait for the arrival of some sort of AR headset in order to support local users of an rt-xr enhanced sentient space. Unfortunately, Magic Leap won’t deliver to my zip code (sigh) so that’s that for the moment. Lots of teasers about the HoloLens 2 right now and this might be the best way to go…eventually.

Now the focus moves back to rt-ai Edge. While this is working pretty well, it needs a few bug fixes and some production modes (such as auto-starting SPNs when server nodes are started). Then begins the process of data collection for machine learning. ZeroSensors will collect data from each monitored room and this will be saved by ManifoldStore for later use. The idea is to classify normal and abnormal situations and also to be proactive in responding to the needs of occupants of the sentient space.

Latest rt-xr toy: shared virtual whiteboards

Since the sticky note idea now works, I thought that it would be fun to do a freehand version – a virtual whiteboard. It’s working pretty reasonably now. I placed a big whiteboard in my virtual office as you can see above to show how two or more occupants of the space can work together on a shared virtual whiteboard. The video below shows how this works.

The screen on the left is the desktop rt-xrViewer app, the screen on the right is the Mixed Reality Portal showing the Windows Mixed Reality rt-xrViewer app. The mouse is used to draw on the whiteboard in the desktop app (blue lines), motion controllers are used for the WMR app (red lines).

This also shows the new interaction rays. They sort of emanate from where the nose of the avatar should be.

They help give a sense of what the virtual occupants are doing. Otherwise, writing on the whiteboard seems a bit ghostly.

Whiteboards are actually proxy objects, driven from a special server that’s part of the SharingServer. The whiteboard itself is a completely dumb graphical asset. This makes it ideal for packaging as a Unity assetbundle and downloading at runtime rather than having to be built into the app. The required standard scripts included with rt-xrViewer are attached after a proxy object is instantiated.
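
To illustrate the "dumb asset, smart server" split, here is a minimal Python sketch of a server that owns the stroke list and pushes it to viewers. This is not the real SharingServer code; the class, method names and stroke format are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Stroke:
    author: str
    color: str
    points: Tuple[Point, ...]


class WhiteboardServer:
    """Owns the whiteboard state; viewers only render what they are sent."""

    def __init__(self) -> None:
        self._strokes: List[Stroke] = []
        self._listeners: List[Callable[[Stroke], None]] = []

    def join(self, on_stroke: Callable[[Stroke], None]) -> None:
        # Replay history so a late joiner sees the current board contents.
        for stroke in self._strokes:
            on_stroke(stroke)
        self._listeners.append(on_stroke)

    def add_stroke(self, stroke: Stroke) -> None:
        # Record the stroke, then forward it to every joined viewer.
        self._strokes.append(stroke)
        for listener in self._listeners:
            listener(stroke)
```

Because the asset itself holds no logic in this arrangement, the same downloaded whiteboard can be driven by whatever the server sends, which is what makes it a good fit for an assetbundle.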

This is the first time that proxy objects have supported interaction, opening the door to more interesting proxy objects in the future.

rt-xr SpaceObject sharing and persistence demo

SpaceObjects are dynamic objects that can be created, manipulated and deleted within the sentient space. The sticky note SpaceObject is the perfect vehicle for demonstrating these capabilities, as shown in the video below (which would have been even better if the camera had been exactly horizontal but, oh well). The monitor on the left is showing the rt-xrViewer app for Windows desktop, the one on the right is the Mixed Reality Portal showing the rt-xrViewer app for Windows Mixed Reality. I was wearing the WMR headset and using a motion controller to interact with the space. Right now you can create a sticky note, position it, add and edit the text and also delete it. In fact, any occupant of the space, physical or virtual, can edit the text if they want (obviously permissions for all of this are a TODO). Any number of sticky notes can be created and left around the space as a sort of virtual graffiti.

It’s a little tough to see but, as the text is being edited on the WMR app, the text is changing in real-time on the desktop app. Not totally necessary but kind of amusing to watch.

SpaceObject sharing is performed using the SpaceServer while the SharingServer provides the avatar pose sharing and audio sharing as before. Of course this all works on macOS, Android and iOS so that any reasonable device can participate. And of course AR and MR headset users can interact with SpaceObjects. The SpaceServer is able to make persistent all salient settings for each SpaceObject. All SpaceObjects persist position, rotation, and scale. In the case of the sticky note, it also includes the current text. Any occupant coming into an existing session will get the latest space state when they receive the space definition from the SpaceServer and from then on they will receive real-time updates of any changes.
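
A minimal Python sketch of the kind of state that has to persist: position, rotation and scale for every SpaceObject, plus the text for a sticky note. The field names, the JSON layout and the `SpaceStateStore` class are assumptions for illustration, not the SpaceServer's actual format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class SpaceObjectState:
    object_id: str
    kind: str
    position: List[float]
    rotation: List[float]
    scale: List[float]
    # Type-specific settings, e.g. {"text": "..."} for a sticky note.
    extra: Dict[str, str] = field(default_factory=dict)


class SpaceStateStore:
    """Holds the latest state of each object and can save/restore it."""

    def __init__(self) -> None:
        self._objects: Dict[str, SpaceObjectState] = {}

    def upsert(self, state: SpaceObjectState) -> None:
        self._objects[state.object_id] = state

    def delete(self, object_id: str) -> None:
        self._objects.pop(object_id, None)

    def get(self, object_id: str) -> Optional[SpaceObjectState]:
        return self._objects.get(object_id)

    def snapshot(self) -> str:
        # What a newly arriving occupant would receive as the space state.
        return json.dumps([asdict(obj) for obj in self._objects.values()])

    @staticmethod
    def restore(payload: str) -> "SpaceStateStore":
        store = SpaceStateStore()
        for item in json.loads(payload):
            store.upsert(SpaceObjectState(**item))
        return store
```

A new occupant would start from `snapshot()` and then apply each later change with `upsert` or `delete` as updates arrive.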

These latest capabilities, coupled with the spatialized audio sharing, create a quite nice collaborative environment. Next up is the ability to download SpaceObjects on demand from object servers. Since SpaceObjects can also be proxy objects, this opens the door to all kinds of active bling to brighten up the space.

Multi-platform interaction styles for rt-xr and Unity

The implementation of sticky notes in rt-xr opened a whole can of worms but really just forced the development of a set of capabilities that will be needed for the general case where occupants of a sentient space can download assets from anywhere, instantiate them in a space and then interact with them. In particular, the need to be able to create a sticky note, position it and add text to it when being used on the supported platforms (Windows and macOS desktop, Android, iOS and Windows Mixed Reality) required a surprising amount of work. I decided to standardize on a three-button mouse model and map the various interaction styles into mouse events. This means that the bulk of the code doesn’t need to care about the interaction style (mouse, motion controller, touch screen etc.) as all the complexity is housed in one script.

The short video below shows this in operation on a Windows desktop.

It ended up running a bit fast but that was due to the video recorder setup – I can’t really do things that fast!

I am still just using opaque devices – where is my HoloLens 2 or Magic Leap?!!! However, things should map across pretty well. Note how the current objects are glued to the virtual walls. Using MR devices, the spatial maps would be used for raycasting so that the objects would be glued to the real walls. I do need to add a mode where you can pull things off walls and position them arbitrarily in space but that’s just a TODO right now.

What doesn’t yet work is sharing of these actions. When an object is moved, that move should be visible to all occupants of the space. Likewise, when a new object is created or text updated on a sticky note, everyone should see that. That’s the next step, followed by making all of this persistent.

Anyway, here is how interaction works on the various platforms.

Windows and macOS desktop

For Windows, the assumption is that a three-button (middle wheel) mouse is used. The middle button is used to grab and position objects. The right mouse button opens up the menu for that object. The left button is used for selection and resizing. On the Mac, which doesn’t have a middle button, the middle button is simulated by holding down the Command key which maps the left button into the middle button.
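
The desktop mapping can be sketched in a few lines of Python. This is only an illustration of the idea, not the Unity script used in rt-xr; the enum, function and action strings are hypothetical names.

```python
from enum import Enum


class MouseButton(Enum):
    LEFT = 0
    RIGHT = 1
    MIDDLE = 2


# What each logical button does, per the description above.
ACTION_FOR_BUTTON = {
    MouseButton.LEFT: "select or resize",
    MouseButton.RIGHT: "open object menu",
    MouseButton.MIDDLE: "grab and position",
}


def effective_button(physical: MouseButton, command_held: bool, is_mac: bool) -> MouseButton:
    """Return the logical button, simulating a middle button on the Mac."""
    if is_mac and command_held and physical is MouseButton.LEFT:
        return MouseButton.MIDDLE
    return physical
```

Everything downstream only ever sees the logical button, so the Command key special case stays in one place.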

Navigation is via the SpaceMouse on both platforms.

Windows Mixed Reality

The motion controllers have quite a few controls and buttons available. I am using the Grab button to grab and position objects. The trigger is used for selection and resizing while the menu button is used to bring up the object menu. Pointing at the sticky note and pressing the trigger causes the virtual keyboard to appear.

Navigation uses the standard joystick-based teleport system.

Android and iOS

My solution here is a little ugly. Basically, the number of fingers used for a tap and/or hold dictates which mouse button the action maps to. A single touch means the left mouse button, two touches means the right mouse button while three touches means the middle button. It works but it is pretty amusing trying to get three simultaneous touches on an object to initiate a grab on a small screen device like a phone!
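
In code, the finger-count rule amounts to a small lookup. The Python below is a sketch with hypothetical names, not the actual rt-xr script.

```python
from enum import Enum
from typing import Optional


class MouseButton(Enum):
    LEFT = 0
    RIGHT = 1
    MIDDLE = 2


# Number of simultaneous touches -> simulated mouse button.
TOUCH_COUNT_TO_BUTTON = {
    1: MouseButton.LEFT,
    2: MouseButton.RIGHT,
    3: MouseButton.MIDDLE,
}


def button_for_touches(touch_count: int) -> Optional[MouseButton]:
    """Map a touch count to a mouse button, or None if it has no mapping."""
    return TOUCH_COUNT_TO_BUTTON.get(touch_count)
```

Returning `None` for any other touch count leaves the caller free to ignore stray contacts, which seems the safest default on a small screen.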

Navigation is via single or dual touch. Single touch and slide moves in x and y directions. Dual touch and slide rotates around the y axis. Since touches are used for other things, navigation touches need to be made away from objects or else they will be misinterpreted. Probably there is a better way of doing this. However, in the longer term, see-through mode using something like ARCore or ARKit will eliminate the navigation issue which is only a problem in VR (opaque) mode. I assume the physical occupants of a space will use see-through mode with only remote occupants using VR mode.
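
A small Python sketch of the navigation rule, including the "away from objects" condition. The scale factors, the `ViewerPose` type and the `over_object` flag are assumptions made for this example; the real app would get the hit test from a raycast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewerPose:
    x: float = 0.0
    y: float = 0.0
    yaw_degrees: float = 0.0


def apply_slide(
    pose: ViewerPose,
    touch_count: int,
    dx: float,
    dy: float,
    over_object: bool = False,
    move_scale: float = 0.01,    # assumed metres per pixel of slide
    rotate_scale: float = 0.5,   # assumed degrees per pixel of slide
) -> ViewerPose:
    """Update the viewer pose from a touch slide of (dx, dy) pixels."""
    if over_object:
        # Touches on an object are object interactions, not navigation.
        return pose
    if touch_count == 1:
        return ViewerPose(pose.x + dx * move_scale, pose.y + dy * move_scale, pose.yaw_degrees)
    if touch_count == 2:
        return ViewerPose(pose.x, pose.y, (pose.yaw_degrees + dx * rotate_scale) % 360.0)
    return pose
```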

I haven’t been using ARCore or ARKit yet, mainly because they haven’t seemed good enough to create a spatial map that is useful for rt-xr. This is changing (ARKit 2 for example) but the question is whether it can cope with multiple rooms. For example, objects behind a real wall should not be visible – they need to be occluded by the spatial map. The HoloLens can do this, however, and is the best available option right now for multi-room MR with persistence.

 

Scaling dynamic sentient spaces to multiple locations

One of the fundamental concepts of the rt-xr and rt-ai Edge projects is that it should be possible to experience a remote sentient space in a telepresent way. The diagram above shows the idea. The main sentient space houses a ManifoldNexus instance that supplies service discovery, subscription and message passing functions to all of the other components. Not shown is the rt-ai Edge component that deals with real-time intelligent processing, both reactive and proactive, of real-world sensor data and controls. However, rt-ai Edge interconnects with ManifoldNexus, making data and control flows available in the Manifold world.

Co-located with ManifoldNexus are the various servers that implement the visualization part of the sentient space. The SpaceServer allows occupants of the space to download a space definition file that is used to construct a model of the space. For VR users, this is a virtual model of the space that can be used remotely. For AR and MR users, only augmentations and interaction elements are instantiated so that the real space can be seen normally. The SpaceServer also houses downloadable asset bundles that contain augmentations that occupants have placed around the space. This is why it is referred to as a dynamic sentient space – as an occupant either physically or virtually enters the space, the relevant space model and augmentations are downloaded. Any changes that occupants make get merged back to the space definition and model repository to ensure that all occupants are synced with the space correctly. The SharingServer provides real-time transfer of pose and audio data. The Home Automation server provides a way for the space model to be linked with networked controls that physically exist in the space.
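
The difference between what VR and AR/MR viewers build from the same space definition can be sketched as a simple filter. This Python is illustrative only; the `SpaceElement` type and its `is_overlay` flag are assumptions, not the real space definition format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class ViewMode(Enum):
    VR = "vr"   # remote user: needs the full virtual model
    AR = "ar"   # AR/MR user physically in the space


@dataclass(frozen=True)
class SpaceElement:
    name: str
    # True for augmentations and interaction elements,
    # False for geometry that models the real room.
    is_overlay: bool


def elements_to_instantiate(definition: List[SpaceElement], mode: ViewMode) -> List[SpaceElement]:
    """Pick which parts of the space definition a viewer should build."""
    if mode is ViewMode.VR:
        return list(definition)
    return [element for element in definition if element.is_overlay]
```

The point of the sketch is that both kinds of viewer consume one definition, so an augmentation added by anyone ends up in the same place for everyone.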

When everything is on a single LAN, things just work. New occupants of a space auto-discover sentient spaces available on that LAN and, via a GUI on the generic viewer app, can select the appropriate space. Normally there would be just one space but the system allows for multiple spaces on a single LAN if required. The issue then is how to connect VR users at remote locations. As shown in the diagram, ManifoldNexus has the ability to use secure tunnels between regions. This does require that one of the gateway routers has a port forwarding entry configured but otherwise requires no configuration other than security. There can be several remote spaces if necessary and a tunnel can support more than one sentient space. Once the Manifold infrastructure is established, integration is total in that auto-discovery and message switching all behave for remote occupants in exactly the same way as local occupants. What is also nice is that multicast services can be replicated for remote users in the remote LAN so data never has to be sent more than once on the tunnel itself. This optimization is implemented automatically within ManifoldNexus.
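
The send-once-then-replicate idea can be shown with a toy Python model. None of this is the ManifoldNexus API; the classes and method names are invented purely to make the optimization concrete.

```python
from typing import Dict, List


class RemoteRelay:
    """Stands in for the far end of the tunnel, on the remote LAN."""

    def __init__(self) -> None:
        self._inboxes: Dict[str, List[List[str]]] = {}

    def subscribe(self, topic: str, inbox: List[str]) -> None:
        self._inboxes.setdefault(topic, []).append(inbox)

    def has_subscribers(self, topic: str) -> bool:
        return bool(self._inboxes.get(topic))

    def fan_out(self, topic: str, message: str) -> None:
        # Replication happens here, on the remote LAN, not on the tunnel.
        for inbox in self._inboxes.get(topic, []):
            inbox.append(message)


class TunnelEndpoint:
    """Stands in for the near end of the tunnel, beside the publisher."""

    def __init__(self, relay: RemoteRelay) -> None:
        self._relay = relay
        self.messages_sent = 0

    def publish(self, topic: str, message: str) -> None:
        if not self._relay.has_subscribers(topic):
            return  # nobody remote is listening, so nothing crosses the tunnel
        self.messages_sent += 1  # one crossing, however many subscribers
        self._relay.fan_out(topic, message)
```

With two remote viewers subscribed to the same topic, one `publish` call increments `messages_sent` once while both inboxes receive the message.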

Dynamic sentient spaces (where a standard viewer is customized for each space by the servers) are now basically working on the five platforms (Windows desktop, macOS, Windows Mixed Reality, Android and iOS). Persistent ad-hoc augmentations using downloadable assets are the next step in this process. Probably I am going to start with the virtual sticky note – this is where an occupant can leave a persistent message for other occupants. This requires a lot of the general functionality of persistent dynamic augmentations and is actually kind of useful for a change!