Using edge inference to detect real world objects with Unity AR Foundation, ARKit and rt-ai Edge

The Unity AR Foundation provides a convenient high level way of utilizing ARCore and ARKit in order to implement mixed and augmented reality applications. I used it to implement an iPad app that could access an rt-ai Edge Composable Processing Pipeline (CPP) via the new Conductor Stream Processing Element (SPE). This is the CPP used to test Conductor:


The Conductor SPE provides a Websocket API to mobile devices and is able to pass data from the mobile device to the pipeline and then return the results of the CPP’s processing back to the mobile device. In this case, I am using the CYOLO SPE to perform object detection on the video stream from the mobile device’s camera. The output of the CYOLO SPE goes to three destinations – back to the Conductor, to a MediaView for display locally (for debug) and also to a PutManifold SPE for long term storage and off-line processing.

The iPad Unity app used to test this arrangement uses AR Foundation and ARKit for spatial management and convenient access to camera data. The AR Foundation is especially nice as, if you only need the subset of ARKit functionality currently available, you can do everything in the C# domain without having to get involved with Swift and/or Objective C and all that. The captured camera data is formatted as an rt-ai Edge message and sent via the Websocket API to the Conductor. The Conductor returns detection metadata to the iPad which then uses this to display the labelled detection frames in the Unity space.

Right now, the app draws a labelled frame at a constant distance of 1 meter from the camera to align with the detected object. However, an enhancement would be to use depth information (if there is any) so that the frame could be positioned at the correct depth. Or if that wasn’t useful, the frame label could include depth information.

This setup demonstrates that it is feasible for an XR app to offload inference to an edge compute system and process results in real time. This greatly reduces the load on the mobile device, pointing the way to lightweight, low power, head mounted XR devices that could last for a full workday without recharge. Performing inference on-device (with CoreML for example) is certainly a viable alternative, especially where privacy dictates that raw data (such as video) cannot leave the device. However, processing such data using an edge compute system is hardly the same as sending data out to a remote cloud so, in many cases, privacy requirements can still be satisfied using edge offload.

This particular setup does not require Orchestrator as the iPad test app can go directly to the Conductor, which is part of a statically allocated CPP. The next step to complete the architecture is to add in the Orchestrator interaction so that CPPs can be dynamically instantiated.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.