MobileNet SSD object detection with Unity, ARKit and Core ML


This iOS app is really step 1 on the road to integrating Core ML enabled iOS devices with rt-ai Edge. The screenshot shows the MobileNet SSD object detector running within the ARKit-enabled Unity app on an iPad Pro. If anyone wants to try this, code is here. I put this together pretty quickly so apologies if it is a bit rough but it is early days. Detection box registration isn’t perfect as you can see (especially for the mouse) but it is not too bad. This is probably a field of view mismatch somewhere and will need to be investigated.

Next, this code needs to be integrated with the Manifold C# Unity client. Following that, I will need to write the PutManifold SPE for rt-ai Edge. When this is done, the video and object detection data stream from the iOS device will appear within an rt-ai Edge stream processing network and look exactly the same as the stream from the CYOLO SPE.

The app is based on two repos that were absolutely invaluable in putting it together:

Many thanks to the authors of those repos.

Integrating Core ML with Unity on iOS

The latest iPads and iPhones have some pretty serious edge neural network capabilities that are a natural fit with ARKit and Unity. AR and Unity go together quite nicely as AR provides an excellent way of communicating back to the user the results of intelligently processing sensor data from the user, other users and static (infrastructure) sensors in a space. The screen capture above was obtained from code largely based on this repo which integrates Core ML models with Unity. In this case, Inceptionv3 was used. While it isn’t perfect, it does ably demonstrate that this can be done. Getting the plugin to work was quite straightforward – you just have to include the mlmodel file in XCode via the Files -> Add Files menu option rather than dragging the file into the project. The development cycle is pretty annoying as the plugin won’t run in the Unity Editor and compile (on my old Mac Mini) is painfully slow but I guess a decent Mac would do a better job.

This all brings up the point that there seem to be different perceptions of what the edge actually is. rt-ai Edge can be perceived as a local aggregation and compute facility for inference-capable or conventional mobile and infrastructure devices (such as security cameras) – basically an edge compute facility supporting edge devices. A particular advantage of edge compute is that it is possible to integrate legacy devices (such as dumb cameras) into an AI-enhanced system by utilizing edge compute inference capabilities. In a sense, edge compute is a local mini-cloud, providing high capacity compute and inference a short distance in time away from sensors and actuators. This minimizes backhaul and latency, not to mention securing data in the local area rather than dispersing it in a cloud. It can also be very cost-effective when compared to the costs of running multiple cloud CPU instances 24/7.

Given the latest developments in tablets and smart phones, it is essential that rt-ai Edge be able to incorporate inference-capable devices into its stream processing networks. Inference-capable, per user devices make scaling very straightforward as capability increases in direct proportion to the number of users of an edge system. The normal rt-ai Edge deployment system can’t be used with mobile devices which requires (at the very least) framework apps to make use of AI models within the devices themselves. However, with that proviso, it is certainly possible to incorporate smart edge devices into edge networks with rt-ai Edge.