Accessing the iOS WiFi IP address within a Unity app

At the moment, Manifold requires that a client supply an appropriate IP address (although I might change this in the future). Mostly this is pretty easy to do, but the .NET way of doing things (using Dns.GetHostEntry()) didn’t seem to pick up the WiFi IP address on an iPad. After wasting a lot of time, I decided to go back to basics and create a native plugin.

The basis of the code comes from here – it just needed the right wrapping to get it to work with Unity. I will be the first to admit that I know nothing about native iOS coding but, following the Bonjour example here, the code below seemed to work just fine when placed in the Assets/Plugins/iOS directory of the project.

IPAddress.h:

#import <Foundation/Foundation.h>

@interface IPAddressDelegate : NSObject

- (NSString *)getAddress;
@end

IPAddress.m:

#include <ifaddrs.h>
#include <arpa/inet.h>
#include <stdlib.h>   // malloc
#include <string.h>   // strlen, strcpy

#import "IPAddress.h"
@implementation IPAddressDelegate

- (id)init
{
    self = [super init];
    return self;
}

- (NSString *)getAddress {
    NSString *address = @"error";
    struct ifaddrs *interfaces = NULL;
    struct ifaddrs *temp_addr = NULL;
    int success = 0;
    success = getifaddrs(&interfaces);
    if (success == 0) {
        temp_addr = interfaces;
        while(temp_addr != NULL) {
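            // ifa_addr can be NULL for some interfaces, so check before dereferencing
            if(temp_addr->ifa_addr != NULL && temp_addr->ifa_addr->sa_family == AF_INET) {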
                if([[NSString stringWithUTF8String:temp_addr->ifa_name] isEqualToString:@"en0"]) {
                    address = [NSString stringWithUTF8String:inet_ntoa(((struct sockaddr_in *)temp_addr->ifa_addr)->sin_addr)];
                }
            }
            temp_addr = temp_addr->ifa_next;
        }
    }
    // Free memory
    freeifaddrs(interfaces);
    return address;
}
@end

static IPAddressDelegate* delegateObject = nil;

char* MakeStringCopy (const char* string)
{
    if (string == NULL)
        return NULL;
    
    char* res = (char*)malloc(strlen(string) + 1);
    strcpy(res, string);
    return res;
}

const char * getLocalWifiIpAddress()
{
    if (delegateObject == nil)
        delegateObject = [[IPAddressDelegate alloc] init];
    
    return MakeStringCopy([[delegateObject getAddress] UTF8String]);
}

Using the plugin is pretty straightforward. Just add this declaration to a C# class (the file needs using System.Runtime.InteropServices; for DllImport):

	[DllImport ("__Internal")]
	private static extern string getLocalWifiIpAddress();

Then call getLocalWifiIpAddress() to get the dotted address string. If no IPv4 address is found on en0, the string returned is "error".

rt-xr sentient space visualization now on iOS!

I have to admit, I am in a state of shock right now. For some reason today I decided to try to get the rt-xr Viewer software working on iOS. After all, it worked fine on Windows desktop, UWP (Windows MR), macOS and Android so why not? However, I expected endless trouble with the Manifold library but, as it turned out, getting it to work on iOS was trivial. I guess Unity and .NET magic came together so I didn’t have to do too much work once again. In fact, the hardest part was working out how to sort out microphone permission and that wasn’t too hard – this thread certainly helped with that. Avatar pose sharing, audio sharing, proxy objects, video and sensor feeds all work perfectly.

The nice thing now is that most (if not all) of the further development is intrinsically multi-platform.

Streaming PCM audio from Unity on Android

The final step in adding audio support to rt-xr visualization was to make it work with Android. Supporting audio capture natively on Windows desktop and Windows UWP was relatively easy since it could all be done in C#. However, I didn’t really want to implement a native capture plugin for Android and it turns out that the Unity capture technique works pretty well, albeit with noticeable latency.

The Inspector view in the screen capture shows the idea. The MicrophoneFilter script starts up the Unity Microphone and adds it to the AudioSource. When running, the output of the AudioSource is passed to MicrophoneFilter via the OnAudioFilterRead method that gives access to the PCM stream from the microphone.

The resulting stream needs some processing, however. I am sending single channel 16 bit PCM audio at 16000 samples per second on the network, whereas the output of the AudioSource is stereo floating point at either 16000 or 48000 samples per second depending on the platform, so the code has to be able to convert this. It does so by averaging the two channels, keeping every third sample when the rate is 48000 and scaling the floats to 16 bit values. It also needs to zero out the output of the filter, otherwise it will be picked up by the listener on the main camera, which is certainly not desirable! There is an alternate way of running this that uses the AudioSource.clip.GetData call directly but I had problems with that and also prefer the asynchronous callback used for OnAudioFilterRead rather than using Update or FixedUpdate to poll. The complete MicrophoneFilter script looks like this:

using UnityEngine;

[RequireComponent(typeof(AudioSource))]
public class MicrophoneFilter : MonoBehaviour
{
    [Tooltip("Index of microphone to use")]
    public int deviceIndex = 0;

    private StatusUpdate statusUpdate;
    private bool running = false;
    private byte[] buffer = new byte[32000];
    private int scale;

    // Use this for initialization
    void Start()
    {

        AudioSource source = GetComponent<AudioSource>();

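        // Without any microphone there is nothing to capture; leave running false
        if (Microphone.devices.Length == 0) {
            Debug.LogWarning("MicrophoneFilter: no microphone available");
            return;
        }

        if (deviceIndex >= Microphone.devices.Length)
            deviceIndex = 0;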

        GameObject scripts = GameObject.Find("Scripts");
        statusUpdate = scripts.GetComponent<StatusUpdate>();

        int sampleRate = AudioSettings.outputSampleRate;

        if (sampleRate > 16000)
            scale = 3;
        else
            scale = 1;

        source.clip = Microphone.Start(Microphone.devices[deviceIndex], true, 1, sampleRate);
        source.Play();
        running = true;
    }

    private void OnAudioFilterRead(float[] data, int channels)
    {
        if (!running)
            return;

        int byteIndex = 0;
        if (channels == 1) {
            for (int i = 0; i < data.Length;) {
                short val = (short)((data[i]) * 32767.0f);
                for (int offset = 0; offset < scale; offset++) {
                    if (i < data.Length) 
                        data[i++] = 0; 
                } 
                buffer[byteIndex++] = (byte)(val & 0xff); 
                buffer[byteIndex++] = (byte)((val >> 8) & 0xff);
            }
        } else {
            for (int i = 0; i < data.Length;) {
                short val = (short)((data[i] + data[i + 1]) * 32767.0f / 2.0f);
                for (int offset = 0; offset < 2 * scale; offset++) {
                    if (i < data.Length) 
                        data[i++] = 0; 
                } 
                buffer[byteIndex++] = (byte)(val & 0xff); 
                buffer[byteIndex++] = (byte)((val >> 8) & 0xff);
            }
        }
        statusUpdate.newAudioData(buffer, byteIndex);
    }
}

Note the fixed maximal size buffer allocation to try to prevent too much garbage collection. In general, the code uses maximal sized fixed buffers wherever possible.
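The conversion that OnAudioFilterRead performs can also be tried outside Unity. The following is a minimal Python sketch of the same idea, not a port of the script: the function name to_pcm16_mono and the clamping step are my own additions for illustration, and it allocates freely rather than using a fixed buffer.

```python
import struct


def to_pcm16_mono(data, channels, scale):
    """Convert interleaved float samples to little-endian 16 bit mono PCM.

    data     -- interleaved float samples, nominally in [-1.0, 1.0]
    channels -- 1 (mono) or 2 (stereo)
    scale    -- decimation factor: 3 for 48000 -> 16000, 1 for 16000
    """
    out = bytearray()
    step = channels * scale  # input floats consumed per output sample
    for i in range(0, len(data) - channels + 1, step):
        frame = data[i:i + channels]
        mono = sum(frame) / channels          # average the channels
        mono = max(-1.0, min(1.0, mono))      # clamp (assumption, not in the script)
        out += struct.pack("<h", int(mono * 32767.0))
    return bytes(out)


# Two stereo output samples from 48000 Hz input (scale = 3)
stereo = [1.0, 1.0, 0, 0, 0, 0, -1.0, -1.0, 0, 0, 0, 0]
print(to_pcm16_mono(stereo, 2, 3).hex())
```

Like the script, this simply drops two out of every three samples when downsampling, with no low-pass filter first, so content above 8 kHz in the input can alias into the output.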

The SharingServer has now been updated to generate separate feeds for VR and AR/MR users with all user audio feeds in the VR version and just VR headset users’ audio in the MR version. The audio update rate has also been decoupled from the avatar pose update rate. This allows a faster update rate for pose updates than makes sense for audio.

Just a note on why I am using single channel 16 bit PCM at 16000 samples per second rather than sending single channel floats at 48000 samples per second, which would be a better fit in many cases. The problem is that sending 32 bit floats at 48000 samples per second makes the data rate 6 times higher – it goes from 256kbps to 1.536Mbps. Using uncompressed 16 bit audio and dealing with the consequences seemed like a better trade than either the higher data rate or moving to compressed audio. This decision may have to be revisited when running on real MR headset hardware, however.
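The arithmetic behind those two figures is just sample rate times bits per sample times channel count. A small Python check (the helper name pcm_bitrate is made up for this example, and 32 bits per float sample is assumed):

```python
def pcm_bitrate(sample_rate, bits_per_sample, channels=1):
    """Uncompressed PCM data rate in bits per second."""
    return sample_rate * bits_per_sample * channels


pcm16 = pcm_bitrate(16000, 16)      # 16 bit mono at 16000 samples per second
float48 = pcm_bitrate(48000, 32)    # 32 bit float mono at 48000 samples per second

print(pcm16, float48, float48 // pcm16)
```

This counts payload only; any network framing or message overhead comes on top.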

Proxy objects: Unity assets that are UI extensions of remote servers

For some reason I often end up back at the analog clock for trying out new ideas. I guess it is because it is pretty trivial to operate a clock – just supply three angles. In this case, the clock is a proxy object which is in many ways just a simple extension of the system that animates the avatars for other occupants of a sentient space. A proxy object is a conventional Unity GameObject hierarchy that has certain specially named child nodes. By itself, there’s nothing special about the Unity asset part of a proxy object – it could be an asset included in the app or an asset downloaded from a server using Unity’s asset bundle system. Either way, these specially named nodes can be linked to external servers. In this case, the SharingServer generates an analog clock stream that animates the clock hands. The clock definition is contained in the space definition file that instantiates all the other parts of the scene.

In principle, interaction (i.e. sending stuff back to the remote server) can be added by using specially named nodes to attach scripts that are hard-coded in the app. I haven’t tried this yet but see no reason why it wouldn’t work. The key point is that proxy objects leverage standard scripts in the app as opposed to customized scripts for every asset.

Right now, you can modify the local scale, local position, local orientation, color and text (if associated with a TextMesh) of any of the GameObjects in an asset’s hierarchy. This could easily be extended to other things including updating a texture with a new image. For example, a virtual fireplace could be created where the flames are animated by constantly varying the textures being displayed. The system is still simplistic however as there are no mechanisms for controlling transitions (such as lerping between positions or fading between textures) but this could certainly be added without too much difficulty.
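As an illustration of what such a transition might involve, here is a hedged Python sketch of interpolating an orientation angle along the shortest arc. Nothing like this exists in the system yet, and the function name lerp_angle is hypothetical; in Unity itself, Mathf.LerpAngle does a similar job.

```python
def lerp_angle(start, end, t):
    """Interpolate from start to end (degrees) along the shortest arc.

    t = 0.0 gives start, t = 1.0 gives end; the result is in [0, 360).
    """
    # Signed difference wrapped into [-180, 180)
    delta = (end - start + 180.0) % 360.0 - 180.0
    return (start + delta * t) % 360.0


# Halfway between 350 and 10 degrees goes through 0, not through 180
print(lerp_angle(350.0, 10.0, 0.5))
```

Wrapping the difference first matters for something like a clock hand, where a naive lerp from 350 to 10 degrees would sweep backwards through most of the dial.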

Just for reference, the analog clock stream message looks like this:

{
    "type": "proxyobject",
    "updateList": [
        {
            "name": "PO_AnalogClock_Second",
            "orientation": {
                "x": 0,
                "y": 222,
                "z": 0
            },
            "orientationValid": true
        },
        {
            "name": "PO_AnalogClock_Minute",
            "orientation": {
                "x": 0,
                "y": 342,
                "z": 0
            },
            "orientationValid": true
        },
        {
            "name": "PO_AnalogClock_Hour",
            "orientation": {
                "x": 0,
                "y": 568,
                "z": 0
            },
            "orientationValid": true
        }
    ]
}

Here the y value encodes the relevant hand angle in degrees. The hour angle is greater than 360 degrees as the system uses a 24 hour clock, but the hand ends up in the same place either way since 568 degrees is the same orientation as 208 degrees.
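For illustration, a message of this shape could be generated along the following lines. This is a Python sketch rather than the SharingServer code: the angle formulas are inferred from the sample values above (which are consistent with a time of 18:57:37), and the function name clock_message is an assumption.

```python
import json


def clock_message(hour, minute, second):
    """Build a proxy object update for the three clock hands."""
    def entry(name, angle):
        return {
            "name": name,
            "orientation": {"x": 0, "y": angle, "z": 0},
            "orientationValid": True,
        }

    second_angle = second * 6                     # 360 degrees / 60 seconds
    minute_angle = minute * 6                     # 360 degrees / 60 minutes
    hour_angle = int((hour * 60 + minute) * 0.5)  # 30 degrees per hour, 24 hour clock

    return {
        "type": "proxyobject",
        "updateList": [
            entry("PO_AnalogClock_Second", second_angle),
            entry("PO_AnalogClock_Minute", minute_angle),
            entry("PO_AnalogClock_Hour", hour_angle),
        ],
    }


print(json.dumps(clock_message(18, 57, 37), indent=4))
```

The node names are the important part: they are what ties each update to a specially named child in the asset’s hierarchy.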

Using blockchain technology to create verifiable sensor records and detect fakes

These days, machine learning techniques have led to the ability to create very realistic but fake video and audio that can be tough to distinguish from the real thing. The video above shows a very interesting example of this capability. The problem with this technology is that it will become impossible to determine if anything is genuine at all. What’s needed is some verification that a video of someone (for example) really is that person. Blockchain technology would seem to provide a solution for this.

Many years ago I was working on a digital watermarking-based system for detecting tampering in video records. Essentially, this embedded error-correcting codes in each frame that could be used to determine if any region of a frame had been modified after the digital watermark had been added. Cameras would add the digital watermark at source, limiting the opportunity for modification prior to watermarking.

One problem with this is that it worked on a frame by frame basis but didn’t ensure the integrity of an entire sequence. In theory this could be done with temporally distributed watermarks but blockchain technology provides a very nice alternative.

A simple strategy would be to have the sensor (camera, microphone, motion detector, whatever) create a hash for each unit of data (video frame, chunk of audio etc) and add this to a blockchain. Then a review app could create new hashes from the sensor data itself (stored elsewhere) and compare them to those in the blockchain. It could also determine that the account owner or device is who or what it is supposed to be in order to avoid spoofing. It’s easy to envisage an Ethereum smart contract being the basis of such a system.
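The record-then-verify part of that strategy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a plain list stands in for the blockchain, SHA-256 is my choice of hash, and the function names are made up. It says nothing about how a real smart contract would store or authenticate the entries.

```python
import hashlib


def record_hashes(units):
    """Sensor side: one SHA-256 hash per unit of data (frame, audio chunk)."""
    return [hashlib.sha256(unit).hexdigest() for unit in units]


def find_tampered(units, ledger):
    """Review side: indices where stored data no longer matches the ledger.

    Units missing from, or appended to, the stored data are reported too.
    """
    bad = [i for i, (unit, recorded) in enumerate(zip(units, ledger))
           if hashlib.sha256(unit).hexdigest() != recorded]
    bad.extend(range(min(len(units), len(ledger)), max(len(units), len(ledger))))
    return bad


frames = [b"frame0", b"frame1", b"frame2"]
ledger = record_hashes(frames)              # stand-in for entries on a blockchain

edited = [b"frame0", b"FAKE", b"frame2"]
print(find_tampered(edited, ledger))        # indices of modified frames
```

Because the ledger entries are ordered, this also catches dropped or reordered units, which is the sequence-level integrity that per-frame watermarking alone does not give.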

One issue with this is the potential rate at which hashes need to be added to the blockchain. This rate could be reduced by collecting more data (e.g. accumulating one second’s worth of data to generate one hash) or creating a hash of hashes at an appropriate rate. The only downside to this is losing temporal resolution of where changes have been made.
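A hash of hashes is easy to sketch. In the Python example below, the batch size, the use of SHA-256 and the function name batch_hashes are all assumptions for illustration; the point is only to show how the number of entries drops and what is lost in exchange.

```python
import hashlib


def batch_hashes(units, batch_size):
    """Produce one hash per batch by hashing the per-unit hashes together."""
    result = []
    for start in range(0, len(units), batch_size):
        outer = hashlib.sha256()
        for unit in units[start:start + batch_size]:
            outer.update(hashlib.sha256(unit).digest())
        result.append(outer.hexdigest())
    return result


# 60 frames at an assumed 30 frames per second: two entries instead of 60
frames = [b"frame%d" % i for i in range(60)]
original = batch_hashes(frames, 30)

frames[45] = b"FAKE"
modified = batch_hashes(frames, 30)
print([i for i, (a, b) in enumerate(zip(original, modified)) if a != b])
```

A mismatch now only identifies the batch (here, which second) containing a change, not the individual frame, which is the loss of temporal resolution described above.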

It’s worth considering the effects of lossy compression. Obviously if a stream is uncompressed or only uses lossless compression, watermarking and hash generation can be done at a very early stage. Watermarking of video is designed to withstand compression so that can still be done at a very early stage, even with lossy compression. The hash has to be bit-accurate with the stream as stored on the video storage medium, though, so the hash must be computed after lossy compression.

It seems as though this blockchain concept could definitely be made to work and possibly combined with the digital watermarking technique in the case of video to provide temporal and spatial resolution of tampering. I am sure that variations of this concept are out there already or being developed and maybe, one day, it will be possible for anybody to check if a video of a well-known person is real or fake.

PuncturedPlane4 – procedural meshes with elliptical cutouts

Not content with rectangular cutouts, I thought it would be a fun exercise to generate elliptical cutouts. The result is shown above. Works very nicely except that the rectangle bounding the ellipse always gets filled in which causes problems if overlapping other things. Once again, not perfect but not bad either. This is the textured version:

Code is here.

PuncturedPlane3: optimized punctured procedural mesh generation

Once I had stopped congratulating myself for reducing the generated triangle count in the previous post, I realized that I still needed to trim the vertex, normal and uv arrays down to include only the vertices used in the triangle array. This is not completely trivial as the indices of the vertices will change, which means that the triangle array indices have to be remapped. Nothing too tragic though. Anyway, the result is shown in the screen capture above, using a wireframe shader to illustrate the triangles generated. The vertex count was reduced from 150801 to 34 in this particular case.
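The trimming step amounts to building a map from old vertex indices to new ones while walking the triangle array. The following is a generic Python sketch of that technique, not the PuncturedPlane3 code itself; the function name compact_mesh and the use of plain lists in place of Unity’s Vector3 and Vector2 arrays are assumptions.

```python
def compact_mesh(vertices, normals, uvs, triangles):
    """Drop vertices not referenced by triangles and remap the indices."""
    remap = {}                      # old vertex index -> new vertex index
    new_vertices, new_normals, new_uvs, new_triangles = [], [], [], []
    for old_index in triangles:
        if old_index not in remap:
            remap[old_index] = len(new_vertices)
            new_vertices.append(vertices[old_index])
            new_normals.append(normals[old_index])
            new_uvs.append(uvs[old_index])
        new_triangles.append(remap[old_index])
    return new_vertices, new_normals, new_uvs, new_triangles


# Five vertices, but the single triangle only uses three of them
verts = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (0, 0, 1), (2, 0, 2)]
norms = [(0, 1, 0)] * 5
uvs = [(0, 0), (0.5, 0), (1, 0), (0, 0.5), (1, 1)]
v, n, uv, tris = compact_mesh(verts, norms, uvs, [0, 2, 4])
print(len(v), tris)
```

Vertices shared between triangles are only emitted once because the map is consulted before anything is appended.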

Code is here. It’s clearly not completely optimal as it missed the fact that it could have used two big triangles at the top, just like the bottom. It could be made much smarter and in fact get rid of the step quantization entirely. However, this algorithm is easy to understand and good enough for my purposes.