Machine learning techniques have now made it possible to create very realistic but fake video and audio that can be tough to distinguish from the real thing. The video above shows a very interesting example of this capability. The problem with this technology is that it could make it impossible to determine whether anything is genuine at all. What’s needed is some verification that a video of someone (for example) really is that person. Blockchain technology would seem to provide a solution for this.
Many years ago I was working on a digital watermarking-based system for detecting tampering in video records. Essentially, it embedded error-correcting codes in each frame that could be used to determine whether any region of a frame had been modified after the digital watermark had been added. Cameras would add the digital watermark at source, limiting the opportunity for modification prior to watermarking.
One problem with this approach is that it worked on a frame-by-frame basis but didn’t ensure the integrity of an entire sequence. In theory this could be done with temporally distributed watermarks, but blockchain technology provides a very nice alternative.
A simple strategy would be to have the sensor (camera, microphone, motion detector, whatever) create a hash for each unit of data (video frame, chunk of audio, etc.) and add this to a blockchain. Then a review app could create new hashes from the sensor data itself (stored elsewhere) and compare them to those in the blockchain. It could also determine that the account owner or device is who or what it is supposed to be in order to avoid spoofing. It’s easy to envisage an Ethereum smart contract being the basis of such a system.
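As a very rough sketch of this idea (plain Python, with an in-memory list standing in for the blockchain, and all names hypothetical), the per-unit hashing and verification steps might look like this:

```python
import hashlib
import time

# In-memory stand-in for the blockchain; in practice these entries would be
# written to and read back from a smart contract.
ledger = []

def hash_unit(data: bytes) -> str:
    """SHA-256 hash of one unit of sensor data (e.g. one video frame)."""
    return hashlib.sha256(data).hexdigest()

def record_unit(stream_id: str, data: bytes) -> None:
    """Called at the sensor: hash the unit and add the hash to the ledger."""
    ledger.append({"streamId": stream_id,
                   "timestamp": time.time(),
                   "hash": hash_unit(data)})

def verify_unit(stream_id: str, data: bytes) -> bool:
    """Called by a review app: recompute the hash from the stored sensor data
    and check that it appears in the ledger for this stream."""
    h = hash_unit(data)
    return any(e["streamId"] == stream_id and e["hash"] == h for e in ledger)
```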
One issue with this is the potential rate at which hashes need to be added to the blockchain. This rate could be reduced by collecting more data (e.g. accumulating one second’s worth of data to generate one hash) or creating a hash of hashes at an appropriate rate. The only downside is losing temporal resolution of where changes have been made.
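Continuing the sketch above (again hypothetical), a hash of hashes is simple to add; the batch size sets the trade-off between ledger write rate and temporal resolution:

```python
import hashlib

def hash_of_hashes(unit_hashes):
    """Combine a batch of per-unit hashes (e.g. one second's worth of frame
    hashes) into a single hash, cutting the rate of ledger writes at the
    cost of temporal resolution."""
    return hashlib.sha256("".join(unit_hashes).encode()).hexdigest()

# e.g. one ledger entry per second of 30 fps video:
# second_hash = hash_of_hashes(frame_hashes[:30])
```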
It’s worth considering the effects of lossy compression. Obviously, if a stream is uncompressed or only uses lossless compression, watermarking and hash generation can be done at a very early stage. Video watermarking is designed to withstand compression, so it can still be done at a very early stage even with lossy compression. The hash, however, has to be bit-accurate with the stream as stored on the video storage medium, so it must be computed after lossy compression.
It seems as though this blockchain concept could definitely be made to work and possibly combined with the digital watermarking technique in the case of video to provide temporal and spatial resolution of tampering. I am sure that variations of this concept are out there already or being developed and maybe, one day, it will be possible for anybody to check if a video of a well-known person is real or fake.
The MQTT-based heart of rt-ai Edge is ideal for constructing stream processing networks (SPNs) that are intended to run continuously. rt-ai Edge tools (such as rtaiDesigner) make it easy to modify and re-deploy SPNs across multiple nodes during the design phase but, once in full-time operation, these SPNs just run by themselves. An existing stream processing element (SPE), PutNiFi, allows data from an rt-ai Edge network to be stored and processed by big data tools – Elasticsearch, for example. However, these types of big data tools aren’t always appropriate, especially if low latency access is required, since Java garbage collection can cause unpredictable delays.
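For illustration only (this is not the actual rt-ai Edge or PutNiFi code, and the topic names and message layout are made up), the MQTT transport underneath an SPE boils down to something like this with the standard paho-mqtt client:

```python
import json
import paho.mqtt.client as mqtt

client = mqtt.Client()
client.connect("localhost", 1883)  # broker address is hypothetical

def publish_unit(stream_id: str, metadata: dict, payload: bytes) -> None:
    # One common pattern: JSON metadata and binary data on separate topics.
    client.publish(f"rtai/{stream_id}/meta", json.dumps(metadata))
    client.publish(f"rtai/{stream_id}/data", payload)
```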
For many applications, much simpler but reliably low latency storage is desirable. The Manifold system already has a storage app, ManifoldStore, that is optimized for timestamp-based searches of historical data. A new SPE called PutManifold allows data from an SPN to flow into a Manifold networking surface. The SPN screen capture above shows two instances of the PutManifold SPE used to transfer audio and video data from the SPN. ManifoldStore grabs passing data and stores it using the timestamp as the key. Manifold applications can then access historical data flows using streamId/timestamp pairs. This makes it particularly simple to coordinate access across multiple data streams, which is very useful when trying to correlate events across multiple data sources at a particular point or window in time.
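To make the timestamp-keyed idea concrete (a toy illustration, not the real ManifoldStore API), point and window queries reduce to binary searches over per-stream sorted timestamp lists:

```python
import bisect
from collections import defaultdict

class TimestampStore:
    """Toy timestamp-keyed store: one sorted timestamp list per streamId,
    with records kept in a parallel list in the same order."""

    def __init__(self):
        self._keys = defaultdict(list)   # streamId -> sorted timestamps
        self._recs = defaultdict(list)   # streamId -> records, same order

    def put(self, stream_id, timestamp, record):
        i = bisect.bisect_right(self._keys[stream_id], timestamp)
        self._keys[stream_id].insert(i, timestamp)
        self._recs[stream_id].insert(i, record)

    def window(self, stream_id, start, end):
        """All (timestamp, record) pairs in [start, end) -- the primitive
        needed to correlate events across streams over a time window."""
        keys = self._keys[stream_id]
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_left(keys, end)
        return list(zip(keys[lo:hi], self._recs[stream_id][lo:hi]))
```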
ManifoldStore is intrinsically schemaless in that it can store anything that consists of a JSON part and a binary data part, as used in rt-ai Edge. A new application called rtaiView is a universal viewer that allows multiple streams of all types to be displayed in a traditional split-screen monitoring format. It uses ManifoldStore for its underlying storage and provides a window into the operation of the SPN.
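One plausible encoding for such a JSON-plus-binary record (hypothetical, not Manifold’s actual wire format) is a length-prefixed layout, which keeps the store completely agnostic about what either part contains:

```python
import json
import struct

def pack_record(meta: dict, binary: bytes) -> bytes:
    """4-byte big-endian length of the JSON part, then the JSON itself,
    then the raw binary payload."""
    j = json.dumps(meta).encode("utf-8")
    return struct.pack(">I", len(j)) + j + binary

def unpack_record(blob: bytes):
    """Inverse of pack_record: split the blob back into (meta, binary)."""
    (jlen,) = struct.unpack(">I", blob[:4])
    meta = json.loads(blob[4:4 + jlen].decode("utf-8"))
    return meta, blob[4 + jlen:]
```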
Manifold is designed to be very flexible with various features that reduce configuration for ad-hoc uses. This makes it very easy to perform offline processing of stored data as and when required which is ideal for offline machine learning applications.
rt-ai Edge is a new concept in edge processing that makes it easy for anyone to build AI and ML enhanced stream processing pipelines in order to close the local loop and offload communications networks and the cloud. Semantic extraction of meaningful data from raw data feeds at the edge ensures that the core only has to deal with actionable information, not noise. rt-ai Edge leverages hardware acceleration within embedded devices to filter raw data into highly salient messages for higher level processing.
rt-ai Edge is in active development right now.
Very interesting work here that uses recurrent neural network ideas to predict the next frames in a video sequence. It’s amazing how many times LSTM pops up these days. Unsupervised learning is one of the most interesting areas of machine learning at the moment and the potential is seemingly unlimited. This is another example of using LSTMs to learn video representations, and it’s a fascinating area.
Some very interesting software from Facebook’s AI Research that implements segmentation and labelling of images. Code is available on GitHub that uses Torch as its AI engine. Could be a good addition to rtndf as part of a video pipeline. Even if the segmentation and labelling is slower than real time, it’s possible to use a bypass system to keep the frame rate up while also processing selected key frames; the OpenFace PPE does this already. As things may move between key frames in a video pipeline, a strategy might be to buffer frames after the first key frame until the results from the second key frame are available and interpolate the segmentation results for the intermediate frames. Then the buffered frames can be played out at the correct rate. Obviously this adds latency but might be acceptable in some situations.
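A sketch of that buffering and interpolation strategy might look like the following (hypothetical names; real segmentation output would need a matching step to pair up regions between key frames, and simple boxes stand in for full masks here):

```python
def lerp_box(box_a, box_b, t):
    """Linearly interpolate two (x, y, w, h) boxes for 0 <= t <= 1."""
    return tuple(a + (b - a) * t for a, b in zip(box_a, box_b))

def interpolate_buffered(buffered_frames, result_a, result_b):
    """Assign each frame buffered between key frame A and key frame B an
    interpolated result, so the buffered frames can then be played out at
    the correct rate (at the cost of added latency)."""
    n = len(buffered_frames) + 1
    return [(frame, lerp_box(result_a, result_b, i / n))
            for i, frame in enumerate(buffered_frames, start=1)]
```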
Yes, that is me waving my Taylor (made in San Diego 🙂 ) guitar around in a very careless manner. It’s all in a good cause though. Turns out that Inception-v3 is very good at recognizing acoustic and electric guitars. I put together a new rtndf PPE called recognize based on the code here from the TensorFlow repo.
In its simplest mode, the recognize PPE takes an incoming video stream and tries to recognize an object in the entire frame. If it finds something, it adds a label in the bottom left corner of the image and uses that to generate a new output stream. That’s ok, but what’s more interesting is when it works with another PPE, modet. modet detects moving objects in the stream and draws a box around them. It now also adds metadata to the outgoing pipeline messages that can be used by downstream PPEs to do something with the regions where motion has been detected.
recognize can work in a mode where it uses the modet metadata to recognize moving objects in the stream. The screen capture with the guitar is an example. That’s why I was waving it around – it had to be in motion to get detected and recognized. The box is that big because I am in motion too! However, Inception-v3 seems quite able to recognize the dominant object in the image segment. While there is only one recognized object in this example, if there were more regions they would be individually recognized.
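In outline (hypothetical names; the crop assumes a NumPy-style image array and `classify` wraps an Inception-v3 forward pass), recognizing the moving regions reported by modet comes down to:

```python
def recognize_regions(frame, regions, classify):
    """Crop each motion region out of the frame and classify the crop.
    'regions' is the (x, y, w, h) metadata from the motion detector and
    'classify' returns a (label, score) pair for an image."""
    results = []
    for (x, y, w, h) in regions:
        crop = frame[y:y + h, x:x + w]   # NumPy-style row/column indexing
        results.append(((x, y, w, h), classify(crop)))
    return results
```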
Of course, the example data set for Inception-v3 only knows so many things, guitars being an example. However, something I want to use this for is to detect a UPS truck coming up the drive. I’ll probably have to try retraining the final layer to do this.
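The usual recipe for that (my sketch of the general technique, not the TensorFlow retrain script itself) is to freeze the pre-trained base and train only a new final layer, something like this in Keras:

```python
import tensorflow as tf

# Freeze the pre-trained Inception-v3 base and train only a new final layer
# on a small custom data set (e.g. "UPS truck" vs "no truck").
base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, pooling="avg")
base.trainable = False

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(2, activation="softmax"),  # two custom classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=5)  # 299x299x3 inputs assumed
```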
I am currently reading Python Machine Learning as I wanted to know more about scikit-learn, amongst other things. It’s a very practical guide with just enough theory to make sense of it all. A lot of machine learning books dive pretty deep into the theory, which is great if that’s what you want. On the other hand, if the idea is to get doing something fast, this book seems like a great place to start. It’s always easier to delve into theory when its relevance is clear, and there’s nothing like actually writing and running code to make that relevance concrete.