These days, machine learning techniques have led to the ability to create very realistic but fake video and audio that can be tough to distinguish from the real thing. The video above shows a very interesting example of this capability. The problem with this technology is that it will become impossible to determine if anything is genuine at all. What’s needed is some verification that a video of someone (for example) really is that person. Blockchain technology would seem to provide a solution for this.
Many years ago I was working on a digital watermarking-based system for detecting tampering in video records. Essentially, this embedded error-correcting codes in each frame that could be used to determine if any region of a frame had been modified after the digital watermark had been added. Cameras would add the digital watermark at source, limiting the opportunity for modification prior to watermarking.
One problem with this is that it worked on a frame by frame basis but didn’t ensure the integrity of an entire sequence. In theory this could be done with temporally distributed watermarks but blockchain technology provides a very nice alternative.
A simple strategy would be to have the sensor (camera, microphone, motion detector, whatever) create a hash for each unit of data (video frame, chunk of audio etc) and add this to a blockchain. Then a review app could create new hashes from the sensor data itself (stored elsewhere) and compare them to those in the blockchain. It could also determine that the account owner or device is who or what it is supposed to be in order to avoid spoofing. It’s easy to envisage an Etherium smart contract being the basis of such a system.
One issue with this is the potential rate at which hashes need to be added to the blockchain. This rate could be reduce by collecting more data (e.g. accumulating one second’s worth of data to generate one hash) or creating a hash of hashes at an appropriate rate. The only downside to this is losing temporal resolution of where changes have been made.
It’s worth considering the effects of lossy compression. Obviously if a stream is uncompressed or only uses lossless compression, watermarking and hash generation can be done at a very early stage. Watermarking of video is designed to withstand compression so that can still be done at a very early stage, even with lossy compression. The hash has to be be bit-accurate with the stream as stored on the video storage medium though so the hash must be computed after lossy compression.
It seems as though this blockchain concept could definitely be made to work and possibly combined with the digital watermarking technique in the case of video to provide temporal and spatial resolution of tampering. I am sure that variations of this concept are out there already or being developed and maybe, one day, it will be possible for anybody to check if a video of a well-known person is real or fake.
Fascinating video about a system that teaches a drone to fly around urban environments using data from cars and bikes as training data. There’s a paper here and code here. It’s a great example of leveraging CNNs in embedded environments. I believe that moving AI and ML to the edge and ultimately into devices such as IoT sensors is going to be very important. Having dumb sensor networks and edge devices just means that an enormous amount of worthless data has to be transferred into the cloud for processing. Instead, if the edge devices can perform extensive semantic mining of the raw data, only highly salient information needs to be communicated back to the core, massively reducing bandwidth requirements and also allowing low latency decision making at the edge.
Take as a trivial example a system of cameras that read vehicle license plates. One solution would be to send the raw video back to somewhere for license number extraction. Alternately, if the cameras themselves could extract the data, then only the recognized numbers and letters need to be transferred, along with possibly an image of the plate. That’s a massive bandwidth saving over sending constant compressed video. Even more interesting would be edge systems that can perform unsupervised learning to optimize performance, all moving towards eliminating noise and recognizing what’s important without extensive human oversight.
There’s a new TensorFlow model for image captioning available here. It combines a deep convolutional neural network (Inception-v3) with an LSTM-based decoder network. LSTM is cropping up just about everywhere now…
Very interesting work here that uses recurrent neural network ideas to predict next frames in a video sequence. It’s amazing how many times LSTM pops up these days. Unsupervised learning is one of the most interesting areas of machine learning at the moment and the potential is seemingly unlimited. This is another example of using LSTM for understanding video representations using LSTM. It’s a fascinating area.
I am currently working with TensorFlow and I thought it’d be interesting to see what kind of performance I could get when processing video and trying to recognize objects with Inception-v3. While I’d like to get TensorFlow integrated with some of my Qt apps, the whole “build with Bazel” thing is holding that up right now (problems with Eigen includes – one day I’ll get back to that). As a way of taking the path of least resistance, I included TensorFlow in an inline MQTT filter written in Python. It subscribes to a video topic sourced from a webcam and outputs recognized objects in the stream.
As can be seen from the screen capture, it’s currently achieving 11 frames per second using 640 x 480 frames with a GTX 970 GPU. With a GTX 960 GPU, the rate falls to around 8 frames per second. This is pretty much what I have seen with other TensorFlow graphs – the GTX 970 is about 50% faster than a GTX 960, probably due to the restricted memory bus width on the GTX 960.
Hopefully I’ll soon have a 10 series GPU – that should be an interesting comparison.
Came across this great blog about machine learning with the most recent entry describing how to build a neural stack machine in Python based on a paper published by DeepMind. There are some earlier blog entries that build up to this to help with the background. Looks like a tremendous amount of effort was put into this work and it’s well worth a read – and trying out the Python code.
Interesting story here about what parallel resources the brain musters to perform simple tasks. It suggests that trying to build a functional brain-analog by simulating individual neurons is unnecessary. Instead, a much more practical silicon implementation would come from understanding the aggregate behavior of groups of neurons and simulating that instead. Not a new idea but it’s interesting to see an attempt to start to understand how this might work.
Continue reading “The brain and parallel processing”