Semantic image segmentation with TensorFlow using DeepLab

I have been trying out a TensorFlow application called DeepLab that uses deep convolutional neural nets (DCNNs) along with some other techniques to segment images into meaningful objects and then label what they are. A script included in the DeepLab GitHub repo trains and evaluates the model on the Pascal VOC 2012 dataset. One of the results is shown above. It has managed to extract some pretty ugly furniture from a noisy background quite nicely. Here are a couple more examples:

The software has done a nice job of extracting the foreground objects in another very noisy scene.

The person in the background is picked up pretty nicely here – I didn’t even notice the person at first.

Incidentally, to get the script to work on Ubuntu 16.04, I had to change one of the calls to use bash instead of sh; otherwise it generated an error. Also, I needed to install cuDNN 7.0.4 for CUDA 9.0 rather than cuDNN 7.1.1 in order to get the Jupyter notebook example operating.

What I would like to do now is to create an rt-ai Edge Stream Processing Element (SPE) based on this code to act as a preprocessor stage in order to isolate and identify salient objects in a video stream in real time. One of my interests is understanding behaviors from video and this could be a valuable component in that pipeline by allowing later stages to focus on what’s important in each frame.
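As a rough illustration of what such a preprocessing stage might do with a segmentation result, here is a minimal sketch. It is not the rt-ai Edge SPE interface or DeepLab's own code. It assumes the model has already produced a per-pixel label map using the Pascal VOC class indices (0 is background, 255 is the "void" label), and the function name `isolate_objects` is hypothetical.

```python
import numpy as np

# Pascal VOC 2012 segmentation classes, indexed by label value.
VOC_CLASSES = [
    "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
    "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
    "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]


def isolate_objects(frame, label_map):
    """Black out background pixels and list the classes found in the frame.

    frame: H x W x 3 uint8 image.
    label_map: H x W integer array of VOC class indices (assumed input,
    for example the per-pixel argmax of a segmentation model's output).
    """
    # Keep only real object classes: skip background (0) and void (255).
    foreground = (label_map > 0) & (label_map < len(VOC_CLASSES))
    # Broadcast the 2-D mask over the colour channels.
    masked = np.where(foreground[..., None], frame, 0).astype(frame.dtype)
    found = [VOC_CLASSES[i] for i in np.unique(label_map[foreground])]
    return masked, found


# Tiny synthetic example: a 2 x 3 "frame" with a person and a chair in it.
frame = np.full((2, 3, 3), 200, dtype=np.uint8)
label_map = np.array([[0, 15, 15],
                      [0, 9, 255]])
masked, found = isolate_objects(frame, label_map)
```

Later stages could then work on `masked` and `found` rather than the full frame, which is the "focus on what's important" idea described above.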

6 thoughts on “Semantic image segmentation with TensorFlow using DeepLab”

  1. Hi Richards, I am trying to do the same but I need to build my own dataset for a specific application (segmentation of buildings’ elements: doors, windows…). Could you recommend a tutorial for generating or preparing a dataset compliant with DeepLab? I am having difficulty finding good results in my Google queries…

    1. This link gives an example of training a model but assumes the training dataset exists. This page gives an idea of what the Cityscapes training data looks like. The paper here has some more information. Producing a training set looks distinctly non-trivial.

      1. Thank you for these links. But I had already seen these “tutorials”, which are not very explicit or detailed. Anyway, it is good to share another point of view: thanks! Two data formats seem to be compatible with DeepLab (cityscapes, voc2012) but neither is clearly described or commented. I will keep searching. 🙂

      2. Probably you would need to emulate the training set format of one of the original training sets. It might be worth looking at the SDK that went with the VOC2012 data – it might have some information that’s useful.
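To make the suggestion of emulating the VOC2012 format a little more concrete, here is a minimal sketch of one preparation step. It assumes the VOC convention in which each annotation holds one class index per pixel, with 255 reserved for pixels to ignore. The function name, the palette and the "door"/"window" class indices are all hypothetical, and this is not DeepLab's own tooling.

```python
import numpy as np

IGNORE_LABEL = 255  # VOC marks "void" pixels with 255


def color_mask_to_labels(color_mask, palette):
    """Convert an H x W x 3 colour-coded mask into an H x W class-index array.

    palette: dict mapping (r, g, b) tuples to class indices (hypothetical,
    chosen by whoever annotates the dataset). Colours not listed in the
    palette become IGNORE_LABEL.
    """
    labels = np.full(color_mask.shape[:2], IGNORE_LABEL, dtype=np.uint8)
    for color, index in palette.items():
        # True where all three channels match this palette colour.
        matches = np.all(color_mask == np.array(color, dtype=color_mask.dtype), axis=-1)
        labels[matches] = index
    return labels


# Hypothetical palette for a buildings dataset: 0 background, 1 door, 2 window.
palette = {(0, 0, 0): 0, (128, 0, 0): 1, (0, 128, 0): 2}
color_mask = np.array(
    [[[0, 0, 0], [128, 0, 0]],
     [[0, 128, 0], [7, 7, 7]]],  # last pixel uses a colour not in the palette
    dtype=np.uint8,
)
labels = color_mask_to_labels(color_mask, palette)
```

The resulting `labels` array could then be saved as a single-channel image alongside the original photo, mirroring how the VOC annotations are laid out.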
