Making Ordinary Phone Camera Images Suitable for Neural Network Digit Recognition

Mark Szabo
Published in Chatbots Life
3 min read · May 6, 2019

Two weeks ago a colleague and I were asked to deliver a workshop on machine learning and bots. Our idea was to train a neural network in the first part of the workshop and use it in the bot demoed in the second part. Digit recognition on the MNIST dataset is a very famous ‘hello-world level’ machine learning problem, so that’s what we chose.

The one thing I hadn’t paid attention to is that the MNIST dataset contains 28 by 28 pixel images with pure white backgrounds and black digits centered in a 20 by 20 pixel square. That is not quite like the images you’ll shoot with your phone and send to the bot…

So the accuracy of the neural network on 1080p camera images was around 10%, which is no better than random guessing among ten digits. The accuracy on MNIST test images is about 98%. We clearly needed some preprocessing.

The chat bot I was writing is based on Microsoft Bot Framework (a pretty cool open-source framework, btw), so I decided to preprocess the images in ASP.NET Core. Luckily there’s a great NuGet package called ImageSharp.

Let’s have a look at the steps we need to take:

First, we’ll need to apply a grayscale filter to the original source image. This basically sums the R, G and B components of each pixel and divides the result by 3. (Many libraries use a weighted sum of the channels instead, but the idea is the same.)
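To make the averaging concrete, here is a minimal sketch in plain Python. It is not the bot's C# / ImageSharp code; the function name `to_grayscale` and the image layout (a list of rows of `(r, g, b)` tuples) are assumptions chosen for illustration.

```python
def to_grayscale(rgb_rows):
    """Average the R, G and B channels of every pixel (integer result, 0-255)."""
    return [[(r + g + b) // 3 for (r, g, b) in row] for row in rgb_rows]


# A tiny 2x2 "photo": white, a bluish tone, black and an orange tone.
image = [
    [(255, 255, 255), (30, 60, 90)],
    [(0, 0, 0), (200, 100, 0)],
]

gray = to_grayscale(image)
print(gray)  # [[255, 60], [0, 100]]
```

Each pixel collapses to a single brightness value, where 0 is black and 255 is white.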

Second, we’ll remove the shadow marks (as you can see in the image above) with a vignette. This is a standard filter in every photo app: it applies a radial gradient that normally makes the corners darker, but here we use white, so the corners fade to white instead.
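The idea of a white vignette can be sketched in a few lines of plain Python. This is only an illustration, not the ImageSharp filter itself: the quadratic falloff, the `strength` parameter and the function name `white_vignette` are all assumptions, and a real photo filter may use a different curve.

```python
import math


def white_vignette(gray_rows, strength=1.0):
    """Blend each grayscale pixel toward white, more strongly near the corners."""
    height, width = len(gray_rows), len(gray_rows[0])
    center_y, center_x = (height - 1) / 2, (width - 1) / 2
    # Distance from the center to a corner; fall back to 1.0 for a 1x1 image.
    max_dist = math.hypot(center_x, center_y) or 1.0

    result = []
    for y, row in enumerate(gray_rows):
        new_row = []
        for x, value in enumerate(row):
            # 0.0 at the center, 1.0 at the corners.
            dist = math.hypot(x - center_x, y - center_y) / max_dist
            # Assumed quadratic falloff, capped at a full blend.
            blend = min(1.0, strength * dist * dist)
            new_row.append(round(value + (255 - value) * blend))
        result.append(new_row)
    return result


# A uniformly gray 3x3 image: the center is untouched, the corners turn white.
flat = [[100, 100, 100] for _ in range(3)]
faded = white_vignette(flat)
print(faded[1][1], faded[0][0])  # 100 255
```

Shadows that creep in from the edges of the photo get washed out, while the digit near the center keeps its contrast.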

Third, we separate the foreground from the background, in this case the single digit from everything else. This is done with a binary threshold: every pixel darker than a given threshold becomes pitch black, and every pixel lighter than it becomes pure white.
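A binary threshold is easy to sketch in plain Python. The snippet below assumes grayscale values where 0 is black and 255 is white, so the dark ink of the digit maps to black and the lighter paper maps to white. The cut-off of 128 and the function name `binary_threshold` are illustrative assumptions, not values taken from the bot's code.

```python
def binary_threshold(gray_rows, threshold=128):
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[0 if value < threshold else 255 for value in row] for row in gray_rows]


gray = [
    [10, 200],
    [128, 127],
]
print(binary_threshold(gray))  # [[0, 255], [255, 0]]
```

In practice the right threshold depends on lighting, so it is worth trying a few values on real photos.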

The next step is a little more complex. We need to crop the image dynamically to its content, and for this we first need to find the bounding box. Briefly, the code scans inward from each side of the image and stops at the first row or column that contains a pixel different from the background color (in this case white). You can check out the code here, if you are interested.
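For readers who prefer code, here is a small plain-Python sketch of the bounding box search. It is a simplified variant rather than the linked implementation: it checks every row and column instead of walking inward and stopping early, which yields the same box. The names `find_bounding_box` and `crop` are hypothetical.

```python
def find_bounding_box(rows, background=255):
    """Return (left, top, right, bottom) of the non-background content, or None."""
    height, width = len(rows), len(rows[0])
    content_rows = [y for y in range(height)
                    if any(value != background for value in rows[y])]
    if not content_rows:
        return None  # the image is entirely background
    content_cols = [x for x in range(width)
                    if any(rows[y][x] != background for y in range(height))]
    return content_cols[0], content_rows[0], content_cols[-1], content_rows[-1]


def crop(rows, box):
    """Cut the image down to the (inclusive) bounding box."""
    left, top, right, bottom = box
    return [row[left:right + 1] for row in rows[top:bottom + 1]]


image = [
    [255, 255, 255, 255, 255],
    [255,   0,   0, 255, 255],
    [255, 255,   0, 255, 255],
    [255, 255, 255, 255, 255],
]
box = find_bounding_box(image)
print(box)               # (1, 1, 2, 2)
print(crop(image, box))  # [[0, 0], [255, 0]]
```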

Fifth, we’ll have to make the image square: add background-colored padding along the shorter dimension until the width and height both equal the larger of the two.
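Squaring the image could look like the following plain-Python sketch. It assumes the padding is split as evenly as possible between the two opposite sides, with any odd pixel going to the bottom or right; the function name `pad_to_square` is made up for this example.

```python
def pad_to_square(rows, background=255):
    """Pad the shorter dimension with background pixels so width == height."""
    height, width = len(rows), len(rows[0])
    side = max(height, width)

    pad_top = (side - height) // 2
    pad_bottom = side - height - pad_top
    pad_left = (side - width) // 2
    pad_right = side - width - pad_left

    padded = [[background] * side for _ in range(pad_top)]
    for row in rows:
        padded.append([background] * pad_left + list(row) + [background] * pad_right)
    padded += [[background] * side for _ in range(pad_bottom)]
    return padded


# A tall, narrow digit such as a "1": three pixels high, one pixel wide.
tall = [[0], [0], [0]]
print(pad_to_square(tall))  # [[255, 0, 255], [255, 0, 255], [255, 0, 255]]
```

Padding before resizing keeps the digit's aspect ratio intact, so a thin "1" is not stretched into a blob.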

Lastly, to center the digit we’ll first downscale the image to 20 by 20 pixels and then add a 4 pixel margin on every side. This produces a perfectly centered 28 by 28 pixel image with a white background and a black foreground, which is just what our neural network needs.
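The arithmetic is 20 + 4 + 4 = 28 in each dimension. Below is a plain-Python sketch of this last step. It uses nearest-neighbour sampling purely to stay dependency-free, which is an assumption: an imaging library would normally use a smoother resampler. The helper names `resize_nearest`, `add_margin` and `to_mnist_layout` are hypothetical.

```python
def resize_nearest(rows, size=20):
    """Resize a square-ish image to size x size using nearest-neighbour sampling."""
    height, width = len(rows), len(rows[0])
    return [[rows[y * height // size][x * width // size] for x in range(size)]
            for y in range(size)]


def add_margin(rows, margin=4, background=255):
    """Surround the image with a background-colored border."""
    new_width = len(rows[0]) + 2 * margin
    result = [[background] * new_width for _ in range(margin)]
    for row in rows:
        result.append([background] * margin + list(row) + [background] * margin)
    result += [[background] * new_width for _ in range(margin)]
    return result


def to_mnist_layout(square_rows):
    """Scale to 20x20, then add a 4 pixel margin to reach 28x28."""
    return add_margin(resize_nearest(square_rows, 20), 4)


# A solid black 40x40 square ends up as a 20x20 block centered in 28x28.
black = [[0] * 40 for _ in range(40)]
final = to_mnist_layout(black)
print(len(final), len(final[0]))  # 28 28
```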

With this preprocessing in place, the accuracy on phone camera images increased from 10% to the accuracy of the neural net on MNIST test images (98%).

You can try out the working Digit Recognizer Bot on Messenger and check out the full code over here!

I’m a Partner Technology Strategist at Microsoft, helping partners grow and reach the global market — from the technical side. A true geek, from time to time showing up at conferences and events around Central and Eastern Europe talking about some future stuff, probably with a HoloLens on my head. [)-)
