Tensorflow demystified

gk_ · Published in Chatbots Life · Mar 29, 2017

To understand a new framework such as Google’s Tensorflow, a framework for machine-learning computations, it is often useful to start with a ‘toy’ example and learn from it.

Here is Google’s description of the framework: TensorFlow™ is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them.

What goes on inside machine-learning code is math; it helps to organize this in a way that simplifies the code and keeps the computational flow organized.

Tensors

First of all: what is a ‘tensor’ and how does it have ‘flow’?

A ‘vector’ is a list of values, a ‘matrix’ is a table (or list of lists)… then there is a list of tables (or list of lists of lists), then a table of tables (or list of lists of tables…). And so on. All of these are ‘tensors’ and they appear everywhere in machine learning equations.
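For intuition, here is how the ranks build up, written as plain nested Python lists (a sketch for illustration, not Tensorflow code):

vector = [1, 2, 3]                      # rank-1 tensor, shape (3,)
matrix = [[1, 2, 3], [4, 5, 6]]         # rank-2 tensor, shape (2, 3)
tensor3 = [[[1], [2]], [[3], [4]]]      # rank-3 tensor, shape (2, 2, 1)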

Let’s take a multi-layer neural network as an example, one we’ve already explored.

In this example we see input data features (‘x1’, ‘x2’, …) passing through 2 hidden layers, each with nodes (‘neurons’) that have weights (‘W’) and a bias (‘b’); the output is ‘y’.

(multi-layer neural network diagram; credit: Sebastian Raschka)

This becomes a series (a ‘flow’) of numerical computation (‘math’) going from input (‘X’) to output (‘y’). The math involves multi-dimensional matrices (‘tensors’), constants and variables (eg. bias ‘b1’). The definition and execution of this mathematical flow across tensors is what the Tensorflow framework is about.

As a result of repeating this ‘flow’ the weights and biases adjust to ‘fit’ the expected output and a model is built. The framework can have some of the math performed on faster ‘GPU’ processors, which is very useful when working with large quantities of data. But besides this, a framework for doing a lot of math across a series of equations is useful to organize and simplify the code. That’s what any framework strives to do.

Most tutorials on Tensorflow (or other machine learning frameworks) jump straight to recognizing hand-written digits or classifying iris flowers. Of course having substantive data is useful, but this isn’t really ‘toy’ data from the perspective of a machine-learning newbie.

For training data to achieve ‘toy’ status the data itself should be intuitive and easily grasped.

Toy data

Let’s imagine an array of 5 bits; this is our input. The output is [1,0] if the 1st and last bits are both ON, and [0,1] otherwise.

[0, 1, 1, 1, 1], [0,1]
[1, 1, 1, 1, 0], [0,1]
[1, 1, 1, 0, 0], [0,1]
[1, 1, 0, 0, 0], [0,1]
[1, 0, 0, 0, 1], [1,0]
[1, 1, 0, 0, 1], [1,0]
[1, 1, 1, 0, 1], [1,0]
[1, 0, 0, 1, 1], [1,0]

The entire dataset and its output make intuitive sense: you (or any reasonably sentient person) could find the pattern instinctively just by looking at the data.

What if we trained a machine-learning model on several of these patterns, then asked it to predict a 5-bit pattern it hadn’t been trained on? Would it predict the correct result? That’s a good ‘toy’ problem to work through, and it is ‘machine learning’, by definition: the software learns the patterns.

Code

We’re going to define some simple data, build a model in Tensorflow and then use it to make predictions. The code is here, in a Python notebook.

As always we begin with our imports, all of them common besides the framework itself.
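A minimal sketch of what those imports might look like (the notebook’s exact imports may differ slightly):

# numpy for array handling, random for shuffling, tensorflow as the framework
import numpy as np
import random
import tensorflow as tf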

Next we create the data:
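A sketch of this step, assuming a helper that builds and splits the dataset; the function name create_feature_sets_and_labels and the slicing logic are illustrative:

def create_feature_sets_and_labels(test_size=1.0 / 3):
    # the 8 patterns from the table above, each paired with its label
    features = [
        [[0, 1, 1, 1, 1], [0, 1]],
        [[1, 1, 1, 1, 0], [0, 1]],
        [[1, 1, 1, 0, 0], [0, 1]],
        [[1, 1, 0, 0, 0], [0, 1]],
        [[1, 0, 0, 0, 1], [1, 0]],
        [[1, 1, 0, 0, 1], [1, 0]],
        [[1, 1, 1, 0, 1], [1, 0]],
        [[1, 0, 0, 1, 1], [1, 0]],
    ]
    random.shuffle(features)                 # a different split on every run
    features = np.array(features, dtype=object)
    testing_size = int(test_size * len(features))

    # first 2/3 for training, last 1/3 for testing
    train_x = list(features[:, 0][:-testing_size])
    train_y = list(features[:, 1][:-testing_size])
    test_x = list(features[:, 0][-testing_size:])
    test_y = list(features[:, 1][-testing_size:])
    return train_x, train_y, test_x, test_y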

Notice in the above setup we are shuffling the data (‘features’) and using 2/3 of it for training and 1/3 for testing. The ratio is set by the parameter ‘test_size’. Each time we run this we’ll get a different training/testing split. Experiment with this and inspect the resulting data lists.

Now we are ready to begin scaffolding our Tensorflow model:
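A sketch of the scaffolding; the assumption of 20 nodes per hidden layer and the variable names are illustrative:

train_x, train_y, test_x, test_y = create_feature_sets_and_labels()

n_nodes_hl1 = 20   # nodes in hidden layer 1 (assumed)
n_nodes_hl2 = 20   # nodes in hidden layer 2 (assumed)
n_classes = 2      # output is [1,0] or [0,1]
hm_epochs = 1000   # training cycles

x = tf.placeholder('float')
y = tf.placeholder('float')

# weights and biases initialized with random values
hidden_1_layer = {'weight': tf.Variable(tf.random_normal([5, n_nodes_hl1])),
                  'bias': tf.Variable(tf.random_normal([n_nodes_hl1]))}
hidden_2_layer = {'weight': tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])),
                  'bias': tf.Variable(tf.random_normal([n_nodes_hl2]))}
output_layer = {'weight': tf.Variable(tf.random_normal([n_nodes_hl2, n_classes])),
                'bias': tf.Variable(tf.random_normal([n_classes]))}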

Our data is loaded; we’ll use 20 nodes in our 2 hidden layers and initialize weights and biases with random values. We also define our output layer.

We are now ready to define the mathematical equations for our model:

Look at the computations carefully and compare them to the earlier diagrams. We are multiplying (tf.matmul) data, matrices (‘tensors’), weights, biases, and we are using activation functions such as tf.sigmoid. The framework has built-in functions that are commonly used in machine learning.

# hidden layer 1: (data * W) + b
l1 = tf.add(tf.matmul(data, hidden_1_layer['weight']), hidden_1_layer['bias'])
l1 = tf.sigmoid(l1)
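For completeness, here is a sketch of the full model function; hidden layer 2 and the output layer follow the same (data * W) + b pattern:

def neural_network_model(data):
    # hidden layer 1: (data * W) + b, then sigmoid activation
    l1 = tf.add(tf.matmul(data, hidden_1_layer['weight']), hidden_1_layer['bias'])
    l1 = tf.sigmoid(l1)

    # hidden layer 2: (l1 * W) + b, then sigmoid activation
    l2 = tf.add(tf.matmul(l1, hidden_2_layer['weight']), hidden_2_layer['bias'])
    l2 = tf.sigmoid(l2)

    # output layer: (l2 * W) + b, left as raw logits
    output = tf.matmul(l2, output_layer['weight']) + output_layer['bias']
    return output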

All of this is the same as when we worked through the code for a 2-layer neural network, but now we’re using a framework to simplify the task: less code, and the computations themselves are abstracted. Read carefully through the diagrams, the earlier code and this new Tensorflow code and you should see that it is all equivalent. Matrix multiplication is simple.

There’s no black-magic here: math is math.

We are now ready to train our model:
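A sketch of the training routine, pulling together the fragments reviewed below; the print cadence of every 200 epochs matches the output that follows:

def train_neural_network(x):
    prediction = neural_network_model(x)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
    optimizer = tf.train.GradientDescentOptimizer(1).minimize(cost)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        for epoch in range(hm_epochs):
            # our dataset is tiny, so each 'batch' is the whole training set
            batch_x, batch_y = train_x, train_y
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})
            if epoch % 200 == 0:
                print('Epoch', epoch, 'completed out of', hm_epochs, 'cost:', c)

        # accuracy across the held-out test data
        correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
        print('Accuracy:', accuracy.eval({x: test_x, y: test_y}))

        # a prediction for each test example
        for t in test_x:
            print('prediction for:', t)
            output = prediction.eval(feed_dict={x: [t]})
            print(tf.sigmoid(output[0][0]).eval(), tf.sigmoid(output[0][1]).eval())

train_neural_network(x)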

Again, if you compare this with the 2-layer neural-network example that didn’t use a framework, the training process is the same. We iterate through cycles (‘epochs’) to get our error (‘cost’) rate low, and then our model is ready.

Let’s look at the output and then review the training code in more detail.

Epoch   0 completed out of 1000 cost: 1.06944
Epoch 200 completed out of 1000 cost: 0.000669607
Epoch 400 completed out of 1000 cost: 0.00030982
Epoch 600 completed out of 1000 cost: 0.00019792
Epoch 800 completed out of 1000 cost: 0.00014411
Accuracy: 1.0
prediction for: [1, 1, 1, 1, 1]
0.998426 0.392255
prediction for: [1, 0, 1, 1, 1]
0.99867 0.364066
prediction for: [0, 0, 1, 1, 1]
0.028218 0.997783
prediction for: [0, 1, 0, 1, 1]
0.0528865 0.997093
prediction for: [1, 0, 0, 0, 1]
0.999507 0.413642
prediction for: [1, 0, 0, 1, 0]
0.0507428 0.998406

Notice the iterative reduction in error (‘cost’) over 1,000 cycles. The accuracy is calculated by applying the model to our test data (which was not used in training the model).

We can then see a prediction for each entry in our test data. An output of [1,0] means the pattern [1, _, _, _, 1]; an output of [0,1] means some other pattern. Each value in the output is a probability, so:

prediction for: [1, 0, 0, 1, 0]
0.0507428 0.998406

means our model strongly believes the [1, _, _, _, 1] pattern does not apply.

As with any ‘toy’ example all of the data is easily grasped so you can focus on the code. This 2-layer ANN is no different than one you could use to classify iris flowers or stock market trends.

More about the training code

Back to the training code, we assign a variable named ‘prediction’ to our model, very simply:

# use the model definition
prediction = neural_network_model(x)

Then we tell the framework how to optimize for error:

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
optimizer = tf.train.GradientDescentOptimizer(1).minimize(cost)

We can use other optimization methods built into the framework.
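For example, swapping in the Adam optimizer (the learning rate here is an arbitrary choice for illustration):

# one alternative among several built-in optimizers
optimizer = tf.train.AdamOptimizer(learning_rate=0.01).minimize(cost)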

Then we loop through our epochs to fit our model:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    for epoch in range(hm_epochs):
        _, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})

With each batch of training data, the model improves (lowers the error for) its weights and biases.

The accuracy of the model is a calculation across our test data:

correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
print('Accuracy:', accuracy.eval({x: test_x, y: test_y}))

And making a prediction on some new data is relatively simple; we apply the sigmoid activation again to squash the output values to between 0 and 1.

print('prediction for:', test_x)
output = prediction.eval(feed_dict={x: [test_x]})
print(tf.sigmoid(output[0][0]).eval(), tf.sigmoid(output[0][1]).eval())

What happened?

We ‘taught’ a machine-learning algorithm about a 5-bit pattern, and it learned by looking at training data: [1, _, _, _, 1] or not. It ‘learned’ this by looking at examples, not by being told what the rules of the pattern are. That would have been trivial, and idiotic, as follows:
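Something like this sketch, where the rule is simply hard-coded:

# hard-coding the rule: no learning involved
def classify(bits):
    if bits[0] == 1 and bits[-1] == 1:
        return [1, 0]
    return [0, 1]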

Of course this wouldn’t be ‘machine learning’; it’s just a reduction of the problem into code. If we had to reduce something like image recognition into code, we would be in trouble. That’s where these predictive models come in, forged by the numbers created as data flows through equations.

You should now be better equipped to work through some ‘real’ problems, such as recognizing hand-written digits.

Enjoy.
