Machine Learning for Dummies: Part 1

Code Snippets & GitHub Included

Cookie Engineer
Chatbots Life



I often get asked how to get started with Machine Learning. Most of the time, people have trouble understanding the math behind everything. And I have to admit, I don’t like the math either. Math is an abstract way of describing things, and I think the way machine learning is usually described is too abstract to understand easily.

So in this article (series?) I’ll try to describe things with foo code or a bit of JS to explain what I’m talking about.

There will also be an experimental GitHub repository where I put everything together, so that you can follow the steps and implement things on your own. I say experimental because hurdles might appear from time to time :)

The first thing you have to know is that there are different concepts that allow different solutions.

There are several different solutions; I’ll dig into the more complex ones later, when you’re ready for them. The problems are mostly divided into time-sensitive data (RNNs and RNN LSTMs), visual or pixel-related data (CNNs), and simple vectorized data (BBNs, BNs, BNNs and QNNs).

However, there are more complex architectures around from the engineering and biological perspective; and I think they’re quite a powerful toolset to know, as they can be freely combined with “low-level” machine learning solutions.


Explaining all of those concepts would be too much right now, as most of them have other, more specialized use cases in areas like NLP, parsing complex texts or artistic painting.

I’m also trying to be neither language-specific nor framework-specific here. The demo code is written in simple ES2017 JavaScript, so you can easily try it out and fiddle around with the demos.

The first topic I’ll cover in this article is the raw basics of Neural Networks, Genetic Programming and Evolution.

0. Basics of Neural Networks

Neural networks aren’t that hard to understand; they’re just not explained well to newbies. As everyone else is using math, I’m going to try a different method here.

If we talk about a simple neural network, it’s always structured into three different “categories”: an input layer, hidden layers and an output layer.

(Basic) neural network structure

The input layer typically represents sensors, where each value of the input array is a value from 0.0 to 1.0. People like to call these vectors; I call them arrays with values.

The output layer typically represents a yes/no type of answer to a question, or, in other use cases, vectorized properties. That could be almost anything, from enums to position objects (x, y, z vectors).

The important part is that neural networks can represent complex transformation functions and decisions, up to a limit where Bayesian solutions chime in.

Neural networks are dumb. They only understand values from 0.0 to 1.0, which means we have to write adapters so that they can understand what the data means.

Those adapters do nothing more than transform values into the range between 0.0 and 1.0. In more generic implementations, they tend to be called sensors or controls, with the idea of reusing them for other neural network structures.

For example, when we want to analyze a pixel position of a paddle in a pong game, we would use something like this:

let input = [ 0, 0, 0 ]; // x, y, z

input[0] = entity.position.x / screen.width;
input[1] = entity.position.y / screen.height;
input[2] = 0; // we don't have a z position, have we?

let answer = neural_network.compute(input);

if (answer[0] > 0.5) {
    entity.moveUpwards();
} else {
    entity.moveDownwards();
}

Now we know that a neural network can compute inputs and give us some outputs that can either be something like an answer to a question (if > 0.5 then yes, else no) or a data object that we can translate back to our simulation world (output = [ 0.5, 1.0, 0.99 ] would be a position coordinate).

Neural networks themselves are structured in the previously mentioned layers, where each layer contains multiple neurons. A simple feed-forward network is built in a way that each and every neuron is connected to each and every neuron in the previous layer, iterating from the left (input) layer to the right (output) layer.

Those connections, and “how active” they are, are represented using so-called weights. A weight typically represents one neuron connection. The important part here is that input neurons have no connection to a previous layer, so the compute() method in our implementation must respect that input neurons just get their neuron.value set directly from the input array’s values.
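To make that concrete, here is a minimal sketch of what such a compute() method could look like. The data structure (layers of neurons with value, bias and weights properties) is an assumption for illustration, not the actual repository code:

// a minimal sketch of a feed-forward compute() method;
// the layers/neuron structure is made up for illustration
const compute = function(layers, input) {

    layers.forEach((layer, l) => {

        layer.forEach((neuron, n) => {

            if (l === 0) {

                // input neurons have no previous layer, so they
                // directly get their value from the input array
                neuron.value = input[n];

            } else {

                // every other neuron sums up the weighted values
                // of all neurons in the previous layer ...
                let sum = 0;

                layers[l - 1].forEach((prev, p) => {
                    sum += prev.value * neuron.weights[p];
                });

                // ... and squashes the sum back into 0.0 to 1.0
                // via an activation function (explained below)
                neuron.value = 1 / (1 + Math.exp(-1 * (sum + neuron.bias)));

            }

        });

    });

    // the values of the last (output) layer are the answer
    return layers[layers.length - 1].map((neuron) => neuron.value);

};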

How neurons communicate, how the weights change over time and when to activate the next neuron in our network is decided by a so-called activation function. Activation functions are a bit of a fuzzy topic currently, as everyone has their own opinion on them.

An example sigmoid activation function can be something like this:

const _sigmoid = function(value) {
    return 1 / (1 + Math.exp(-1 * value));
};

All in all, you only have to know that activation functions simulate how neurons fire. In reality, an activation function does nothing more than take one value (formally speaking, a sum of values) and transform it into another one; the behaviour is similar to the ease-out or ease-in-out tweening in the user interface and animation world.

Overfitting and the discussion around it are a more complex topic, but I’m bluntly gonna ignore it here to save time and confusion.

What you need to know is that the CNN (convolutional) folks are now using two-agent systems (so-called GANs, Generative Adversarial Networks), where one neural network creates fake data and the other one tries to figure out whether it was faked or not.

As both increase in strength, they get super good at classifying real-world things. You can compare that to a real-world example where banknotes always get more detailed because the money forgers get better. Well, at least in modern countries ;)
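In foo code, that competition looks roughly like this. All names here are made up for illustration, and a real implementation would train both networks via backpropagation:

// a conceptual sketch of the two-agent idea; generator,
// discriminator and their methods are made-up placeholders
for (let step = 0; step < 10000; step++) {

    let fake = generator.compute(random_noise());
    let real = sample_real_data();

    // the detective learns to tell real (1.0) from fake (0.0) ...
    discriminator.train(real, 1.0);
    discriminator.train(fake, 0.0);

    // ... while the forger learns to produce fakes that the
    // detective classifies as real
    generator.train(fake, discriminator, 1.0);

}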

However, as an evolutionary ANN guy, I find this boring and really nothing new. The idea of competing AIs in a multi-agent system is old, but it’s important to know, as it is a seriously powerful idea to advance the behaviours of neural networks.

1. Genetic Programming and Evolution

Genetic Programming is the idea of representing data as genomes. The huge advantage of genetic programming, when combined with an evolutionary algorithm, is that it can achieve pretty good results very fast.

A basic evolutionary algorithm always runs through three different cycles: train, evaluate, breed. Then it repeats.

The three evolution cycles: Training, Evaluation, Breeding

That is due to the fact that an evolution always works on a population pool. That population is initially filled with neural networks with random values, which is how you can achieve those quick results. Those neural networks are typically called agents, as they are competing against each other.
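In foo code, a whole evolution is nothing more than this loop; the helper functions are placeholders for the cycles described above:

// the three cycles as foo code; the helpers are placeholders
let population = create_random_population(32);

while (true) {
    train(population);              // let all agents play and compete
    evaluate(population);           // measure each agent's fitness
    population = breed(population); // build the next generation's pool
}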

Of course, calling random() until you hit the “perfect value” is a bit dumb and mathematically takes literally forever; so there’s something in place called fitness measurement. This fitness can be a simple progress value for each agent and genome.

For example, in a Super Mario game, the travelled distance or something like “amount of points” or “amount of killed enemies” can be used as a fitness value.
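A fitness function for such a game could be as simple as this; the agent properties and weighting factors are made up for illustration:

// a hypothetical fitness function for a mario-like game;
// the properties and their weights are made-up values
const measure_fitness = function(agent) {
    return agent.distance * 1.0   // how far the agent travelled
         + agent.points   * 0.5   // amount of collected points
         + agent.kills    * 2.0;  // amount of killed enemies
};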

The fitter agents get, the more likely they are to be bred with other fit agents. Fit agents always produce two babies, following the idea of a mum and a dad with dominant genes. That means one baby looks more like mum (the daughter) and the other one looks more like dad (the son), to ensure that the knowledge of both has the chance to get better in future cycles.

Genome crossover / zw agent and zz agent

The crossover algorithm typically randomizes “where” to split the genome. That means the split point is randomized once, and the daughter gets one part (e.g. 70% mum / 30% dad) while the son gets the complementary part (30% mum / 70% dad).

let dna_split = (Math.random() * mum_genome.length) | 0;
let daughter  = new Genome();
let son       = new Genome();

for (let d = 0; d < mum_genome.length; d++) {

    if (d > dna_split) {
        son[d]      = mum_genome[d];
        daughter[d] = dad_genome[d];
    } else {
        daughter[d] = mum_genome[d];
        son[d]      = dad_genome[d];
    }

}

The values in each genome typically stand for the weights of the neural network. We’ll look at weights in more detail later; for now we only have to know that the “contents” of a neural network are represented in that “giant genome array”.

Each weight of the neural network has an equivalent “cell” in the genome array.

A genome typically represents the weights of all neurons in the neural network

However, the problem a typical evolution has is that, seen on a timeline, the results improve less and less as more time passes. That is mostly because the mutation rate is constant and/or too high to produce better innovations.

At first, a high mutation rate is good and exactly what you want. It will give you “fairly good” results in a short amount of time. Later on, a high mutation rate isn’t good anymore and will reduce the amount of innovation the evolution can produce.
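A common countermeasure is to let the mutation rate decay over time, so that early cycles explore wildly while later cycles only fine-tune. A minimal sketch, with made-up constants:

// a decaying mutation rate; the 0.30 start rate, 0.95 decay
// factor and 0.01 floor are made-up values for illustration
const mutation_rate = function(cycle) {
    let rate = 0.30 * Math.pow(0.95, cycle);
    return Math.max(rate, 0.01); // never stop mutating entirely
};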

The Flappy Plane demo

If you want to try out a very simple Evolutionary AI demo I put together, you can do so by opening the Flappy Plane demo right now in your (modern) web browser.

As you might see, there will eventually be a point in time where evolution doesn’t cut it anymore and the neural networks won’t improve towards a more perfect state. Without backpropagation, it’s not possible to get better through pure randomization. Well, actually it is, but it will require literally an infinite amount of time, so we humans are probably gonna die before it happens.

Those mutations on a fitness timeline chart can be identified pretty quickly, because you will see them as “steps” where suddenly all dominant agents get a bit better:

Evolution vs. Backpropagation Fitness over Time

More advanced concepts like NEAT and HyperNEAT try to tackle the innovations-by-mutation problem by analyzing the behaviour and the mutations over time.

For example, each genome’s performance gets evaluated on its own and only the “better” genomes survive, but the worse ones are remembered to improve randomization.

The basic idea behind all advanced evolutionary concepts is still nothing more than to optimize randomization and avoid “already known to fail” random values.

2. NEAT

NEAT is a beast and hard to explain properly in an easy way. You might know the concept already if you’ve seen the MarI/O demo by SethBling. I recommend watching that video now, so that you have a better clue about what I’m explaining next.

The basic thing you have to know is that NEAT is an algorithm that observes the performance of neural networks and analyses their behaviours. If the behavioural analysis says a change made things better, it’s used for breeding. If not, the gene (or genome) gets temporarily deactivated.

NEAT is also typically not used for classical, static neural networks, but rather for ANNs. I call ANNs “Adaptive Neural Networks” (instead of “Artificial”), because these days pretty much everything is artificial.

ANNs are the idea that you start from scratch with zero neurons and let the algorithm find the perfect structure of the neural network. Connections between neurons are randomly removed and created, and as the behaviour changes, we can learn where to spawn neurons more efficiently.
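A structural mutation step in such an adaptive network could look like this in foo code; the network methods and probabilities are assumptions for illustration:

// a hypothetical structural mutation; spawnNeuron(),
// createConnection() and removeConnection() are made up
const mutate_structure = function(network) {

    let roll = Math.random();

    if (roll < 0.05) {
        network.spawnNeuron();       // rarely: add a new neuron
    } else if (roll < 0.20) {
        network.createConnection();  // sometimes: connect two neurons
    } else if (roll < 0.35) {
        network.removeConnection();  // sometimes: drop a connection
    }

    // most of the time: leave the structure untouched this cycle

};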

The big difference to a “dropout” concept or a DQN concept is that ANNs will come up with a single perfect solution and not an average “well, it works for me” solution. That’s a big difference in what you can do with them; when comparing a DQN’s performance with NEAT in games like Super Mario, you will see big differences in how they play the game. A DQN will probably never (read: before we humans die) come up with the same uber-solution as an ANN would.

Adaptive Neural Network idea

The basic idea behind NEAT is to track innovations a gene can produce. A gene represents a single connection between neurons.

As NEAT was pretty neat, there are many different implementations in the wild. The most popular ones are HyperNEAT and ES/HyperNEAT. Those are “sick stuff” and hard to understand, so I will skip most details for now, but we are probably going to implement them later in this article series ;)

A general concept HyperNEAT uses to improve the behavioural analysis part is the so-called “Compositional Pattern-Producing Network”, or CPPN for short. It’s basically a reinforced neural network that learns the relations between inputs and the measured performance of neural networks. The advantage is that it can memorize the structure of ANNs and relate it to each agent’s fitness. So it can categorize more efficiently and figure out on its own which “changes in structure” affected which “changes in performance” on the timeline.

The behavioural analysis is nothing more than the idea of reducing the amount of unnecessary randomization, so that we can automatically make a better guess at the “most probable” value in the future.

A Compositional Pattern-Producing Network analyses many neural networks (or agents) and categorizes them into species with similar behaviours.

In a typical NEAT implementation, the population pool consists of so-called “agents”, because it’s a multi-agent concept where many AIs try to solve the same problem with different “ideas”.

Those agents always compete against each other and always try to be the “fittest” one so that their DNA, and therefore their gained knowledge represented in neural network weights, survives.

Multi-Agent system sorts agents by fitness and determines the dominant ones (with better mutations)

The dominant agents in a multi-agent system are the ones that get to breed and populate the next evolution cycle. The population pool for the next cycle is typically split up into three different types of agents:

  1. 20% Survivors (crossover breeding of fittest agents)
  2. 20% Mutants (totally randomized neural networks)
  3. 60% Children (crossover breeding between fittest agents and rest of population)

Those are basically the percentages of a healthy evolution scenario. They are neither NEAT-specific nor evolution-specific, and more of a rule of thumb from my own experience.
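Expressed as code, assembling the next cycle’s pool could look like this sketch; crossover() (returning the daughter/son pair from above) and random_agent() are assumed helpers:

// a sketch of assembling the next population pool; crossover()
// returns the [ daughter, son ] pair from above, random_agent()
// creates a fresh randomized network - both are assumed helpers
const next_population = function(agents) {

    // fittest agents first
    let sorted = agents.slice().sort((a, b) => b.fitness - a.fitness);
    let pool   = [];
    let size   = agents.length;

    // 20% survivors: crossover breeding of the fittest agents
    while (pool.length < size * 0.2) {
        pool.push(...crossover(sorted[0], sorted[1]));
    }

    // 20% mutants: totally randomized neural networks
    while (pool.length < size * 0.4) {
        pool.push(random_agent());
    }

    // 60% children: fittest agents bred with the rest of the population
    while (pool.length < size) {
        let partner = sorted[(Math.random() * sorted.length) | 0];
        pool.push(...crossover(sorted[0], partner));
    }

    return pool.slice(0, size);

};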

A healthy evolution’s population pool always has at least 32 agents, so that there’s always “enough to choose from” for breeding them in a healthy manner.

A small population pool is not healthy for breeding, as it will only reinforce already known solutions and not give randomization a chance to come up with a better solution.

TL;DR

Evolution and Genetic Programming allow rapid progress in finding the right values for neural networks. However, performance depends on randomization (and its innovations). NEAT and HyperNEAT tackle that with behavioural analysis and fitness measurement of agents in relation to their genomes, genes, and neuron connections.

I created a GitHub repository with all contents of this article series.

What’s Next

Now that we know the basics of a Multi-Agent system and how we can measure the fitness of Agents, we can dig more into Neural Networks in general.

In a typical AI implementation, an Agent has a Brain, and we are going to learn what the Brain looks like.

Machine Learning for Dummies: Part 2

Got comments or questions? Reply below.

Over and out.

