Machine learning frameworks and libraries keep improving and are becoming more practical. There seems to be a push to put the newest machine learning algorithms in the hands of software developers so that they can put them to practical use.
Let us first look at why machine learning (ML) is so interesting for software developers. I think ML makes things possible that were previously impossible. This is something your shiny new functional programming language or framework will not do for you. Machine learning can solve problems that you cannot realistically solve by writing out the rules in a procedural programming language. For example, let’s suppose I want to classify whether an animal is a cat or a dog. Can you give the business rules that separate a cat from a dog? Think about it for a minute or so. They both have four legs, they have hair, they have two eyes. Can you give a rule that is always true for a cat and always false for a dog? And yet, when I show you a cat or a dog, you immediately know which one it is.
In the real world, it is hard to let a procedural program make these kinds of decisions. As software engineers, we devise models of the real world that capture the information we need. We let humans make the conversion from the analog to the digital world. When you make an appointment with your veterinarian, you tell the vet what kind of animal you have, and the vet fills in the form in the appointment system with the correct animal. This is how information is transformed from the analog world to the digital world, and this is how traditional software is made.
With machine learning, this has changed. A machine can tell from a photograph whether your pet is a cat or a dog. As developers, we have learned to avoid solving these kinds of problems. When a business expert cannot explain to us how he or she makes a decision (in other words, cannot state the business rules), we cannot make software for it. But by applying machine learning, we can. This opens up new opportunities.
So how does this work? How can a machine learn to do the right thing without being told explicitly what to do? The answer: give it examples of previous cases with the right labels attached. In machine learning, this is known as supervised learning. In our case, we give it lots of example photos of cats and dogs with the correct label attached. The machine generalizes from the examples and, when it is given an unlabeled example, predicts its label.
Intuition on neural networks
Below I want to give some intuition on what neural networks do. It is by no means a complete explanation (for that I can recommend the Machine Learning course at Coursera or the website neuralnetworksanddeeplearning.com).
In order for a machine to learn from examples, it needs to be able to relate these examples to each other. The TensorFlow Playground at http://playground.tensorflow.org has a great visualization of this.
The playground shows a dataset for a binary classification problem. The classes are represented by the colors yellow and blue. This dataset is two-dimensional: each data point has two features, an x and a y coordinate. To go back to our example of classifying cats and dogs, the x feature could be the height of the animal and the y feature could be the weight of the animal (probably not very useful features to distinguish cats from dogs though). Based on those features the neural network learns the classes (yellow or blue) and can later predict the class of an unknown data point. Taken together, the features of a single data point are called a vector: an array holding all the features of that data point.
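To make the idea of a feature vector a bit more concrete, here is a minimal sketch in Python. The heights and weights are values I made up for illustration; they are not measurements from a real dataset.

```python
# Each data point is a feature vector: [height_cm, weight_kg].
# All values below are invented for illustration only.
features = [
    [25.0, 4.0],   # a cat
    [23.0, 3.5],   # a cat
    [60.0, 30.0],  # a dog
    [55.0, 25.0],  # a dog
]

# One label per feature vector, in the same order.
labels = ["cat", "cat", "dog", "dog"]

# A labeled dataset pairs every vector with its class.
dataset = list(zip(features, labels))

for vector, label in dataset:
    height, weight = vector
    print(f"height={height} cm, weight={weight} kg -> {label}")

# The number of features per data point is the dimensionality of the dataset.
dimensions = len(features[0])
print("dimensions:", dimensions)
```

With two features per data point, every animal becomes a point in a two-dimensional feature space, just like the dots in the playground.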
When you let the neural network run, you see its predictions change. The predictions are represented by the background colors.
The neural network learns which class dominates each region of the feature space. Intuitively, you can think of this as looking at the classes of the surrounding data points. If you gave the neural network an unlabeled data point in the top left corner of the feature space, it would predict the yellow class.
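As a rough sketch of this idea, the code below trains a single artificial neuron (logistic regression), which is about the simplest neural network there is. It is not the multi-layer network the playground uses. The data points, the learning rate and the number of epochs are my own assumptions, chosen so that yellow sits in the top left and blue in the bottom right.

```python
import math

# Made-up 2D points: yellow clustered top left, blue bottom right.
points = [(-2.0, 2.0), (-1.5, 1.0), (-1.0, 2.5), (-2.5, 1.5),
          (2.0, -2.0), (1.5, -1.0), (1.0, -2.5), (2.5, -1.5)]
targets = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = yellow, 0 = blue


def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, point):
    """Probability that the point belongs to the yellow class."""
    z = weights[0] * point[0] + weights[1] * point[1] + bias
    return sigmoid(z)


def train(points, targets, epochs=200, learning_rate=0.1):
    """Fit one neuron with stochastic gradient descent on the log loss."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for point, target in zip(points, targets):
            error = predict_probability(weights, bias, point) - target
            weights[0] -= learning_rate * error * point[0]
            weights[1] -= learning_rate * error * point[1]
            bias -= learning_rate * error
    return weights, bias


def predict_class(weights, bias, point):
    """Turn the probability into a class label."""
    probability = predict_probability(weights, bias, point)
    return "yellow" if probability >= 0.5 else "blue"


weights, bias = train(points, targets)

# Two unlabeled points the neuron has never seen.
print(predict_class(weights, bias, (-3.0, 3.0)))  # top left
print(predict_class(weights, bias, (3.0, -3.0)))  # bottom right
```

A single neuron can only draw a straight line between the classes. That is enough for this toy data, but datasets like the circle or the spiral in the playground need hidden layers.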
Usually, your dataset will have more than two dimensions/features. The same principles still apply; it’s just a lot harder to visualize 🙂
In this case, the neural network can easily generalize and separate the blue from the yellow data points. Hopefully, you can see that if you fed it a dataset where the blue and yellow classes are spread randomly across the feature space, it would be hard for it to generalize.
It’s important that the features you select for your dataset can be used to generalize on. To go back to our previous example of cats and dogs: if your only feature is the number of paws the animal has, no neural network can figure out whether a new example is a cat or a dog.
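A small sketch shows why the paw count is useless as a feature. The samples are made up for illustration. The code groups the labels by feature vector and reports every vector that occurs with more than one class.

```python
from collections import Counter, defaultdict

# Made-up dataset with a single feature: the number of paws.
samples = [
    ([4], "cat"),
    ([4], "dog"),
    ([4], "cat"),
    ([4], "dog"),
]

# Count how often each label occurs for each feature vector.
labels_by_feature = defaultdict(Counter)
for vector, label in samples:
    labels_by_feature[tuple(vector)][label] += 1

for vector, counts in labels_by_feature.items():
    print(vector, dict(counts))

# Feature vectors that occur with more than one class cannot be
# told apart by any classifier that only sees this feature.
ambiguous = [v for v, counts in labels_by_feature.items() if len(counts) > 1]
print("ambiguous feature vectors:", ambiguous)
```

Every sample has exactly the same feature vector, so whatever model you train on it can do no better than guess.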
One very simple technique you can try is this: given a data point of a known class, see whether other data points of that same class are nearby. Samples of other classes should be further away. This is not a definitive rule, because there could be multiple clusters of the same class.
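This check can be sketched in a few lines of Python. The version below is just one way to do it: for every data point it looks up the single nearest other point (by Euclidean distance) and reports the fraction of points whose nearest neighbor has the same class. The sample points and the function names are my own, made up for illustration.

```python
import math

# Made-up labeled points: (feature vector, class).
samples = [
    ((1.0, 1.0), "yellow"),
    ((1.2, 0.8), "yellow"),
    ((0.9, 1.3), "yellow"),
    ((4.0, 4.2), "blue"),
    ((4.3, 3.9), "blue"),
    ((3.8, 4.1), "blue"),
]


def nearest_neighbor_label(index, samples):
    """Return the class of the closest other data point."""
    vector, _ = samples[index]
    best_label, best_distance = None, math.inf
    for other_index, (other_vector, other_label) in enumerate(samples):
        if other_index == index:
            continue  # a point is not its own neighbor
        distance = math.dist(vector, other_vector)  # Euclidean distance
        if distance < best_distance:
            best_label, best_distance = other_label, distance
    return best_label


def same_class_neighbor_fraction(samples):
    """Fraction of points whose nearest neighbor shares their class."""
    matches = sum(
        1 for i, (_, label) in enumerate(samples)
        if nearest_neighbor_label(i, samples) == label
    )
    return matches / len(samples)


print(same_class_neighbor_fraction(samples))
```

A fraction close to 1 suggests the features group the classes together. A fraction around what you would expect from random labels suggests the features carry little information about the class.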
So what does a neural network do? It uses examples (encoded as vectors) with labels attached (in this case two colors) and generalizes from them so that it can make a prediction for an unlabeled data point. In the next article, I want to introduce an algorithm that can vectorize text documents. I will also make a classifier that uses these text vectors as input for a neural network. The classifier will be trained on the Reuters-21578 news article dataset.