Credits to Jean-Nicholas Hould for his post giving an intuitive approach to learning a basic Machine Learning algorithm, and to Sebastian Raschka's book on Machine Learning in Python.
Machine Learning (ML) plays a key role in a wide range of critical applications, such as Computer Vision, Data Mining, Natural Language Processing and Speech Recognition. ML offers potential solutions in all of these domains and more, and it is surely going to be the driving force of our future digital civilization.
ML can be a bit intimidating for a newcomer. The concept can seem quite abstract, and a newcomer is bound to ask many questions. One big question is, "How does it work?".
In order to explain this, I decided to write a Binary Classifier from scratch. I will not be using Scikit-learn in this post; the aim is to understand the core working principle of an ML algorithm.
Let's consider a scenario where you are asked to separate a basket full of Apples and Oranges into two separate baskets. Once you identify the differences between the two fruits, separating them is straightforward.
Now, let's explain the Binary Classifier in terms of the above scenario.
A Classifier in Machine Learning is an algorithm that determines the class to which the input data belongs, based on a set of features.
A Binary Classifier is an instance of Supervised Learning. In Supervised Learning we have a set of input data points and a set of labels, and our task is to map each data point to a label. A Binary Classifier classifies elements into one of two groups, either zero or one.
As Machine Learning algorithms learn from the data, we are obliged to feed them the right kind of data. The first step towards achieving that is Data Preprocessing.
Data Preprocessing is a data mining technique that involves transforming raw data into an understandable format. Real-world data is often incomplete, noisy, inconsistent or unreliable, and above all it might be unstructured.
In simple terms, Data Preprocessing implies grooming the raw data according to your requirement using certain techniques.
Once you have your preprocessed dataset, it's time to select a learning algorithm to perform your desired task. In our case it's a Binary Classifier, the Perceptron.
The metrics that you choose to evaluate a machine learning algorithm are very important. The choice of metrics influences how the algorithm's performance is measured and compared.
A Perceptron is an algorithm for learning a binary classifier: a function that maps its input x to an output value f(x). The value of f(x) is either 0 or 1, which is used to classify x as either a positive or a negative instance.
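As a minimal sketch of this decision function (the weight vector w and threshold b below are illustrative values, not learned ones):

```python
import numpy as np

def f(x, w, b):
    """Perceptron decision function for a single sample:
    1 if the weighted sum plus threshold is at least 0, else 0."""
    return 1 if np.dot(w, x) + b >= 0.0 else 0

# With w = [0.5, 0.5] and b = -0.7 (made-up numbers):
print(f([1, 1], [0.5, 0.5], -0.7))  # weighted sum 0.3 >= 0, so 1
print(f([0, 0], [0.5, 0.5], -0.7))  # weighted sum -0.7 < 0, so 0
```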
Let’s implement the perceptron to predict the outcome of an OR gate.
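For the OR gate, the training set consists of the four possible input pairs and their desired outputs (the array names X and y are my own choice, matching the snippets below):

```python
import numpy as np

# Each row of X is one input pair; y holds the desired OR outputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])  # OR is 1 whenever at least one input is 1
```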
Let's initialize an array with initial weights equal to 0. The length of the array is equal to the number of features + 1. The additional entry is the "threshold".
self.weight_matrix = np.zeros(1 + X.shape[1])
The loop iterates multiple times over the training data to optimize the weights.
for _ in range(number_of_iterations):
We loop over each training data point and its target. The target is the desired output which we want the algorithm to predict. As it's a binary classifier, the target output is either a 0 or a 1.
The prediction is calculated as a matrix multiplication of the features with the appropriate weights. To this product we add the "threshold" value.
If the resulting value is 0 or above, the predicted category is 1. If the resulting value is below 0, the predicted category is 0.
At each iteration, if the prediction is not accurate, the algorithm will adjust the weights. The adjustment of the weights will be done proportionally to the difference between the target and predicted value.
The difference is then multiplied by the learning rate (rate). The higher the value of rate, the larger the correction of the weights. The algorithm stops adjusting the weights once the predictions become accurate.
self.weight_matrix = np.zeros(1 + X.shape[1])
# Iterating multiple times to optimize the weights.
for _ in range(number_of_iterations):
    for xi, target in zip(X, y):
        # The update is proportional to the prediction error.
        update = self.rate * (target - self.predict(xi))
        self.weight_matrix[1:] += update * xi
        self.weight_matrix[0] += update
def dot_product(self, X):
    """ Calculate the dot product of the inputs and the weights, plus the threshold """
    return np.dot(X, self.weight_matrix[1:]) + self.weight_matrix[0]

def predict(self, X):
    """ Predicting the label for the input data """
    return np.where(self.dot_product(X) >= 0.0, 1, 0)
You could also try to change the training dataset in order to model an AND, NOR or NOT gate. Note that it's impossible to model the XOR function using a single perceptron like the one we implemented, because the two labels (0 or 1) of an XOR function are not linearly separable.
Here’s the entire code:
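The snippets above assemble into the following runnable sketch (the constructor and its default arguments are my own assumptions; the rest follows the snippets):

```python
import numpy as np

class Perceptron:
    def __init__(self, rate=0.1, number_of_iterations=10):
        self.rate = rate                              # learning rate
        self.number_of_iterations = number_of_iterations

    def fit(self, X, y):
        # One weight per feature, plus one extra slot for the threshold.
        self.weight_matrix = np.zeros(1 + X.shape[1])
        # Iterating multiple times to optimize the weights.
        for _ in range(self.number_of_iterations):
            for xi, target in zip(X, y):
                # The update is proportional to the prediction error.
                update = self.rate * (target - self.predict(xi))
                self.weight_matrix[1:] += update * xi
                self.weight_matrix[0] += update
        return self

    def dot_product(self, X):
        """ Calculate the dot product of the inputs and the weights, plus the threshold """
        return np.dot(X, self.weight_matrix[1:]) + self.weight_matrix[0]

    def predict(self, X):
        """ Predicting the label for the input data """
        return np.where(self.dot_product(X) >= 0.0, 1, 0)

# Training on the OR gate:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])
model = Perceptron(rate=0.1, number_of_iterations=10).fit(X, y)
print(model.predict(X))  # -> [0 1 1 1]
```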
Written on January 21st , 2017 by Mahesh Kumar K