Code a Neural Network from Scratch in Python

Subham Tewari
4 min read · Sep 19, 2018

In this article, I will show you how to code a Neural Network from scratch. Most of us use modern libraries like TensorFlow and Keras to build a Neural Network in a few lines of code, but if you want a clear understanding of Neural Networks, you should know how to code one from scratch.

I hope you will like this article and that it will help aspiring data scientists. So let’s start by understanding what a neural network is.

The motivation for writing this article came from Favio Vázquez.

What is a Neural Network?

A Neural Network is based on the idea that the human brain makes the right decisions by making the right connections through its neurons. An artificial neural network comprises:

  • An input layer x,
  • A number of hidden units,
  • An output layer ŷ,
  • A set of Weights W and biases b,
  • A choice of activation function at each hidden layer σ.

The picture below depicts a 2-layered neural network with one hidden layer.

2-layered Neural Network

So let’s start coding a Neural Network from scratch.

First, we will define a Neural Network class to start off things.
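As a minimal sketch (assuming a NumPy implementation with a single hidden layer of four units; the layer size and attribute names are illustrative choices, not the article’s exact code), the class could look like this:

```python
import numpy as np

class NeuralNetwork:
    def __init__(self, x, y):
        # Training inputs and true labels
        self.input = x
        self.y = y
        # Randomly initialise the weights of the two layers;
        # biases are assumed to be zero, as noted later in the article.
        self.weights1 = np.random.rand(self.input.shape[1], 4)  # hidden layer of 4 units (illustrative)
        self.weights2 = np.random.rand(4, 1)                    # output layer
        # Placeholder for the network's predictions
        self.output = np.zeros(self.y.shape)
```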

For a two-layered Neural Network, there is one hidden layer in between. The equation for layer 1 is z1 = W1.x + b1. The hidden layer then applies an activation function, a1 = σ(z1), so the output of the first layer is a1 = σ(W1.x + b1). This becomes the input to the second layer, z2 = W2.a1 + b2, after which we apply another activation function σ. So the final equation will be

ŷ = σ(z2) = σ(W2.σ(W1.x + b1) + b2)

The output of the neural network

The image below is a simpler representation of the above paragraph.

The architecture of a 2-layered Neural Network

We will now discuss the feed-forward pass and the loss function. After that, we will back-propagate to fine-tune the weights and biases.

At the start, we randomly initialize the weights and biases. After a forward propagation, we calculate the loss by comparing the predicted value (ŷ) with the true value (y). After that, we back-propagate and fine-tune the weights and biases. By fine-tuning the weights and biases, we make sure that the predicted value gets close to the true value.

Forward Propagation

In this part of the code, we will forward propagate through the network. Note that we have assumed biases to be zero.
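Continuing the illustrative class above, a feed-forward pass might look like the following sketch, with sigmoid as the activation σ and the biases left out, as assumed here:

```python
def sigmoid(z):
    # Sigmoid activation: sigma(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

class NeuralNetwork:
    # __init__ as in the earlier sketch

    def feedforward(self):
        # Hidden layer: a1 = sigma(x . W1)   (bias assumed zero)
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        # Output layer: y_hat = sigma(a1 . W2)
        self.output = sigmoid(np.dot(self.layer1, self.weights2))
```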

So we can see that I have defined a feed-forward method, which computes layer 1 and then the output layer. Now we will define the Loss Function in our next step.

Loss Function

There are many loss functions that we can use to measure the difference between the true value and the predicted value, but in this example we use cross-entropy loss. Instead of Mean Squared Error, we use Log Loss. Cross-entropy loss can be divided into two separate cost functions: one for y = 1 and one for y = 0.

Cross-entropy loss, where h(x) denotes the predicted value ŷ:

Loss = −[y log(h(x)) + (1 − y) log(1 − h(x))]

So we can see that if y = 1, the loss is −log(h(x)), which is small when h(x) is close to 1 and grows large as h(x) approaches 0. Similarly, if y = 0, the loss is −log(1 − h(x)), which is small when h(x) is close to 0 and large when h(x) is close to 1.
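As a sketch, this loss could be computed with a small NumPy helper like the one below (the function name is illustrative; in practice a small epsilon is often added inside the logarithms to avoid log(0)):

```python
def cross_entropy_loss(y, y_hat):
    # Average log loss over all samples:
    # L = -(1/n) * sum(y*log(y_hat) + (1-y)*log(1-y_hat))
    n = y.shape[0]
    return -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / n
```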

Our goal in training is to find the set of weights and biases that minimizes the loss function. For this, we will use back-propagation to compute the gradients and gradient descent to tune and update the weights and biases. So let’s see.

Backpropagation

Backpropagation is a method by which we propagate the error backwards through the network, calculating gradients and updating the weights and biases so that the calculated loss becomes smaller. So let’s go through the code and see what is really happening here.

In order to fine tune the weights and biases, we have to compute the derivative of the loss function with respect to the weights and biases.
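Continuing the illustrative class, a backpropagation step for this sigmoid-plus-cross-entropy setup might look like the sketch below. With a sigmoid output and cross-entropy loss, the derivative of the loss with respect to z2 conveniently simplifies to (ŷ − y); the learning rate and the method name are assumptions, not taken from the original code.

```python
class NeuralNetwork:
    # __init__ and feedforward as in the earlier sketches

    def backprop(self, learning_rate=0.1):
        n = self.y.shape[0]
        # For a sigmoid output with cross-entropy loss, dL/dz2 = (y_hat - y)
        d_z2 = (self.output - self.y) / n
        # Gradient of the loss with respect to the second-layer weights
        d_weights2 = np.dot(self.layer1.T, d_z2)
        # Propagate the error back through the hidden layer (sigmoid derivative = a1 * (1 - a1))
        d_z1 = np.dot(d_z2, self.weights2.T) * self.layer1 * (1 - self.layer1)
        # Gradient of the loss with respect to the first-layer weights
        d_weights1 = np.dot(self.input.T, d_z1)
        # Gradient-descent update of the weights (biases assumed zero)
        self.weights1 -= learning_rate * d_weights1
        self.weights2 -= learning_rate * d_weights2
```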

Gradient Descent

Gradient descent is a method for iteratively moving towards a minimum of the loss function (for a neural network this is generally a local rather than the global minimum). We take the derivative of the loss function with respect to the weights and biases and then update the weights and biases accordingly.
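Stated symbolically, this is the update rule the backprop sketch above applies, with α the learning rate we choose:

W = W − α(∂Loss/∂W)
b = b − α(∂Loss/∂b)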

We ran 1500 iterations, and over them the loss decreased monotonically towards a minimum.
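Putting the pieces together (and assuming the __init__, feedforward and backprop sketches above are combined into one NeuralNetwork class), a training loop along these lines could be used; the XOR-style toy data, learning rate and print interval are illustrative choices:

```python
if __name__ == "__main__":
    # Toy dataset (XOR on the first two columns) -- purely illustrative
    X = np.array([[0, 0, 1],
                  [0, 1, 1],
                  [1, 0, 1],
                  [1, 1, 1]])
    y = np.array([[0], [1], [1], [0]])

    nn = NeuralNetwork(X, y)
    for i in range(1500):
        nn.feedforward()
        nn.backprop(learning_rate=0.5)
        if i % 100 == 0:
            print(f"iteration {i}: loss = {cross_entropy_loss(y, nn.output):.4f}")

    print(nn.output)  # predictions move towards y as the loss decreases
```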

So this brings us to the end of this article. I hope you have liked it. If you have any suggestions, reach out to me at subham.tiwari186@gmail.com or follow me on Twitter. If you want the whole source code, please comment below.

Reference: http://neuralnetworksanddeeplearning.com/
