The linear regression algorithm is a statistical algorithm that models the relationship between one or more independent variables and a continuous dependent variable. Once that relationship has been captured with linear regression, you can use it to predict values of the continuous variable for new data.

Libraries such as Scikit-learn and TensorFlow provide linear regression out of the box. However, to better understand the inner workings of the algorithm, we will implement it from scratch in Python, using the NumPy library for the linear algebra operations.

Table of Contents

  1. The Training Dataset
  2. The Forward Pass
  3. Loss Calculation
  4. Gradient Calculation
  5. Weight Update
  6. Training the Complete Model

Linear Regression from Scratch in Python

The Training Dataset

The dataset that you will use to implement your linear regression algorithm can be generated with the make_regression() function from the sklearn.datasets module.

The following script creates a dataset with 100 samples, two independent features, and one dependent feature. The independent features are stored in a two-dimensional NumPy array named “X”, while the dependent feature (also known as the label) is stored in the one-dimensional “y” array. The shape of the “X” array is then printed on the console.

from sklearn.datasets import make_regression
from matplotlib import pyplot

X, y = make_regression(n_samples=100, n_features=2, noise=10)
print(X.shape)

(100, 2)

Next, reshape the y array into a column vector with 100 rows and 1 column, so that its shape matches the predictions produced by the forward pass:

y = y.reshape(len(y), 1)
print(y.shape)

(100, 1)
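The reshape step matters because of NumPy broadcasting. The sketch below uses small random arrays as stand-ins for y and the predictions (the names y_flat, y_pred, wrong and right are illustrative only): subtracting an (n, 1) column of predictions from a 1-D array of shape (n,) does not raise an error, but silently produces an (n, n) matrix, which would corrupt both the loss and the gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
y_flat = rng.normal(size=5)          # 1-D labels, shape (5,)
y_pred = rng.normal(size=(5, 1))     # column of predictions, shape (5, 1)

# Broadcasting (5,) against (5, 1) yields a (5, 5) matrix, not 5 residuals.
wrong = y_flat - y_pred
print(wrong.shape)  # (5, 5)

# After reshaping y into a column vector, the subtraction is element-wise.
y_col = y_flat.reshape(len(y_flat), 1)
right = y_col - y_pred
print(right.shape)  # (5, 1)
```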

The Forward Pass

The first step in implementing the linear regression algorithm is the forward pass. In the forward pass, you multiply the feature matrix by a vector of weights and then add a bias term.

Mathematically, you can write the forward pass as follows:

y_pred = Xw + b

In the above equation, y_pred is the vector of predicted labels, X is the feature matrix, w is the weight vector, and b is the bias. The parameters w and b define the relationship between the feature set X and the label set y in a linear regression algorithm.
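For the two-feature dataset used in this tutorial, the matrix equation expands into one equation per sample i, where x_{i1} and x_{i2} are the two feature values of that sample, \hat{y}_i is the i-th entry of y_pred, and the same scalar bias b is added to every prediction:

```latex
\hat{y}_i = w_1 x_{i1} + w_2 x_{i2} + b, \qquad i = 1, \dots, 100
```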

The following script creates the weight vector w, with one weight per feature (initialized to ones), and the scalar bias b (initialized to 1). The shape of w is then printed.

import numpy as np

w = np.ones(shape=(X.shape[1], 1))
b = 1
print(w.shape)

(2, 1)

Once you have defined your weight and bias parameters, you can create a function that performs the forward pass by taking the dot product of the feature matrix with the weight vector and then adding the bias. NumPy broadcasting adds the scalar bias to every row of the result.

def forward(X, w, b):
    # (n, d) @ (d, 1) -> (n, 1), then add the bias to every prediction
    y_pred = np.dot(X, w) + b
    return y_pred
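As a quick sanity check, the forward pass can be evaluated by hand on a tiny input. The values in X_demo, w_demo and b_demo below are made up for illustration, and the sketch repeats the forward() definition so that it runs on its own.

```python
import numpy as np

def forward(X, w, b):
    # Matrix-vector product plus a scalar bias broadcast to every row.
    return np.dot(X, w) + b

# Two samples, two features; values chosen so the result is easy to verify.
X_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
w_demo = np.array([[0.5],
                   [-1.0]])
b_demo = 2.0

out = forward(X_demo, w_demo, b_demo)
print(out)
# Row 1: 1*0.5 + 2*(-1) + 2 = 0.5
# Row 2: 3*0.5 + 4*(-1) + 2 = -0.5
```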

Loss Calculation

The next step is to calculate the loss, which measures how well the linear regression algorithm performs with the initial weight and bias values. You can use any suitable loss function; a common choice for regression is the mean squared error (MSE), the average of the squared differences between the actual and predicted labels, which is implemented in the following script:

def find_loss(y, y_pred):
    loss = np.mean(np.square(y - y_pred))
    return loss
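A small worked example of the mean squared error, using made-up values chosen so the arithmetic is easy to follow:

```python
import numpy as np

def find_loss(y, y_pred):
    # Mean of the squared residuals.
    return np.mean(np.square(y - y_pred))

y_true = np.array([[3.0], [5.0], [7.0]])
y_hat = np.array([[2.0], [5.0], [9.0]])

# Residuals are 1, 0 and -2, so the squared errors are 1, 0 and 4.
loss = find_loss(y_true, y_hat)
print(loss)  # (1 + 0 + 4) / 3, roughly 1.667
```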


Gradient Calculation

The goal of linear regression is to find the values of the weights and bias that minimize the loss. In other words, you need to find the minimum of the loss function with respect to the weights and bias. One way to do so is to check whether the loss increases or decreases when the current values of the weights and bias are increased slightly. The derivative (gradient) of the loss function with respect to the weights and bias provides exactly this information: its sign tells you the direction in which the loss increases, and its magnitude tells you how steeply. The derivatives of the loss function (the mean squared error in our case) with respect to the weights and the bias can be calculated using the following script:

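def find_gradient(X, y, y_pred):
    num_items = X.shape[0]
    # Gradient of the MSE with respect to the weights: -(2/n) * X^T (y - y_pred)
    dw = (-2 / num_items) * np.dot(X.T, (y - y_pred))
    # Gradient of the MSE with respect to the bias: -(2/n) * sum(y - y_pred)
    db = (-2 / num_items) * np.sum(y - y_pred)
    return dw, db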
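Writing the loss as L = (1/n) Σ (y_i - ŷ_i)² with ŷ = Xw + b and applying the chain rule gives ∂L/∂w = -(2/n) Xᵀ(y - ŷ) and ∂L/∂b = -(2/n) Σ (y_i - ŷ_i), which are the two expressions computed by find_gradient(). One way to confirm an analytic gradient is a central finite-difference check. The following is a minimal sketch on random data; numerical_gradient is a hypothetical helper, not part of the tutorial, and the tutorial's functions are repeated so the block runs on its own.

```python
import numpy as np

def forward(X, w, b):
    return np.dot(X, w) + b

def find_loss(y, y_pred):
    return np.mean(np.square(y - y_pred))

def find_gradient(X, y, y_pred):
    num_items = X.shape[0]
    dw = (-2 / num_items) * np.dot(X.T, (y - y_pred))
    db = (-2 / num_items) * np.sum(y - y_pred)
    return dw, db

def numerical_gradient(X, y, w, b, eps=1e-6):
    # Hypothetical helper: central differences on each weight and on the bias.
    dw = np.zeros_like(w)
    for j in range(w.shape[0]):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[j, 0] += eps
        w_minus[j, 0] -= eps
        dw[j, 0] = (find_loss(y, forward(X, w_plus, b))
                    - find_loss(y, forward(X, w_minus, b))) / (2 * eps)
    db = (find_loss(y, forward(X, w, b + eps))
          - find_loss(y, forward(X, w, b - eps))) / (2 * eps)
    return dw, db

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 2))
y = rng.normal(size=(50, 1))
w = np.ones((2, 1))
b = 1.0

dw_analytic, db_analytic = find_gradient(X, y, forward(X, w, b))
dw_numeric, db_numeric = numerical_gradient(X, y, w, b)
print(np.allclose(dw_analytic, dw_numeric, atol=1e-5))
print(np.isclose(db_analytic, db_numeric, atol=1e-5))
```

Because the MSE is quadratic in w and b, central differences match the analytic gradient up to floating-point rounding, so a mismatch here points to a bug in the gradient code rather than to approximation error.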


Weight Update

Finally, you update the weights and bias by subtracting from each a small fraction of its gradient. This fraction is called the learning rate.

def optimize(w, dw, b, db, lr):
    # Move each parameter a small step against its gradient.
    w = w - lr * dw
    b = b - lr * db
    return w, b
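To see why this update moves in the right direction, here is a sketch of a single gradient descent step on toy data. The true coefficients [3, -2], the intercept 0.5 and the learning rate 0.1 are arbitrary choices for illustration. Because the gradient points in the direction of steepest increase of the loss, stepping against it with a small enough learning rate lowers the loss; a learning rate that is too large can instead overshoot the minimum and make the loss grow.

```python
import numpy as np

def optimize(w, dw, b, db, lr):
    # Step each parameter against its gradient, scaled by the learning rate.
    w = w - lr * dw
    b = b - lr * db
    return w, b

def mse(X, y, w, b):
    return np.mean(np.square(y - (np.dot(X, w) + b)))

# Toy data with known coefficients (assumed values for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.dot(X, np.array([[3.0], [-2.0]])) + 0.5

w = np.zeros((2, 1))
b = 0.0

# Gradients of the MSE at the starting point (same formulas as find_gradient).
n = X.shape[0]
y_pred = np.dot(X, w) + b
dw = (-2 / n) * np.dot(X.T, (y - y_pred))
db = (-2 / n) * np.sum(y - y_pred)

loss_before = mse(X, y, w, b)
w, b = optimize(w, dw, b, db, lr=0.1)
loss_after = mse(X, y, w, b)
print(f"loss before: {loss_before:.3f}, after one step: {loss_after:.3f}")
```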

Training the Complete Model

You can repeat the forward pass, loss calculation, gradient calculation, and weight update steps until the loss stops decreasing. Each repetition over the full dataset is called an epoch, and this process is known as training the algorithm. The following script defines a function called “fit()” that trains your linear regression algorithm. The loss from each epoch is appended to the “loss_history” list, and the trained weights and bias are returned.

loss_history = []

def fit(X, y, w, b, lr, num_iter):
    for i in range(num_iter):
        # forward propagation
        y_pred = forward(X, w, b)
        loss = find_loss(y, y_pred)
        loss_history.append(loss)

        # backward propagation
        dw, db = find_gradient(X, y, y_pred)

        # weight update
        w, b = optimize(w, dw, b, db, lr)

    return w, b

Let’s call the fit() function now. You will train the linear regression model for 200 epochs with a learning rate of 0.01.

num_iter = 200
lr = 0.01
fit(X, y, w, b, lr, num_iter)
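One way to sanity-check the trained parameters is to compare them with the closed-form least-squares solution, which gradient descent on the mean squared error should approach. The sketch below makes several assumptions: it uses synthetic NumPy data with known coefficients instead of make_regression, repeats the training loop inline as a hypothetical gradient_descent() function, and runs 2,000 iterations rather than 200 so that the comparison can use a tight tolerance. np.linalg.lstsq is NumPy's least-squares solver; a column of ones is appended to X so that the last coefficient plays the role of the bias.

```python
import numpy as np

def gradient_descent(X, y, lr, num_iter):
    # Same updates as forward/find_gradient/optimize, written inline.
    n = X.shape[0]
    w = np.ones((X.shape[1], 1))
    b = 1.0
    for _ in range(num_iter):
        y_pred = np.dot(X, w) + b
        dw = (-2 / n) * np.dot(X.T, (y - y_pred))
        db = (-2 / n) * np.sum(y - y_pred)
        w = w - lr * dw
        b = b - lr * db
    return w, b

# Synthetic data standing in for make_regression (assumed coefficients and noise).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = np.dot(X, np.array([[3.0], [-2.0]])) + 0.5 + rng.normal(scale=0.1, size=(100, 1))

w_gd, b_gd = gradient_descent(X, y, lr=0.01, num_iter=2000)

# Closed-form least squares: the appended ones column yields the bias.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
w_ls, b_ls = coef[:-1], coef[-1, 0]

print(np.allclose(w_gd, w_ls, atol=1e-6), np.isclose(b_gd, b_ls, atol=1e-6))
```

If the two sets of parameters disagree noticeably, the usual suspects are a learning rate that is too large, too few iterations, or a shape mismatch between y and the predictions.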

Finally, you can plot the “loss_history” list to check whether the loss has decreased and leveled off, which indicates that your linear regression algorithm has been trained. Run the following script to plot the loss against the epochs.

pyplot.plot(loss_history)
pyplot.title('Epochs vs Loss')

[Figure: Epochs vs Loss, the training loss of the linear regression model]

From the above plot, you can see that after around 100 training iterations (epochs) the loss levels off, meaning that the weights and bias have converged to values close to those that minimize the loss of our linear regression model.
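Since the loss flattens out well before the last epoch, an alternative to a fixed number of epochs is to stop once the loss improves by less than a small tolerance between consecutive epochs. The function below is a hypothetical variant of fit(), not part of the original tutorial; fit_until_converged, max_iter and tol (set to 1e-8 here) are illustrative names and values, and the synthetic data is again generated with NumPy rather than make_regression.

```python
import numpy as np

def fit_until_converged(X, y, lr=0.01, max_iter=10_000, tol=1e-8):
    # Hypothetical variant of fit(): same updates, but stops early once
    # the loss decreases by less than tol from one epoch to the next.
    n = X.shape[0]
    w = np.ones((X.shape[1], 1))
    b = 1.0
    loss_history = []
    for _ in range(max_iter):
        y_pred = np.dot(X, w) + b
        loss = np.mean(np.square(y - y_pred))
        loss_history.append(loss)
        if len(loss_history) > 1 and loss_history[-2] - loss < tol:
            break
        dw = (-2 / n) * np.dot(X.T, (y - y_pred))
        db = (-2 / n) * np.sum(y - y_pred)
        w = w - lr * dw
        b = b - lr * db
    return w, b, loss_history

# Synthetic stand-in data (assumed coefficients and noise level).
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = np.dot(X, np.array([[3.0], [-2.0]])) + 0.5 + rng.normal(scale=0.1, size=(100, 1))

w, b, history = fit_until_converged(X, y)
print(f"stopped after {len(history)} epochs, final loss {history[-1]:.4f}")
```

The tolerance trades training time against precision: a smaller tol runs more epochs and lands closer to the minimum, while max_iter guards against never meeting the tolerance, for example when the learning rate is too large.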