Ridge Regression Fundamentals and Modeling in Python

In this blog post, I will first try to explain the basics of Ridge Regression. Then, we’ll build the model using a dataset with Python. Finally, we’ll evaluate the model by calculating the mean square error. Let’s get started step by step.

Resource: https://medium.com/kaveai/ridge-ve-lasso-regresyonu-temel-matemati%C4%9Fi-ve-python-uygulamas%C4%B1yla-363916e32d8d

What is the Ridge Regression?

The main purpose of Ridge Regression is, to find the coefficients that minimize the sum of error squares by applying a penalty to these coefficients. Also known as Ridge Regression L2. In another source, it is defined as follows.

Ridge regression is a model tuning method that is used to analyse any data that suffers from multicollinearity. This method performs L2 regularization. When the issue of multicollinearity occurs, least-squares are unbiased, and variances are large, this results in predicted values to be far away from the actual values.

Ridge regression is the regularized form of linear regression.

In Ridge Regression, the model is set up with all variables given. However, it does not remove variables with low relationships from the model, it brings the coefficients of these variables closer to zero.

  • It is resistant to overlearning.
  • It is biased but has a low variance.
  • It is better than the Least Squares method when there are too many parameters.
  • It offers a solution against multidimensionality. The problem here is that the number of variables is greater than the number of observations. It offers a solution against this.
  • It is effective in multiple linear connection problem. The problem here is that there is a high correlation between the independent variables.
  • It is important to find an optimum value for λ. Cross-Validation is used for this.

Lambda → λ

In Ridge Regression, λ plays a critical role. It allows controlling the relative effects of the two terms. So actually λ is the penalty term. Given λ is represented as an alpha parameter in the Ridge Regression function. By changing the alpha value, we control the penalty term. If λ is zero, this gives us the classical regression equation. Consequently, the higher the Alpha values, the greater the penalty. Therefore, the size of the coefficients is reduced.

- It shrinks the parameters. Therefore, it is used to prevent multicollinearity

- It reduces the model complexity by coefficient shrinkage

Ridge Regression Model

Ridge Regression Model is a version of the classical regression equation with a correction function.

Ridge Regression SSE Formula

The left side of the equation expresses the classical regression calculation. On the right, each Beta value is squared. And these values add up. Then the model is standardized by multiplying λ by the setting parameter. We can call this correction.

  • The λ setting parameter is determined by the user.
  • Beta coefficients are calculated from the data set.
  • A set containing specific values for λ is selected. And Cross-Validation test error is calculated for each.
  • The λ, which gives the smallest Cross-Validation, is chosen as the setting parameter.
  • Finally, the model is re-fitted with this λ selected.

There are also 2 different issues that need to be known about Ridge Regression. These are;

  • Ordinary Least Squares (OLS)
  • Bias & Variance Tradeoff

If you want to know more about these, you can review and learn by clicking this link.

Resource: https://medium.com/analytics-vidhya/a-simple-guide-to-bias-variance-trade-off-part-1-2418229c78e0
Resource: https://laptrinhx.com/8-simple-techniques-to-prevent-overfitting-3288224346/

Modeling with Python

Now let’s build a Ridge Regression model on a sample data set. And then let’s calculate the square root of the model’s Mean Squared Error .This will give us the model error.

First, we import the necessary libraries to can make calculations about the model.

Then, as I did in my previous articles, we make general arrangements about the data. Then we divide the data set into two as train and test set.

You can take a look at the top five data if you want.

Then you can examine the dimensions of the data. 263 is the row count, 20 is the column count.

We build the Ridge Regression model. And then we apply the fitting process for the train set. Ridge Regression model takes some parameters. The main purpose of Ridge Regression was to find the coefficients that minimize the sum of error squares by applying a penalty to these coefficients. This setting parameter is determined as alpha in the model. First, we set up the model over this value as 5 before finding the optimum setting parameter.

The coefficients of the established Ridge Regression model can be seen as follows.

The constant of the model can be seen as follows.

Let’s create a random set of alpha values to find the optimum alpha parameter.

Let’s save the set as lambda_values.

Then we build a Ridge model. Also, we create an empty set of coefficients. We create the model by fitting each alpha value in the set of alpha values we created and then add the calculated coefficients to the set of coefficients we created earlier.

We can see how the coefficients change according to the alpha data we have by drawing the graph below.

Let’s remember the first version of the model.

Let’s save the forecast values as y_pred for the train data set without specifying any alpha parameters.

According to the data estimated under these conditions, we calculate the value of the Root Mean Square Error as follows.

As you can see, according to these conditions, we calculated the RMSE value as 289,34.

Let’s import the cross_val_score from the Scikit-Learn library. We will observe how the RMSE value changes after Cross-Validation.

For the train set, we calculate the RMSE value by making 10-fold Cross-Validation as follows.

Now, let’s do the estimation process with the Ridge Regression model for the test set. Then we calculate the RMSE value as follows.

Without specifying any set parameters, we found the error value we calculated for the test set as 356,80.

Now we will Tuning to find the optimum alpha value and standardize the model. I created two different alpha sets in the tuning process. I will use lambda_values1. You can use whatever you want. Results will vary depending on the lambda cluster you use.

We use RidgeCV to build the model tuned with Ridge Regression. You can see many different alpha values below.

We find the optimum alpha value to use in the tuned model as follows.

Now let’s give this alpha parameter to the ridge_tuned tuned model as a parameter. Then, let’s fit the model. Then we estimate with the data in the test set and calculate the error.

As you can see, we calculated the RMSE error value as 356,64 in the tuned model.


First, we examined what is Ridge Regression in this blog post. Then we talked about the features and basics of Ridge Regression. Mathematically, we examined the model of this algorithm. Then, we set up the model without specifying any parameters with Ridge Regression. We calculated the error based on the predicted values. Then we did some operations to find the optimum alpha parameter. Finally, we calculated the error value by setting up the model tuned with the optimum alpha parameter.

BSc Industrial Eng. | Researching #MachineLearning and #ComputerVision | BI Developer https://www.linkedin.com/in/keremkargin/

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store