Lasso Regression Fundamentals and Modeling in Python

Kerem Kargın
Published in Analytics Vidhya
5 min read · May 1, 2021

In this blog post, I will first try to explain the basics of Lasso Regression. Then, we’ll build the model using a dataset with Python. Finally, we’ll evaluate the model by calculating the mean square error. Let’s get started step by step.

Resource: https://waterprogramming.wordpress.com/2017/02/22/dealing-with-multicollinearity-a-brief-overview-and-introduction-to-tolerant-methods/

What is the Lasso Regression?

The main purpose of Lasso Regression is to find the coefficients that minimize the residual sum of squares while applying a penalty to the size of those coefficients. Another source defines it as follows:

The “LASSO” stands for Least Absolute Shrinkage and Selection Operator. Lasso regression is a regularization technique. It is used over regression methods for a more accurate prediction. This model uses shrinkage. Shrinkage is where data values are shrunk towards a central point as the mean. Lasso Regression uses L1 regularization technique. It is used when we have more number of features because it automatically performs feature selection.

Features of Lasso Regression

  • Lasso was proposed to overcome a disadvantage of Ridge Regression: Ridge leaves all variables in the model, whether they are relevant or not.
  • Lasso Regression shrinks the coefficients toward zero.
  • When the L1 penalty is large enough, it sets some coefficients exactly to zero. In this way, Lasso performs variable selection.
  • It is very important that λ is chosen correctly. Cross-Validation is used for this.
  • Neither Ridge nor Lasso is superior to the other in general.
Resource: https://spotio.com/blog/regression-analysis/

Lasso Regression Model

  • λ denotes the amount of shrinkage.
  • λ = 0 implies that all features are kept. The model is then equivalent to linear regression, where only the residual sum of squares is minimized.
  • λ = ∞ implies that no feature is kept, i.e., as λ approaches infinity, more and more features are eliminated.
  • The bias increases as λ increases.
  • The variance increases as λ decreases.
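The effect of λ described in the list above can be sketched on synthetic data. This is only an illustration under assumptions: the data come from scikit-learn's `make_regression`, not from the article's data set, and the alpha grid is arbitrary. Note that scikit-learn calls the shrinkage parameter `alpha` rather than λ.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data (an assumption for illustration):
# 10 features, of which only 3 actually influence the target.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)

# Arbitrary grid of shrinkage values, from almost no penalty to a very strong one.
alphas = [0.01, 1.0, 10.0, 100.0, 1000.0]

n_nonzero = []
for alpha in alphas:
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X, y)
    # Count the features that survive, i.e. have a non-zero coefficient.
    n_nonzero.append(int(np.sum(model.coef_ != 0)))
    print(f"alpha={alpha}: {n_nonzero[-1]} non-zero coefficients")
```

With a very small alpha the fit is close to ordinary linear regression, while the largest alpha in this grid removes every feature.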

Modeling with Python

Now let’s build a Lasso Regression model on a sample data set. Then let’s calculate the square root of the model’s Mean Squared Error (RMSE), which gives us the model error in the units of the target variable.

First of all, we import the libraries necessary for modeling as usual.

Then we read the data and do some preprocessing.

We fit the Lasso regression model on the training set.

I won’t go into the details of concepts such as what fitting is or what a training set is.
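A minimal sketch of these steps, from imports to a fitted model, might look like the following. The data here are a synthetic stand-in generated with `make_regression`, since the article's own data set is not reproduced. The missing-value filter, the 25% test split, and `alpha=0.1` are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): 300 rows, 8 features, 4 of them informative.
X, y = make_regression(n_samples=300, n_features=8, n_informative=4,
                       noise=25.0, random_state=42)

# Example "data editing" step: drop rows that contain missing values.
# The synthetic data has none, so all 300 rows are kept.
complete_rows = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
X, y = X[complete_rows], y[complete_rows]

# Hold out 25% of the rows as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Fit Lasso on the training set; alpha=0.1 is an arbitrary illustrative value.
lasso_model = Lasso(alpha=0.1).fit(X_train, y_train)
print(X_train.shape, X_test.shape)
```

With a real data set, the first step would typically be reading a file and encoding any categorical columns before the split.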

The intercept (constant term) of the fitted Lasso regression model is -5.58, which we obtain as follows.

The coefficients that the Lasso model assigns to the variables in our data set are obtained as follows.
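In scikit-learn, a fitted `Lasso` object exposes the constant as `intercept_` and the variable coefficients as `coef_`. The sketch below shows both on synthetic stand-in data, so its numbers will not match the -5.58 reported above. The feature names `x0`, `x1`, ... are hypothetical.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic stand-in data (assumption), with hypothetical feature names.
X, y = make_regression(n_samples=300, n_features=8, n_informative=4,
                       noise=25.0, random_state=42)
feature_names = [f"x{i}" for i in range(X.shape[1])]

lasso_model = Lasso(alpha=0.1).fit(X, y)

# The constant term of the fitted model.
print("intercept:", lasso_model.intercept_)

# One coefficient per feature, in the column order of X.
for name, coef in zip(feature_names, lasso_model.coef_):
    print(f"{name}: {coef:.3f}")
```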

As you know, the coefficients in Lasso regression vary with the alpha parameter, which is scikit-learn’s name for λ. In the following steps, we loop over a set of different alpha values, fit the model for each one, and compute the coefficients for that alpha value. Then we add the coefficients to the list named coefs with the append command.

The graph below shows how the coefficients vary across the alpha values we chose.
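The loop described above could be sketched as follows. The alpha grid (`np.logspace(3, -2, 50)`) and the synthetic data are assumptions, not the values used for the article's graph.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic stand-in data (assumption).
X, y = make_regression(n_samples=300, n_features=8, n_informative=4,
                       noise=25.0, random_state=42)

# Assumed grid: 50 alpha values from 1000 down to 0.01 on a log scale.
alphas = np.logspace(3, -2, 50)

lasso = Lasso(max_iter=10_000)
coefs = []
for a in alphas:
    lasso.set_params(alpha=a)         # change only the penalty strength
    lasso.fit(X, y)
    coefs.append(lasso.coef_.copy())  # store this alpha's coefficients

coefs = np.array(coefs)               # shape: (n_alphas, n_features)
print(coefs.shape)
print("non-zero at largest alpha: ", np.count_nonzero(coefs[0]))
print("non-zero at smallest alpha:", np.count_nonzero(coefs[-1]))
```

Plotting `coefs` against `alphas` with a logarithmic x-axis, for example with matplotlib's `plt.plot(alphas, coefs)` and `plt.xscale("log")`, gives one line per feature that collapses to zero as alpha grows.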

Prediction

Now let’s make predictions with the model using its default parameters. We can see the first 5 predictions for the train set as follows.

Likewise, we can see the first 5 observations of the model prediction for the test set as follows.

Then we saved the values predicted for the test set in a variable named y_pred. The calculation below gives an RMSE of 356.09.
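A sketch of the prediction and RMSE steps, assuming synthetic stand-in data, so the result will differ from the 356.09 reported above. `Lasso()` with no arguments uses scikit-learn's default `alpha=1.0`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption).
X, y = make_regression(n_samples=300, n_features=8, n_informative=4,
                       noise=25.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Default parameters, i.e. alpha=1.0.
lasso_model = Lasso().fit(X_train, y_train)

# First 5 predictions for the train set and for the test set.
print(lasso_model.predict(X_train)[:5])
print(lasso_model.predict(X_test)[:5])

# RMSE on the test set: square root of the mean squared error.
y_pred = lasso_model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("test RMSE:", rmse)
```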

As a result, we found the R-squared score to be 0.41. The R-squared score is the proportion of the variance in the dependent variable that is explained by the independent variables.

In other words, we can say that the independent variables in the Lasso Regression Model explain 41.42% of the variance in the dependent variable for this data set.

What is R-Squared?

R-squared (R2) is a statistical measure that represents the proportion of the variance for a dependent variable that’s explained by an independent variable or variables in a regression model. Whereas correlation explains the strength of the relationship between an independent and dependent variable, R-squared explains to what extent the variance of one variable explains the variance of the second variable. So, if the R2 of a model is 0.50, then approximately half of the observed variation can be explained by the model’s inputs.
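To make the definition concrete, the sketch below computes R-squared by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares, and compares it with scikit-learn's `r2_score`. The data are synthetic stand-ins, so the value will not match the 0.41 reported above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption).
X, y = make_regression(n_samples=300, n_features=8, n_informative=4,
                       noise=25.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

lasso_model = Lasso().fit(X_train, y_train)
y_pred = lasso_model.predict(X_test)

# Variation left unexplained by the model.
ss_res = np.sum((y_test - y_pred) ** 2)
# Total variation of the target around its mean.
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

r2 = r2_score(y_test, y_pred)
print("manual R2: ", r2_manual)
print("r2_score:  ", r2)
```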

Model Tuning

In this section, we will do the operations using the LassoCV method to find the optimum lambda value.

We will use LassoCV to tune the Lasso Regression model. We pass the candidate alpha values as np.random.randint(0, 1000, 100) and set the number of Cross-Validation folds to 10. As you increase the number of folds, your result will change because the data are split into more combinations. But more is not always better: beyond a certain point, increasing the number of folds can increase the error value.

Cross-Validation selected an alpha value of 169 for the Lasso model.
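A sketch of the tuning step with `LassoCV`, on synthetic stand-in data. One deliberate deviation from the text, stated as an assumption: the candidate alphas are drawn from 1 to 999 rather than 0 to 999, because alpha = 0 removes the penalty altogether and scikit-learn advises against using `Lasso` that way. The selected alpha will not be 169 here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption).
X, y = make_regression(n_samples=300, n_features=8, n_informative=4,
                       noise=25.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# 100 random candidate alphas; the lower bound is 1 to avoid alpha = 0.
rng = np.random.default_rng(0)
alphas = rng.integers(1, 1000, size=100).astype(float)

# 10-fold cross-validation over the candidate alphas.
lasso_cv = LassoCV(alphas=alphas, cv=10, max_iter=100_000)
lasso_cv.fit(X_train, y_train)

print("selected alpha:", lasso_cv.alpha_)
```

The chosen value is stored in `lasso_cv.alpha_` and is always one of the candidates that were passed in.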

Then we set up the tuned Lasso model with this optimum alpha value and store its predictions for the test set in y_pred. As a result, we find an RMSE of 362.4.

We know that Lasso Regression sets the coefficients of variables that do not contribute to the model to zero. You can observe this as follows.
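The refit and the check for zeroed coefficients might be sketched like this. The data and the alpha grid are assumptions, so the RMSE and the set of eliminated features will differ from the article's results. Whether any coefficient is exactly zero depends on the data and on the selected alpha.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): only 4 of the 8 features are informative.
X, y = make_regression(n_samples=300, n_features=8, n_informative=4,
                       noise=25.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Find the optimum alpha by 10-fold cross-validation (assumed candidate grid).
rng = np.random.default_rng(0)
alphas = rng.integers(1, 1000, size=100).astype(float)
lasso_cv = LassoCV(alphas=alphas, cv=10, max_iter=100_000).fit(X_train, y_train)

# Refit a plain Lasso model with the selected alpha and evaluate it.
lasso_tuned = Lasso(alpha=lasso_cv.alpha_).fit(X_train, y_train)
y_pred = lasso_tuned.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print("tuned test RMSE:", rmse)

# Indices of the features whose coefficients were set exactly to zero.
zero_idx = np.flatnonzero(lasso_tuned.coef_ == 0)
print("coefficients:", lasso_tuned.coef_)
print("features eliminated:", zero_idx)
```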

Finally

In this blog post, we first examined what Lasso Regression is. Then we talked about its features and basics, and examined the model mathematically. Next, we set up the model with default parameters and calculated its error value. In the Model Tuning part, we found the optimum alpha value with LassoCV, rebuilt the model with this alpha value, and calculated the tuned error value. The basic logic of Lasso is to set the coefficients of variables that do not contribute to the model to zero, and we observed this in the tuned model.

Resources

  1. https://www.mygreatlearning.com/blog/understanding-of-lasso-regression/
  2. https://dataaspirant.com/lasso-regression/
  3. https://www.investopedia.com/terms/r/r-squared.asp

Kerem Kargın
Analytics Vidhya

BSc. Industrial Eng. | BI Developer & Machine Learning Practitioner | #BusinessIntelligence #MachineLearning #DataScience