**Introduction To Linear Regression Algorithm in Machine Learning**

In machine learning, linear regression is a supervised learning technique that approximates the mapping function from inputs to outputs in order to make the best possible predictions. In supervised learning, we train the machine on well-labeled data, that is, data in which both the input and the output variables are known, so that an algorithm can learn to predict the outcome for new inputs.

**What is Regression?**

Regression is a category of supervised learning in which the result is predicted from known parameters that are correlated with the output. It is used to predict values within a continuous range rather than to classify them into discrete categories. The known parameters are used to fit a line with a constant slope, which is then used to predict the unknown result. Regression tries to draw the best-fit line through the gathered data points, and it is also used to estimate the relationship between a dependent variable and one or more independent variables.

For example, predicting the speed of a car from the distance travelled, or predicting a person’s salary from their years of experience, is a regression problem.

**Linear Regression Algorithm in Machine Learning**

Least-squares “linear regression” is a statistical method for modeling a dependent variable that takes continuous values, whereas the independent variables can be either continuous or categorical. In other words, “linear regression” predicts the dependent variable (Y) from the values of the independent variables (X). It can be used whenever we want to predict a continuous quantity.

**Statistical relationship is not Deterministic relationship**

The relationship between two variables is said to be deterministic if one variable can be expressed exactly in terms of the other. E.g., degrees Celsius and Fahrenheit.

In a statistical relationship, one variable cannot be determined exactly from the other; the relationship holds only on average. E.g., the relationship between height and weight.
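The contrast can be sketched in a few lines of Python. The conversion function is the standard Celsius-to-Fahrenheit formula; the height and weight observations are made up purely for illustration.

```python
def celsius_to_fahrenheit(celsius):
    """Deterministic relationship: every input has exactly one output."""
    return celsius * 9 / 5 + 32

print(celsius_to_fahrenheit(100))  # 212.0
print(celsius_to_fahrenheit(0))    # 32.0

# Hypothetical (height in cm, weight in kg) observations, invented for this sketch.
# The same height appears with several different weights, so height alone
# cannot determine weight exactly: the relationship is statistical.
observations = [(170, 65), (170, 72), (170, 68)]
weights_at_170 = {weight for height, weight in observations if height == 170}
print(len(weights_at_170))  # 3
```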

The core idea is to obtain a line that best fits the data. The best-fit line is the one for which the total prediction error (over all data points) is as small as possible. The error is the distance between a data point and the regression line.

Simple linear regression is an approach for predicting a quantitative response using a single feature (or “predictor” or “input variable”). It takes the following form called a **Linear Regression Equation**:

**Y = β0 + β1 * x**

- Y is the response
- x is the feature
- β0 is the intercept
- β1 is the multiplier, i.e. the change in Y per unit change in x

Together, β0 and β1 are called the **model coefficients**.
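As a minimal sketch, the equation can be written as a small Python function. The coefficient values below are hypothetical, chosen only so the arithmetic is easy to follow; they do not come from any dataset.

```python
def predict(x, beta0, beta1):
    """Return the predicted response Y = beta0 + beta1 * x."""
    return beta0 + beta1 * x

# Hypothetical coefficients for a salary-vs-experience line (illustration only)
beta0 = 25000.0  # intercept: predicted salary at 0 years of experience
beta1 = 5000.0   # slope: change in predicted salary per extra year

salary_at_3_years = predict(3, beta0, beta1)
print(salary_at_3_years)  # 40000.0
```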

**Exploration of model coefficients**

**β0 is the intercept**:- If the data used to fit the model does not include x = 0, then the prediction at x = 0, which is just **β0**, has no meaningful interpretation. For example, suppose we have a dataset that relates years of experience (x) to salary (y). Setting x = 0 (zero experience) reduces the equation to β0 alone, but if the dataset contains no one with zero experience, that value is an extrapolation beyond the range the model was fitted on and should not be interpreted.

If the data does include x = 0, then β0 is the average predicted value of y when x = 0. However, setting all the predictor variables to zero is often impossible.

Including **β0** guarantees that the residuals have a mean of zero. Without a **β0** term, the regression line is forced to pass through the origin, and both the model coefficients and the predictions will generally be biased.
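The effect of the intercept on the residuals can be checked numerically. The sketch below uses a small made-up dataset and plain Python; the through-the-origin slope Σxy / Σx² is the standard least-squares solution when no intercept is fitted.

```python
from statistics import mean

# Made-up data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [12.0, 14.0, 15.0, 18.0, 21.0]

# Fit with an intercept (ordinary least squares)
x_bar, y_bar = mean(x), mean(y)
beta1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
beta0 = y_bar - beta1 * x_bar
residuals_with_intercept = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]

# Fit without an intercept: the line is forced through the origin
slope_origin = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
residuals_through_origin = [yi - slope_origin * xi for xi, yi in zip(x, y)]

print(mean(residuals_with_intercept))   # zero, up to floating-point rounding
print(mean(residuals_through_origin))   # clearly non-zero for this data
```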

**β1 is the Multiplier / Unit change in x**:-

If β1 is greater than zero (β1 > 0), the independent and dependent variables have a positive relationship: an increase in x increases y.

If β1 is less than zero (β1 < 0), the independent and dependent variables have a negative relationship: an increase in x decreases y.

**How to calculate the model coefficient**

The following formulas give the coefficients of the linear model.

**β1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²**

**β0 = ȳ − β1 * x̄**

x̄ – mean of x

ȳ – mean of y
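The two formulas translate directly into code. This is a minimal sketch in plain Python; the function name `fit_simple_linear_regression` and the sample data are assumptions made for illustration. The data lies exactly on the line y = 1 + 2x, so the recovered coefficients are easy to verify by hand.

```python
from statistics import mean

def fit_simple_linear_regression(x, y):
    """Return (beta0, beta1) computed with the least-squares formulas."""
    x_bar, y_bar = mean(x), mean(y)
    # beta1 = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    denominator = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = numerator / denominator
    # beta0 = y_bar - beta1 * x_bar
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Illustrative data lying exactly on y = 1 + 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]

beta0, beta1 = fit_simple_linear_regression(x, y)
print(beta0, beta1)  # 1.0 2.0
```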

**Estimating (Learning) Model Coefficients**: Generally speaking, coefficients are estimated using the least squares criterion, which means we find the line that minimizes the sum of squared residuals (or “sum of squared errors”).

**Linear Regression Line**: In linear regression, our objective is to fit a line through the data that is as close as possible to the points overall, thereby reducing the distance (error term) of the data points from the fitted line.
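To see the least squares criterion at work, the sketch below compares the sum of squared residuals of the least-squares line with that of a few slightly shifted lines. The data is made up, and the coefficients 9.4 and 2.2 are what the coefficient formulas give for this particular data; the candidate lines are arbitrary nudges chosen for illustration.

```python
def sum_squared_residuals(x, y, beta0, beta1):
    """Sum of squared vertical distances between the points and the line."""
    return sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [12.0, 14.0, 15.0, 18.0, 21.0]

# Least-squares coefficients for this data (intercept, slope)
best_line = (9.4, 2.2)
# Arbitrary nearby lines: intercept or slope shifted by 0.1
candidate_lines = [(9.5, 2.2), (9.3, 2.2), (9.4, 2.3), (9.4, 2.1)]

error_best = sum_squared_residuals(x, y, *best_line)
candidate_errors = [sum_squared_residuals(x, y, b0, b1) for b0, b1 in candidate_lines]

print(round(error_best, 4))  # 1.6
for (b0, b1), err in zip(candidate_lines, candidate_errors):
    print(b0, b1, round(err, 4))  # every value is larger than 1.6
```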

**Evaluation of Algorithm**

**MAE** – This is the mean absolute error. It is less sensitive to outliers than MSE. For example, suppose the actual y is 10 and the predicted y is 30; the resulting MAE would be |30 − 10| = 20.

**MSE** – This is the mean squared error. It tends to amplify the impact of outliers on the model’s measured error. For the same example, the resulting MSE would be (30 − 10)² = 400.

**RMSE** – This is the root mean squared error. It is interpreted as how far, on average, the residuals are from zero. Taking the square root undoes the squaring in MSE and gives the result in the same units as the data. Here, the resulting RMSE would be √(30 − 10)² = 20. Don’t be baffled by the identical MAE and RMSE values: they coincide here only because the example has a single data point. In practice, these metrics are computed by averaging over all the (actual − predicted) differences in the data, and RMSE is then at least as large as MAE.
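The three metrics can be sketched in plain Python as follows. The helper names `mae`, `mse`, and `rmse` are chosen here for illustration. The first call reproduces the single-point example (actual 10, predicted 30); the second uses made-up data in which one prediction is far off, to show how squaring amplifies that one error.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(mse(actual, predicted))

# Single data point: MAE and RMSE coincide
print(mae([10], [30]), mse([10], [30]), rmse([10], [30]))  # 20.0 400.0 20.0

# Made-up data with one large error among otherwise perfect predictions
actual = [10, 12, 14, 16]
predicted = [30, 12, 14, 16]
print(mae(actual, predicted))   # 5.0
print(mse(actual, predicted))   # 100.0
print(rmse(actual, predicted))  # 10.0
```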

**Coefficient of Determination (R Square):-**

It suggests the proportion of variation in Y which can be explained with the independent variables.

Mathematically: **R2 = SSR / SST**

**or R2 = Explained variation / Total variation**

Here SSR is the sum of squares due to regression (the explained variation) and SST is the total sum of squares (the total variation).

**Explained variation** is the sum of the squared differences between each predicted y-value and the mean of y.

**Explained variation = Σ(ŷ − ȳ)²**

**Unexplained variation** is the sum of the squared differences between the y-value of each ordered pair and the corresponding predicted y-value.

**Unexplained variation = Σ(y − ŷ)²**

**Total variation** about a regression line is the sum of the squared differences between the y-value of each ordered pair and the mean of y.

**Total variation = Σ(y − ȳ)²**

R Square ranges from 0 to 1 for a least-squares fit with an intercept, evaluated on the data it was fitted to. An R Square of 0 means that the dependent variable cannot be predicted from the independent variable, and an R Square of 1 means that it can be predicted without error. If the value of R Square is 0.912, then 91.2% of the variation in Y can be explained by the explanatory variables in the model.
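The three variations and R Square can be computed by hand, as in this minimal sketch. The y-values are made up, and the predictions in `y_hat` are those of a least-squares line (with intercept) fitted to them; both are assumptions for illustration. For such a fit, the explained and unexplained variation add up to the total variation.

```python
from statistics import mean

# Made-up observed values and the least-squares predictions for them
y = [12.0, 14.0, 15.0, 18.0, 21.0]
y_hat = [11.6, 13.8, 16.0, 18.2, 20.4]

y_bar = mean(y)
explained = sum((yh - y_bar) ** 2 for yh in y_hat)             # Σ(ŷ − ȳ)²
unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # Σ(y − ŷ)²
total = sum((yi - y_bar) ** 2 for yi in y)                     # Σ(y − ȳ)²

r2 = explained / total
print(round(explained, 4), round(unexplained, 4), round(total, 4))  # 48.4 1.6 50.0
print(round(r2, 3))                       # 0.968
print(round(1 - unexplained / total, 3))  # 0.968, the equivalent form
```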

**IMPLEMENTATION OF LINEAR REGRESSION USING SKLEARN**

We use the Advertising dataset to demonstrate linear regression. The dataset contains the advertising investment in TV, Radio, and Newspaper, along with the corresponding Sales.

```
# Step 1:- Import Required Library
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns # For Visualization
import matplotlib.pyplot as plt
sns.set(context="notebook", palette="Spectral", style='darkgrid', font_scale=1.5, color_codes=True)
import warnings
warnings.filterwarnings('ignore')

# Step 2:- Load DataSet
ad_data = pd.read_csv('Advertising.csv', index_col='Unnamed: 0')
ad_data.head() # Show the top five rows
```

| | TV | Radio | Newspaper | Sales |
|---|---|---|---|---|
| 1 | 230.1 | 37.8 | 69.2 | 22.1 |
| 2 | 44.5 | 39.3 | 45.1 | 10.4 |
| 3 | 17.2 | 45.9 | 69.3 | 9.3 |
| 4 | 151.5 | 41.3 | 58.5 | 18.5 |
| 5 | 180.8 | 10.8 | 58.4 | 12.9 |

```
# Step 3:- Graphical Visualization of data
sns.pairplot(ad_data)
plt.show()
```

```
# Step 4:- Take Independent and Dependent Variable
x = ad_data.drop(["Sales"], axis=1).values
y = ad_data.Sales.values

# Step 5:- Split the dataset into Training and Testing
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0, test_size=0.25)
print('Shape of Training Data ', X_train.shape)
print('Shape of Testing Data ', X_test.shape)
# Output:-
# Shape of Training Data  (150, 3)
# Shape of Testing Data  (50, 3)

# Step 6:- Import Linear Regression from sklearn library
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train) # Fit the model to the training data

# Step 7:- Prediction
Y_Predict = model.predict(X_test) # Predict the test data

# Step 8:- Evaluation of model
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
# To check the evaluation we pass the actual and predicted values to each function
print('Mean Absolute Error value:-', mean_absolute_error(y_test, Y_Predict))
print('Mean Square Error value:-', mean_squared_error(y_test, Y_Predict))
# RMSE is the square root of the MSE
print('Root Mean Square Error value:-', np.sqrt(mean_squared_error(y_test, Y_Predict)))
# Output:-
# Mean Absolute Error value:- 1.300032091923545
# Mean Square Error value:- 4.0124975229171
# Root Mean Square Error value:- approximately 2.0031 (square root of the MSE above)
print("R squared: {}".format(r2_score(y_true=y_test, y_pred=Y_Predict)))
# Output:-
# R squared: 0.8576396745320893
```
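Once a model is fitted, its learned coefficients can be inspected through the `intercept_` and `coef_` attributes of `LinearRegression`. The sketch below is self-contained, so it uses synthetic data as a stand-in for the three advertising columns; the "true" coefficients, noise level, and random seed are assumptions made up for illustration, not values from the Advertising dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for three advertising features (illustration only)
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 3))
true_intercept = 3.0
true_coef = np.array([0.05, 0.2, 0.0])  # third feature has no effect
y = true_intercept + X @ true_coef + rng.normal(0, 1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# Learned coefficients: one intercept and one slope per feature
print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)

# RMSE on the held-out data, in the same units as y
rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print("RMSE:", rmse)
```

With this setup the estimated slopes should land close to the assumed true values, and the RMSE should be close to the noise standard deviation of 1.0.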

**Conclusion**:- In this blog, we covered the regression part of supervised machine learning, how linear regression works, and how to implement it.