

Our goal is to predict the median value of homes using the independent variables. Take a moment to consider how each variable might logically relate to the median value, MEDV.

ZN – the proportion of residential land zoned for lots over 25,000 sq. ft.
INDUS – the proportion of non-retail business acres per town.
CHAS – Charles River dummy variable (1 if the tract bounds the river, otherwise 0).
NOX – nitric oxides concentration (parts per 10 million).
RM – the average number of rooms per dwelling.
AGE – the proportion of owner-occupied units built prior to 1940.
DIS – weighted distances to five Boston employment centres.
RAD – index of accessibility to radial highways.
TAX – full-value property-tax rate per $10,000.
B – 1000(Bk – 0.63)^2 where Bk is the proportion of blacks by town.
LSTAT – % lower status of the population.
MEDV – median value of owner-occupied homes in $1000s.

Press “OK” and you have done the regression analysis. Now we will visit each section of the regression output to deepen our understanding.

“Regression Statistics” tells how well the model captures the relationship between the independent variables and the target variable.

Multiple R – also known as the correlation coefficient. It tells the strength of the linear relationship and is equal to the square root of R-square. A simple correlation coefficient varies between -1 and +1; Multiple R, being a square root, lies between 0 and 1.

R-Square – tells how close the data are to the fitted regression line. It is also known as the “coefficient of determination”.

R-square = Explained variation / Total variation

Explained variation is the sum of the squares of the differences between each predicted y-value and the mean of y. Total variation is the sum of the squares of the differences between the y-value of each ordered pair and the mean of y. From the ANOVA table, Explained variation = 31243.14662 and Total variation = 42716.29542.

Adjusted R-square – R-square penalized for every new variable added to the model. It only increases if the new predictor enhances the model.

Adjusted R-square = 1 – ((n – 1) / (n – (k + 1))) * (1 – R-square)

n is the number of observations (n = 506), and k is the number of independent variables used in the model (k = 13).

Standard Error – the standard deviation of the observed y-values about the predicted y-value for a given x-value.

Standard Error = SQRT(Unexplained variation / (n – (k + 1)))

From the ANOVA table, Unexplained variation = 11473.14919 and n – (k + 1) = 492.

    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    Housing = pd.read_csv(".data/housing.csv")

    # Splitting target variable and independent variables
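The regression statistics described above can be reproduced from the ANOVA figures quoted in the text. The following is a minimal sketch, not the spreadsheet's own calculation: the variable names are illustrative, and it assumes the standard residual degrees of freedom n – (k + 1), which matches the value of 492 quoted from the ANOVA table.

```python
import math

# Figures quoted from the ANOVA table in the text
explained = 31243.14662    # explained variation (regression sum of squares)
unexplained = 11473.14919  # unexplained variation (residual sum of squares)
total = 42716.29542        # total variation
n = 506                    # number of observations
k = 13                     # number of independent variables

# Residual degrees of freedom (assumed standard form: n - (k + 1))
residual_df = n - (k + 1)

r_square = explained / total
multiple_r = math.sqrt(r_square)
adjusted_r_square = 1 - (n - 1) / residual_df * (1 - r_square)
standard_error = math.sqrt(unexplained / residual_df)

print(f"Multiple R:        {multiple_r:.4f}")
print(f"R-square:          {r_square:.4f}")
print(f"Adjusted R-square: {adjusted_r_square:.4f}")
print(f"Standard Error:    {standard_error:.4f}")
```

Because Adjusted R-square scales the unexplained share (1 – R-square) by (n – 1) / (n – (k + 1)), it is always slightly below R-square whenever k is at least 1 and R-square is below 1.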

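The same kind of model can be fitted in Python with the scikit-learn imports used in this article. The sketch below is only an illustration under stated assumptions: it builds a small synthetic table with two of the column names (RM and LSTAT) and a made-up MEDV instead of reading the housing file, and the 80/20 split and random seeds are arbitrary choices, not the article's settings.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the housing data (assumption: not the real dataset)
rng = np.random.default_rng(0)
n_rows = 200
housing = pd.DataFrame({
    "RM": rng.normal(6.3, 0.7, n_rows),
    "LSTAT": rng.uniform(2, 35, n_rows),
})
# Made-up relationship, used only so the example has something to fit
housing["MEDV"] = (
    5.0 * housing["RM"] - 0.6 * housing["LSTAT"] + rng.normal(0, 2.0, n_rows)
)

# Splitting target variable and independent variables
X = housing.drop(columns="MEDV")
y = housing["MEDV"]

# Hold out 20% of the rows to check the fit on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

# score() returns R-square (the coefficient of determination) on the given data
r_square_test = model.score(X_test, y_test)

print(dict(zip(X.columns, model.coef_)))
print("Intercept:", model.intercept_)
print("R-square on held-out rows:", r_square_test)
```

With the real data, the synthetic table would be replaced by the DataFrame loaded with `pd.read_csv`, keeping MEDV as the target and the remaining columns as independent variables.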