
How do I interpret this plot and summary (multivariable linear regression)?

I am not 100% sure how to interpret the plots for a multivariable linear regression, especially everything besides the normal QQ one.

From my understanding, the plots show linearity, or that the model is a good fit.

[Image: multiple linear regression]

As for the summary, I think it shows some pretty good results based on the R-squared and adjusted R-squared, alongside the F-statistic and the t and p values.

[Image: regression model summary]

The Plots

First, your plots...

[Image: the diagnostic plots from the question]

The first plot (top left) is your residuals vs fitted plot. It shows your fitted values (what the regression predicts your value should be) against your residuals (how far off each prediction was). The points should be fairly evenly distributed around the center line; otherwise this may hint that there are issues with equality of variance or curvilinearity. In your plot, the points look fairly smooshed into the left side, hinting that your data is not evenly spread out on a scatter plot.
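To make "fitted" and "residual" concrete, here is a minimal Python sketch with a single predictor. The data and the helper name `fit_line` are made up for illustration and are not the model from the question; a multivariable fit works the same way, just with more slopes.

```python
# Hypothetical toy data; not the data from the question.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

a, b = fit_line(x, y)
fitted = [a + b * xi for xi in x]                    # x axis of the plot
residuals = [yi - fi for yi, fi in zip(y, fitted)]   # y axis of the plot

for f, r in zip(fitted, residuals):
    print(f"fitted={f:6.3f}  residual={r:+.3f}")
```

With an intercept in the model the residuals always sum to zero, so what matters in the plot is their pattern around the center line, not their average.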

The second plot (top right), your scale-location plot, is slightly different, as the y axis now uses standardized residuals (typically the square root of their absolute values). Since these are standardized, they let you see whether the spread of the residuals changes based on location. The red line should be as horizontal as possible, and again the points should be as evenly distributed as possible. Your plot seems to indicate that this isn't the case.

The third plot (bottom left), the QQ plot, checks whether your residuals are normally distributed by plotting the standardized residuals against theoretical normal quantiles. The plotted points should mostly resemble a straight line, with only minor curvature at the ends. It's hard to tell with certainty since the plots are kinda squished together into one window. However, the residuals appear mostly normal, with a slight curve on the left (not an issue) and a heavier curve on the right (a heavy tail may indicate issues with variation on the right side of your scatterplot). To see if this is really damning, run a density plot on your raw residuals and see if they look normal.
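The idea behind the QQ plot can be sketched in a few lines of Python. The residual values below are invented, and the plotting positions `(i - 0.5) / n` are one common convention I am assuming here; your software may use a slightly different one.

```python
from statistics import NormalDist

# Hypothetical standardized residuals; not taken from the question's model.
std_resid = [-1.4, -0.6, -0.2, 0.1, 0.3, 0.5, 0.9, 2.8]

n = len(std_resid)
sample_q = sorted(std_resid)  # y axis: observed residuals in order
# x axis: where each ordered residual would sit if they were standard normal.
theory_q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

for t, s in zip(theory_q, sample_q):
    print(f"theoretical={t:+.3f}  sample={s:+.3f}  gap={s - t:+.3f}")
```

A large positive gap in the last row is what a heavy right tail looks like: the largest residual is much bigger than a normal distribution would predict for that position.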

The last plot (bottom right), your residuals vs leverage plot, checks the leverage of points in your regression to flag potentially influential outliers. Influence is usually judged with Cook's distance, and people have suggested different cutoffs for what is considered "too high" (greater than 1, greater than 4/n, etc.). It's best to simply check if some points look way too far away from the others and see if they are causing problematic trends.
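For a feel of what leverage and Cook's distance measure, here is a rough Python sketch for a one-predictor regression. The data are invented (with one deliberately unusual last point), and the closed-form leverage formula used here only applies to the single-predictor case; it is an illustration, not the calculation behind the plot in the question.

```python
from math import sqrt

# Hypothetical toy data with one unusual point at the end; not the question's data.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 12.0]
y = [2.0, 4.1, 5.9, 8.2, 9.9, 14.0]

n, p = len(x), 2  # p counts the estimated coefficients (intercept and slope)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
intercept = mean_y - slope * mean_x

resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
s2 = sum(e ** 2 for e in resid) / (n - p)  # residual variance
# Leverage: how far each x value sits from the mean of x.
leverage = [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]
# Standardized residuals: residuals rescaled by their estimated standard deviation.
std_resid = [e / sqrt(s2 * (1 - h)) for e, h in zip(resid, leverage)]
# Cook's distance combines residual size and leverage into one influence score.
cooks_d = [r ** 2 / p * h / (1 - h) for r, h in zip(std_resid, leverage)]

for i, (h, d) in enumerate(zip(leverage, cooks_d), start=1):
    print(f"row {i}: leverage={h:.3f}  cooks_d={d:.3f}")
```

The last row has both the highest leverage and the highest Cook's distance, which is the kind of point that would show up far to the right in a residuals vs leverage plot.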

By the way, the numbers shown next to the points in these plots are their row indices, so you can check them directly. For example, the topmost point in the first plot is located in Row 49.

For comparison, here are the diagnostic plots for Karl Pearson's original father-son height data, which look fairly normal. Notice that the order is slightly different, but the interpretation is the same:

[Image: diagnostic plots for the father-son height regression]

The Summary

The first part is the formula call, which just specifies how you modeled the regression. The second part shows how your residuals are distributed. Think of the minimum as the point that strays the furthest below your regression line, and the max as the furthest above. Your median should be close to zero; it doesn't have to be exactly zero, so long as it is small relative to the spread of the residuals.
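The residual line of the summary is just a five-number summary, which can be reproduced by hand. This Python sketch uses invented residuals, and the `inclusive` quantile method is my choice here; other quantile conventions give slightly different quartiles.

```python
from statistics import quantiles

# Hypothetical residuals; not the values from the question's summary.
resid = [-3.2, -1.1, -0.4, -0.1, 0.2, 0.5, 1.3, 2.8]

# Quartiles by linear interpolation between the sorted data points.
q1, q2, q3 = quantiles(resid, n=4, method="inclusive")

print(f"Min={min(resid)}  1Q={q1:.3f}  Median={q2:.3f}  "
      f"3Q={q3:.3f}  Max={max(resid)}")
```

Here the median is close to zero and the quartiles are roughly mirror images of each other, which is what you would hope to see.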

The coefficients show your intercept and each of your predictors. Listed in order next to them are 1) the estimate (the slope), which is the number each predictor's raw value gets multiplied by in the linear regression equation, 2) the standard error, which measures how uncertain that estimate is (smaller means more precise), 3) the t value, which is the estimate divided by its standard error and is used to test significance, and 4) the p value, which is used as your significance "flag". All of your slope coefficients are significant, though not knowing what these predictors mean makes it difficult to interpret them with confidence.
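As a rough illustration of how the estimate, standard error, and t value relate, here is a Python sketch for a single predictor on invented data. The p value is left out because it needs the t distribution with n - 2 degrees of freedom, which the Python standard library does not provide.

```python
from math import sqrt

# Hypothetical toy data; not the data from the question.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.9, 5.1, 6.8, 9.2, 10.9]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
intercept = mean_y - slope * mean_x

resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
s2 = sum(e ** 2 for e in resid) / (n - 2)  # residual variance, 2 coefficients

# Standard errors of the two coefficients (single-predictor formulas).
se_slope = sqrt(s2 / sxx)
se_intercept = sqrt(s2 * (1 / n + mean_x ** 2 / sxx))

# t value = estimate / standard error.
t_slope = slope / se_slope
t_intercept = intercept / se_intercept

print(f"slope:     estimate={slope:.4f}  se={se_slope:.4f}  t={t_slope:.2f}")
print(f"intercept: estimate={intercept:.4f}  se={se_intercept:.4f}  t={t_intercept:.2f}")
```

A t value far from zero means the estimate is large compared with its uncertainty, which is what drives a small p value.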

Below are some model metrics you seem to already know about. Remember that when you have multiple predictors, the adjusted R-squared should be given more weight because it penalizes your regression for adding predictors, whereas the normal R-squared never decreases when you add more. The F-statistic and the values with it are used to test if the model as a whole is significant, and the residual standard error is the typical size of a residual, in the units of your response, so it gives a rough idea of how accurate the model is in general.
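The relationship between these metrics can be written out in a short Python sketch. The helper `model_metrics` and all of the numbers are hypothetical and only meant to show why adjusted R-squared can drop when a weak predictor is added.

```python
from math import sqrt

def model_metrics(rss, tss, n, k):
    """rss: residual sum of squares, tss: total sum of squares,
    n: number of observations, k: number of predictors (excluding the intercept)."""
    r2 = 1 - rss / tss
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f_stat = ((tss - rss) / k) / (rss / (n - k - 1))
    rse = sqrt(rss / (n - k - 1))  # residual standard error
    return r2, adj_r2, f_stat, rse

# Hypothetical model with 3 predictors.
r2, adj_r2, f_stat, rse = model_metrics(rss=20.0, tss=100.0, n=50, k=3)
# Same data with 3 more predictors that barely reduce the residual sum of squares.
r2_b, adj_r2_b, f_stat_b, rse_b = model_metrics(rss=19.9, tss=100.0, n=50, k=6)

print(f"k=3: R2={r2:.3f}  adjR2={adj_r2:.3f}  F={f_stat:.2f}  RSE={rse:.3f}")
print(f"k=6: R2={r2_b:.3f}  adjR2={adj_r2_b:.3f}  F={f_stat_b:.2f}  RSE={rse_b:.3f}")
```

In the second call R-squared creeps up while adjusted R-squared goes down, which is the penalty for extra predictors in action.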

