
How to interpret correlation coefficient

I am trying to find the correlation coefficient in R between my dependent and independent variables.

data("mtcars")
my_data <- mtcars[, c(1,3,4,5,6,7)]
res <- cor(my_data)
round(res, 2)

As a result, I got a correlation matrix in which some coefficients are positive and some are negative.

For example: if the correlation coefficient between mpg and disp is -0.85, how can I know which variable is decreasing and which one is increasing?

Another way to think about this is that a correlation coefficient of -0.85 tells you that a one standard deviation increase (decrease) in either variable is associated with a 0.85 standard deviation decrease (increase) in the other variable. You can see this graphically using the code below.

The black line is the regression line for a regression of disp on mpg. This is related to the correlation coefficient because the regression slope equals the correlation coefficient times the standard deviation of disp divided by the standard deviation of mpg. (If we had switched the x and y variables and done lm(mpg ~ disp, data=mtcars), then the regression slope would be the correlation coefficient times the standard deviation of mpg divided by the standard deviation of disp.)

plot(mtcars$mpg, mtcars$disp)
abline(lm(disp ~ mpg, data=mtcars))
abline(v=mean(mtcars$mpg) + c(0, sd(mtcars$mpg)), col="red", lty="11")
abline(h=mean(mtcars$disp) + c(0, cor(mtcars$mpg, mtcars$disp)*sd(mtcars$disp)), col="red", lty="11")
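
The slope identity described above can also be checked numerically. The following is a minimal sketch rather than part of the original answer; the names fit, r, slope_from_lm and slope_from_r are introduced here purely for illustration.

```r
# Fit the same regression that the plot draws (disp as a function of mpg)
fit <- lm(disp ~ mpg, data = mtcars)
r   <- cor(mtcars$mpg, mtcars$disp)

# Slope reported by lm() versus slope rebuilt from the correlation
slope_from_lm <- unname(coef(fit)["mpg"])
slope_from_r  <- r * sd(mtcars$disp) / sd(mtcars$mpg)

# The two values should agree up to floating-point error
all.equal(slope_from_lm, slope_from_r)
```

If the identity holds, all.equal() returns TRUE, which is why the red horizontal line in the plot sits exactly where the regression line crosses the second vertical line.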


You can standardize both variables (that is, scale the values so that they are in units of standard deviations away from the mean), which might make the relationship clearer. Now the correlation coefficient and the regression slope are exactly the same because both variables have been scaled to be in the same units. Note that a 1 standard deviation change in mpgS is associated with a -0.85 standard deviation change in dispS:

# Standardized versions of mpg and disp
mtcars$mpgS = (mtcars$mpg - mean(mtcars$mpg))/sd(mtcars$mpg)
mtcars$dispS = (mtcars$disp - mean(mtcars$disp))/sd(mtcars$disp)

plot(mtcars$mpgS, mtcars$dispS)
abline(lm(dispS ~ mpgS, data=mtcars))
abline(v=c(0,1), col="red", lty="11")
abline(h=c(0, cor(mtcars$mpg, mtcars$disp)), col="red", lty="11")
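
As a quick sanity check on the claim that the standardized slope equals the correlation coefficient, here is a hedged sketch that uses base R's scale() instead of the manual formula above. The data frame z and the slope_* names are illustrative assumptions, and the sketch deliberately leaves mtcars itself unmodified.

```r
# scale() centers a vector and divides it by its standard deviation
z <- data.frame(
  mpgS  = as.numeric(scale(mtcars$mpg)),
  dispS = as.numeric(scale(mtcars$disp))
)

r <- cor(mtcars$mpg, mtcars$disp)

# Slopes of the standardized regressions, in both directions
slope_disp_on_mpg <- unname(coef(lm(dispS ~ mpgS, data = z))["mpgS"])
slope_mpg_on_disp <- unname(coef(lm(mpgS ~ dispS, data = z))["dispS"])

# Both slopes should match the correlation coefficient
all.equal(slope_disp_on_mpg, r)
all.equal(slope_mpg_on_disp, r)
```

Both comparisons should return TRUE: once the variables share the same units, it no longer matters which one is treated as the predictor.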


You can also reverse the roles of mpg and disp in the graph and the result is equivalent:

plot(mtcars$dispS, mtcars$mpgS)
abline(lm(mpgS ~ dispS, data=mtcars))
abline(v=c(0,1), col="red", lty="11")
abline(h=c(0, cor(mtcars$mpg, mtcars$disp)), col="red", lty="11")


Bear in mind that the relationship implied by the correlation coefficient is based on the assumption of a linear relationship, as embodied by the regression lines in the graphs. If the relationship in the actual data is not linear (as appears to be the case here), the correlation coefficient (or, equivalently, a single-variable regression) might not provide good predictions of the values of the dependent variable.
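
One rough way to probe the linearity caveat is to compare the straight-line fit with a fit that allows curvature. The sketch below is an assumption-laden illustration, not the original author's method: the quadratic term I(mpg^2) is chosen here only as a simple example of a nonlinear alternative.

```r
# Straight-line fit versus a fit with an added squared term
linear    <- lm(disp ~ mpg, data = mtcars)
quadratic <- lm(disp ~ mpg + I(mpg^2), data = mtcars)

r2_linear    <- summary(linear)$r.squared
r2_quadratic <- summary(quadratic)$r.squared

# For a single-predictor regression, R-squared is the squared correlation
all.equal(r2_linear, cor(mtcars$mpg, mtcars$disp)^2)

# Share of the variance in disp explained by each fit
round(c(linear = r2_linear, quadratic = r2_quadratic), 2)
```

The quadratic fit can never explain less variance than the linear one, so the size of the gap is what matters: a noticeably higher value for the quadratic fit suggests that the correlation coefficient alone understates how closely the two variables are related.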

Consider the following script, which just compares mpg and disp:

res1 <- cor(mtcars$mpg,  mtcars$disp)
res2 <- cor(mtcars$disp, mtcars$mpg)
round(res1, 2)
round(res2, 2)

The output from both calls is -0.85. In other words, the correlation coefficient is symmetric: it does not depend on the order in which the two variables are given. Rather, a negative correlation coefficient means that as mpg increases, disp tends to decrease. We could equally say that as disp increases, mpg tends to decrease.
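
The same symmetry shows up in the full correlation matrix from the question. The following sketch reuses the question's column selection; the name r_mpg_disp is introduced here for illustration only.

```r
# Rebuild the correlation matrix from the question
my_data <- mtcars[, c(1, 3, 4, 5, 6, 7)]
res <- cor(my_data)

# The matrix is symmetric, so each pair of variables has a single coefficient
isSymmetric(res)
all.equal(res["mpg", "disp"], res["disp", "mpg"])

# The sign tells you whether the two variables tend to move together (+1)
# or in opposite directions (-1); it does not single out one variable
r_mpg_disp <- res["mpg", "disp"]
sign(r_mpg_disp)
```

A sign of -1 here means that cars with higher mpg tend to have lower disp and vice versa; neither variable is "the one decreasing" in isolation.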
