
Series of correlation coefficient calculations

I want to analyse the default data set in R (the mtcars data set). I am interested in creating a column of correlation coefficients according to the following rule. First, compute the correlation coefficient between "mpg" and "wt" using only the first three observations (rows 1, 2, 3). Then drop the first row and compute it again for the next three observations (rows 2, 3, 4), then for rows 3, 4, 5, and so on until the end. For example:

cor(mtcars$mpg[c(1,2,3)],mtcars$wt[c(1,2,3)])
cor(mtcars$mpg[c(2,3,4)],mtcars$wt[c(2,3,4)])
cor(mtcars$mpg[c(3,4,5)],mtcars$wt[c(3,4,5)]);

and so on. Can anyone help me automate this R code using a loop or similar?

Example: see how I need the output. I have done it in Excel, but I need to do it in R.

The value of cor(mtcars$mpg[c(1,2,3)],mtcars$wt[c(1,2,3)]) is -0.8884586; however, the first value in the Correlation column of the output image in the question is different, so the image does not match the description of what is wanted. We will assume that the description is correct and the sample output is not.

Try a rolling apply, rollapply, from the zoo package. It applies the function cor2 to a rolling window of width 3. align = "left" means it uses the current row and the next 2 rows, so that the NA values appear at the end, as in the image in the question. fill = NA causes it to generate NA values for the last 2 rows, since those rows do not have 2 further rows after them to complete a window.

library(zoo)

mtcars2 <- mtcars[c("mpg", "wt")]
cor2 <- function(x) cor(x[, 1], x[, 2])
transform(mtcars2, cor = rollapply(mtcars2, 3, cor2, by.column = FALSE,  
   align = "left", fill = NA))

giving:

                     mpg    wt         cor
Mazda RX4           21.0 2.620 -0.88845855
Mazda RX4 Wag       21.0 2.875 -0.82589964
Datsun 710          22.8 2.320 -0.87097656
Hornet 4 Drive      21.4 3.215 -0.99520846
Hornet Sportabout   18.7 3.440 -0.99985063
Valiant             18.1 3.460 -0.99534538
Duster 360          14.3 3.570 -0.97267882
Merc 240D           24.4 3.190 -0.90784130
Merc 230            22.8 3.150 -0.96247218
Merc 280            19.2 3.440 -0.86602540
Merc 280C           17.8 3.440 -0.99308187
Merc 450SE          16.4 4.070 -0.05428913
Merc 450SL          17.3 3.730 -0.96311366
Merc 450SLC         15.2 3.780 -0.99534934
Cadillac Fleetwood  10.4 5.250  0.05301502
Lincoln Continental 10.4 5.424 -0.98658763
Chrysler Imperial   14.7 5.345 -0.96899291
Fiat 128            32.4 2.200  0.44730718
Honda Civic         30.4 1.615 -0.86317499
Toyota Corolla      33.9 1.835 -0.94182141
Toyota Corona       21.5 2.465 -0.99341821
Dodge Challenger    15.5 3.520 -0.94720046
AMC Javelin         15.2 3.435  0.21168794
Camaro Z28          13.3 3.840 -0.90670560
Pontiac Firebird    19.2 3.845 -0.99864434
Fiat X1-9           27.3 1.935 -0.99939736
Porsche 914-2       26.0 2.140 -0.99630829
Lotus Europa        30.4 1.513 -0.99962223
Ford Pantera L      15.8 3.170 -0.93453339
Ferrari Dino        19.7 2.770 -0.96372018
Maserati Bora       15.0 3.570          NA
Volvo 142E          21.4 2.780          NA
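To see what align and fill do in isolation, here is a minimal sketch on a toy vector rather than on mtcars. It assumes the zoo package is installed (the answer above already loads it); the vector 1:5 and the use of sum are illustrative choices only, not part of the original answer.

```r
library(zoo)

v <- 1:5

# align = "left": each position uses itself and the next 2 elements,
# so the incomplete windows (and the NA padding) fall at the end
left <- rollapply(v, 3, sum, align = "left", fill = NA)
left
# 6 9 12 NA NA

# align = "right": each position uses itself and the previous 2 elements,
# so the NA padding falls at the start instead
right <- rollapply(v, 3, sum, align = "right", fill = NA)
right
# NA NA 6 9 12
```

With align = "left", the value on a given row describes that row and the two after it, which matches the rows 1-3, 2-4, 3-5 scheme in the question.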

Also see this SO post, which is similar except that it is in a data.table context: Rolling correlation with data.table

It's not clear to me why you want to calculate what looks like a rolling correlation within a 3-row window, but you could do something like this in base R:

x <- lapply(seq(1, nrow(mtcars) - 2), function(x) seq(x, x + 2))

Here x is a list whose entries are the row indices of each window over which we calculate the correlation.

df <- do.call(rbind, lapply(x, function(x) cor(mtcars$mpg[x], mtcars$wt[x])))
df;
#        [,1]
#[1,] -0.88845855
#[2,] -0.82589964
#[3,] -0.87097656
#[4,] -0.99520846
#[5,] -0.99985063
#[6,] -0.99534538
#[7,] -0.97267882
#[8,] -0.90784130
#[9,] -0.96247218
#[10,] -0.86602540
#[11,] -0.99308187
#[12,] -0.05428913
#[13,] -0.96311366
#[14,] -0.99534934
#[15,]  0.05301502
#[16,] -0.98658763
#[17,] -0.96899291
#[18,]  0.44730718
#[19,] -0.86317499
#[20,] -0.94182141
#[21,] -0.99341821
#[22,] -0.94720046
#[23,]  0.21168794
#[24,] -0.90670560
#[25,] -0.99864434
#[26,] -0.99939736
#[27,] -0.99630829
#[28,] -0.99962223
#[29,] -0.93453339
#[30,] -0.96372018
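The result above is a 30 x 1 matrix, while the question asks for a column alongside the 32 original rows. One possible way to get that in base R is sketched below; the names k, roll_cor and result are hypothetical, and padding with NA at the end is an assumption based on the description in the question.

```r
# Width of the rolling window (3 rows, as in the question)
k <- 3
n <- nrow(mtcars)

# Correlation of mpg and wt over rows i, i + 1, ..., i + k - 1
roll_cor <- vapply(
  seq_len(n - k + 1),
  function(i) {
    idx <- i:(i + k - 1)
    cor(mtcars$mpg[idx], mtcars$wt[idx])
  },
  numeric(1)
)

# Pad with NA so there is one value per row, then attach it as a column
result <- cbind(mtcars[c("mpg", "wt")], cor = c(roll_cor, rep(NA, k - 1)))
head(result, 3)
```

vapply returns a plain numeric vector and checks that each window yields exactly one number, so do.call(rbind, ...) is not needed in this variant.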
