How to find the correlation coefficient between two columns, by factor?

Question

I have a large data frame. I want to calculate the correlation coefficient between hot and index, by class.

ID    hot   index class
41400 10    2     a
41400 12    2     a
41400 75    4     a
41401 89    5     a
41401 25    3     c
41401 100   6     c
20445 67    4     c
20445 89    6     c
20445 4     1     c
20443 67    5     d
20443 120.2 7     a
20443 140.5 8     d
20423 170.5 10    d
20423 78.1  5     c

Intended output

a = 0.X (assumed numbers)
b = 0.Y
c = 0.Z

I know I can use the by command, but I have not been able to get it to work.

Code

cor_eqn = function(df){
  m = cor(hot ~ index, df);

}

by(df,df$class,cor_eqn,simplify = TRUE)

Answer 1

Another option is to use a data.table instead of a data.frame. You can just call setDT(df) on your existing data.frame (below, I created a data.table from scratch instead):

library(data.table)

set.seed(123)
DT <- data.table(
  ID = 1:50000,
  class = rep(letters[1:4], each = 12500),
  hot = rnorm(50000),
  index = rgamma(50000, shape = 2))

## set a key for better performance with a large data set;
## setkey() takes the bare column name (setkeyv() would need "class")
setkey(DT, class)

DT[, list(Correlation = cor(hot, index)), by = class]
#    class  Correlation
# 1:     a  0.005658200
# 2:     b  0.001651747
# 3:     c -0.002147164
# 4:     d -0.006248392
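
The answer mentions setDT() without showing it, so here is a minimal sketch of that route using the sample rows from the question. The name dat is an assumption chosen for illustration (the question calls its data frame df), and the data.table package is assumed to be installed.

```r
library(data.table)

# Sample rows from the question; `dat` is a hypothetical name
dat <- data.frame(
  ID    = c(41400, 41400, 41400, 41401, 41401, 41401, 20445,
            20445, 20445, 20443, 20443, 20443, 20423, 20423),
  hot   = c(10, 12, 75, 89, 25, 100, 67, 89, 4, 67, 120.2, 140.5, 170.5, 78.1),
  index = c(2, 2, 4, 5, 3, 6, 4, 6, 1, 5, 7, 8, 10, 5),
  class = c("a", "a", "a", "a", "c", "c", "c", "c", "c", "d", "a", "d", "d", "c")
)

# setDT() converts the existing data.frame to a data.table by reference,
# so no copy is made and no reassignment is needed
setDT(dat)

# One correlation per class
res <- dat[, .(Correlation = cor(hot, index)), by = class]
print(res)
```

Because setDT() modifies the object in place, it is usually preferable to as.data.table() for a large data frame, where a full copy would be costly.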

Answer 2

You can use dplyr for this:

library(dplyr)
gp = group_by(dataset, class)
correl = dplyr::summarise(gp, correl = cor(hot, index))
print(correl)

#  class   correl
#   a      0.9815492
#   c      0.9753372
#   d      0.9924337

Note that class and df are both R functions, so names like these can cause trouble.
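
For completeness, the by() attempt from the question can probably be made to work in base R as well. The likely cause of the failure is that cor() has no formula interface, so cor(hot ~ index, df) raises an error; passing the two columns directly avoids that. The sketch below is one possible fix, not the asker's or answerers' code, and the names dat and cor_by_class are assumptions for illustration (dat is used instead of df to avoid the name clash mentioned above).

```r
# Sample rows from the question; `dat` is a hypothetical name
dat <- data.frame(
  ID    = c(41400, 41400, 41400, 41401, 41401, 41401, 20445,
            20445, 20445, 20443, 20443, 20443, 20423, 20423),
  hot   = c(10, 12, 75, 89, 25, 100, 67, 89, 4, 67, 120.2, 140.5, 170.5, 78.1),
  index = c(2, 2, 4, 5, 3, 6, 4, 6, 1, 5, 7, 8, 10, 5),
  class = c("a", "a", "a", "a", "c", "c", "c", "c", "c", "d", "a", "d", "d", "c")
)

# cor() takes two numeric vectors, not a formula
cor_by_class <- function(d) cor(d$hot, d$index)

# by() applies the function to each subset of rows sharing a class
res_by <- by(dat, dat$class, cor_by_class)
print(res_by)

# Equivalent split/sapply form, which returns a plain named numeric vector
res_vec <- sapply(split(dat, dat$class), cor_by_class)
print(res_vec)
```

The split/sapply form is often easier to work with afterwards, since the result is an ordinary named vector rather than an object of class "by".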

Answer 1
2 2014-09-12 21:00:13

Answer 2
0 Accepted 2014-09-12 20:56:17
