
Calculate Correlation Coefficient Between 2 Variables Grouped by a 3rd Variable

I have an Excel spreadsheet with 3 columns. The first column is the ID of a picture, which groups the data together, and the 2nd and 3rd columns are the values I am trying to find a correlation coefficient for.

For example:

ID   Dat1  Dat2
130  4     4.3
130  7.5   5
130  6.6   6
180  5.6
180  5     8.7
180  7.1   5

In that example, the data is grouped by the values in the 1st column, and each row has its own values in the 2nd and 3rd columns (Dat2 is missing for the first row of ID 180). I'm not sure whether it would be easier to find the correlation coefficient for each group using Excel or R.

I have tried the Data Analysis add-in in Excel, but it doesn't work with 3 columns.

Thanks in advance!

The real data has hundreds of thousands of rows. This is just an example.
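Since the data starts out in Excel, one way to bring it into R is to save the sheet as CSV and load it with base R's read.csv(). The sketch below is self-contained only because it first writes the example rows to a temporary file; csv_path, pics and cors are hypothetical names, and with the real spreadsheet you would point read.csv() at your own exported file instead.

```r
# Stand-in for a CSV exported from Excel (temporary, hypothetical path).
# The empty last field on the "180,5.6," line mimics the blank Dat2 cell.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("ID,Dat1,Dat2",
             "130,4,4.3",
             "130,7.5,5",
             "130,6.6,6",
             "180,5.6,",
             "180,5,8.7",
             "180,7.1,5"), csv_path)

# Blank fields in numeric columns are read as NA
pics <- read.csv(csv_path)

# One correlation per ID, dropping rows with a missing value
cors <- sapply(split(pics, pics$ID),
               function(d) cor(d$Dat1, d$Dat2, use = "complete.obs"))
print(cors)
```

Blank cells exported from Excel come through as NA in numeric columns, which is why some handling of missing values (here use = "complete.obs") is needed.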

Solution using data.table:

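# install.packages("data.table")
library(data.table)
# df is the data.frame with columns ID, Dat1 and Dat2
df <- data.table(df)
# one correlation per ID; use = "complete.obs" drops rows with a missing value
# (with the default use = "everything", ID 180 would give NA)
df[, .(Cor = cor(Dat1, Dat2, use = "complete.obs")), by = ID]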
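Whichever tool does the grouping, the value computed for each ID is the ordinary Pearson correlation of that group's complete rows. As a sanity check, a minimal base R sketch can write the formula out by hand for one group and compare it with cor(); pearson_r, g130 and g180 are hypothetical names, and rows with a missing value are dropped, matching use = "complete.obs".

```r
df1 <- data.frame(
  ID   = c(130L, 130L, 130L, 180L, 180L, 180L),
  Dat1 = c(4, 7.5, 6.6, 5.6, 5, 7.1),
  Dat2 = c(4.3, 5, 6, NA, 8.7, 5)
)

# Pearson r written out from its definition:
# r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)), with dx, dy deviations from the means
pearson_r <- function(x, y) {
  ok <- complete.cases(x, y)   # keep rows where both values are present
  x <- x[ok]; y <- y[ok]
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

g130 <- df1[df1$ID == 130, ]
pearson_r(g130$Dat1, g130$Dat2)                   # hand-written formula
cor(g130$Dat1, g130$Dat2, use = "complete.obs")   # built-in, should agree

g180 <- df1[df1$ID == 180, ]
pearson_r(g180$Dat1, g180$Dat2)                   # only two complete rows
```

ID 180 has only two complete rows, and any two points lie on a straight line, so its correlation is necessarily -1 or 1 whenever it is defined; the -1 for that group says little about a real relationship.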

You could try:

library(dplyr)
df1 %>% 
   group_by(ID) %>% 
   # use = "na.or.complete" drops incomplete rows (NA if a group has no complete pair)
   summarise(Cor = cor(Dat1, Dat2, use = "na.or.complete"))
#   ID        Cor
#1 130  0.6407453
#2 180 -1.0000000
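The use argument decides what cor() does with the missing Dat2 value. A small base R sketch (dat1_180, dat2_180, x, y and res are illustrative names; x and y are toy vectors) shows the behaviours that matter here: the default use = "everything" returns NA for a group containing a missing value, "complete.obs" drops incomplete rows but stops with an error when a group has no complete pair at all, and "na.or.complete" behaves like "complete.obs" except that it returns NA in that last case. With hundreds of thousands of rows, the NA-returning option keeps one degenerate group from aborting the whole summarise().

```r
# Dat1 and Dat2 for ID 180 from the example: one missing Dat2 value
dat1_180 <- c(5.6, 5, 7.1)
dat2_180 <- c(NA, 8.7, 5)

# Default use = "everything": the NA propagates, so the result is NA
cor(dat1_180, dat2_180)

# "complete.obs" drops the incomplete row and correlates the remaining two
cor(dat1_180, dat2_180, use = "complete.obs")

# Toy vectors with no complete (x, y) pair at all
x <- c(1, NA)
y <- c(NA, 2)

# "na.or.complete" returns NA here ...
cor(x, y, use = "na.or.complete")

# ... whereas "complete.obs" raises an error, caught here for illustration
res <- tryCatch(cor(x, y, use = "complete.obs"), error = function(e) "error")
res
```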

Data:

df1 <- data.frame(
  ID   = c(130L, 130L, 130L, 180L, 180L, 180L),
  Dat1 = c(4, 7.5, 6.6, 5.6, 5, 7.1),
  Dat2 = c(4.3, 5, 6, NA, 8.7, 5)
)

Two base R solutions, using @akrun's data:

with(df1, by(cbind(Dat1, Dat2), ID, cor, use = "complete.obs"))
# INDICES: 130
#           Dat1      Dat2
# Dat1 1.0000000 0.6407453
# Dat2 0.6407453 1.0000000
# ----------------------------------------------------------------------------------------------------------------------- 
# INDICES: 180
#      Dat1 Dat2
# Dat1    1   -1
# Dat2   -1    1

lapply(split(df1, df1$ID), function(x) cor(x$Dat1, x$Dat2, use = "complete.obs"))
# $`130`
# [1] 0.6407453
# 
# $`180`
# [1] -1
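With hundreds of thousands of rows, building one data frame per ID may be slower than necessary. A vectorized base R alternative, offered as an unbenchmarked sketch, is to compute each group's sums in a single rowsum() call and assemble Pearson's r from them; group_cor is a hypothetical helper name, and incomplete rows are dropped as with use = "complete.obs".

```r
df1 <- data.frame(
  ID   = c(130L, 130L, 130L, 180L, 180L, 180L),
  Dat1 = c(4, 7.5, 6.6, 5.6, 5, 7.1),
  Dat2 = c(4.3, 5, 6, NA, 8.7, 5)
)

# Pearson correlation per group from per-group sums (base R only)
group_cor <- function(id, x, y) {
  ok <- !is.na(x) & !is.na(y)          # drop rows with a missing value
  id <- id[ok]; x <- x[ok]; y <- y[ok]
  # rowsum() totals every column within each group: one row per ID
  s <- rowsum(cbind(n = 1, x = x, y = y,
                    xx = x * x, yy = y * y, xy = x * y), id)
  n <- s[, "n"]
  # centered sums of products and squares
  sxy <- s[, "xy"] - s[, "x"] * s[, "y"] / n
  sxx <- s[, "xx"] - s[, "x"]^2 / n
  syy <- s[, "yy"] - s[, "y"]^2 / n
  sxy / sqrt(sxx * syy)                # named vector, one entry per ID
}

group_cor(df1$ID, df1$Dat1, df1$Dat2)
```

The sum-of-squares shortcut used here can lose precision when the values are large relative to their spread; centering within each group, or falling back to cor() per group, is more robust. For a group with fewer than two complete rows or a constant column the result is not meaningful (you may get NaN).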

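Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.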

 