How to apply a function to all combinations of rows in a data frame?

Question

I can't solve the following problem concerning the data frame "annotations" (simplified here by limiting the number of columns).

require(irr)
# data
annotations <- read.table(text = "Obj1    Obj2    Obj3
Rater1     a       b       c
Rater2     a       b       b
Rater3     a       b       c", header = TRUE, stringsAsFactors = FALSE)

I would like to apply the function agree from the irr package to all combinations (not permutations) of rows, giving the following result.

Agreement rater 1-2: 67%
Agreement rater 1-3: 100%
Agreement rater 2-3: 67%

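I need to run the function on every combination of rows, and the function needs access to several or all of the columns.

I have worked out part of the answer: I can generate the list of combinations by running combn(rownames(annotations), 2), but I don't see how to use this list without writing an inefficient for loop.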
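For reference, the object that combn returns here can be inspected directly. This is a minimal sketch using the same three-rater data as above; the name pairs_of_raters is only illustrative and does not come from the question.

```r
annotations <- read.table(text = "Obj1    Obj2    Obj3
Rater1     a       b       c
Rater2     a       b       b
Rater3     a       b       c", header = TRUE, stringsAsFactors = FALSE)

# Each column of the result holds one unordered pair of row names
pairs_of_raters <- combn(rownames(annotations), 2)
pairs_of_raters

# 2 rows (the two members of a pair) and choose(3, 2) = 3 columns (the pairs)
dim(pairs_of_raters)
```

Because every pair is a column, any function that takes a length-2 vector of row names can be mapped over the columns, which is what the answers below do.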

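I have also tried apply, as in apply(annotations, 1, agree), but I can only get it to work on one row at a time, not on the combinations mentioned above.

Does anyone know how to proceed?

Update: Following your suggestions, the solution below works. (I used kappa2 from irr instead of agree, but the solution to the main problem stays the same.)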

require(irr) #require the irr library for agreement calculations
annotations <- read.table(text = "Obj1    Obj2    Obj3
Rater1     a       b       c
Rater2     a       b       b
Rater3     a       b       c
Rater4     c       a       a", header = TRUE, stringsAsFactors = FALSE)

annotations <- t(annotations) #transpose annotations (rows become columns and vice versa)
# For every pair of columns (combination of 2 raters), apply kappa2 to the two raters' annotations;
# combn collects the value returned for each pair into kappa_list
kappa_list <- combn(colnames(annotations), 2, FUN = function(x) kappa2(annotations[, x])$value)
kappa_list # display the list of values

Answer 1

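You are close; you just need to apply over the result of combn instead. I don't know which function you are referring to, but this should work the same way once you plug in your own function.

First, save the result as a list, because that makes it easier to add names (which I build by pasting the two entries of each pair together):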

toCheck <- combn(rownames(annotations), 2, simplify = FALSE)

names(toCheck) <-
  sapply(toCheck, paste, collapse = " - ")

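Then use sapply to work through the combinations. Here I use mean to do the comparison, but substitute whatever you need. If more than one value is returned per pair, use lapply instead and then print or reshape the result as needed.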

sapply(toCheck, function(x){
  mean(annotations[x[1], ] == annotations[x[2], ])
})

Which returns:

Rater1 - Rater2 Rater1 - Rater3 Rater2 - Rater3 
      0.6666667       1.0000000       0.6666667 
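The lapply variant mentioned above could look roughly like the following. This is a sketch under assumptions: the names res, res_table, n_match and prop_match are made up for illustration, and plain equality stands in for whatever agreement function is actually needed.

```r
annotations <- read.table(text = "Obj1    Obj2    Obj3
Rater1     a       b       c
Rater2     a       b       b
Rater3     a       b       c", header = TRUE, stringsAsFactors = FALSE)

toCheck <- combn(rownames(annotations), 2, simplify = FALSE)
names(toCheck) <- sapply(toCheck, paste, collapse = " - ")

# Return two values per pair: the number of matching objects and the proportion
res <- lapply(toCheck, function(x) {
  same <- unlist(annotations[x[1], ]) == unlist(annotations[x[2], ])
  c(n_match = sum(same), prop_match = mean(same))
})

# Bind the per-pair vectors into one matrix, one row per pair
res_table <- do.call(rbind, res)
res_table

# Print in the "Agreement ...: NN%" style asked for in the question
cat(sprintf("Agreement %s: %.0f%%\n",
            rownames(res_table), 100 * res_table[, "prop_match"]), sep = "")
```

Keeping the list names on toCheck means the pair labels carry through to the row names of the final table without extra bookkeeping.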

Answer 2

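This applies the function f(x) := 2x+5 to all entries of a column that correspond to the combinations. Instead of f(x) := 2x+5, you can write your own function:

Step 1: Build the data frame of combinations you need. (The following is for my own case.)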

causalitycombinations <- function (nvars, ncausers, ndependents)
{
    independents <- combn(nvars, ncausers)
    swingnumber <- dim(combn(nvars - ncausers, ndependents))[[2]]
    numberofallcombinations <- dim(combn(nvars, ncausers))[[2]] * swingnumber
    dependents <- matrix(, nrow = dim(combn(nvars, ncausers))[[2]] * swingnumber, ncol = ndependents)
    for (i in as.integer(1:dim(combn(nvars, ncausers))[[2]])) {
        dependents[(swingnumber * (i - 1) + 1):(swingnumber * i), ] <- t(combn(setdiff(seq(1:nvars), independents[, i]), ndependents))
    }
    swingedindependents <- matrix(, nrow = dim(combn(nvars, ncausers))[[2]] * swingnumber, ncol = ncausers)
    for (i in as.integer(1:dim(combn(nvars, ncausers))[[2]])) {
        for (j in as.integer(1:swingnumber)) {
            swingedindependents[(i - 1) * swingnumber + j, ] <- independents[, i]
        }
    }
    independentsdependents <- cbind(swingedindependents, dependents)
    others <- matrix(, nrow = dim(combn(nvars, ncausers))[[2]] * swingnumber, ncol = nvars - ncausers - ndependents)
    for (i in as.integer(1:((dim(combn(nvars, ncausers))[[2]]) * swingnumber))) {
        others[i, ] <- setdiff(seq(1:nvars), independentsdependents[i, ])
    }
    causalitiestemplate <- cbind(independentsdependents, others)
    causalitiestemplate
}

causalitycombinations(3,1,1)
#     [,1] [,2] [,3]
#[1,]    1    2    3
#[2,]    1    3    2
#[3,]    2    1    3
#[4,]    2    3    1
#[5,]    3    1    2
#[6,]    3    2    1

Step 2: Attach the data to the combinations
(Several columns can be attached; for simplicity I add only one.)

set.seed(1)
mydataframer <- cbind(causalitycombinations(3,1,1), rnorm(6))
mydataframer
 #     [,1] [,2] [,3]       [,4]
 #[1,]    1    2    3 -0.6264538
 #[2,]    1    3    2  0.1836433
 #[3,]    2    1    3 -0.8356286
 #[4,]    2    3    1  1.5952808
 #[5,]    3    1    2  0.3295078
 #[6,]    3    2    1 -0.8204684

Step 3: Apply the function via lapply, iterating over the row indices of the combined data frame

lapply(1: dim(mydataframer)[[1]], function(x) {2*mydataframer[x,4] + 5})

# 3.747092
# 5.367287
# 3.328743
# 8.190562
# 5.659016
# 3.359063

That's it.
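When the function is plain vectorized arithmetic, as f(x) := 2x+5 is, the row-by-row loop is not strictly needed. A small sketch, assuming the same seed as above; the vector x stands in for column 4 of mydataframer, and the names f, by_row and all_rows are illustrative.

```r
set.seed(1)
x <- rnorm(6)  # stands in for column 4 of mydataframer

f <- function(v) 2 * v + 5

# One call per row, simplified to a numeric vector
by_row <- sapply(seq_along(x), function(i) f(x[i]))

# One vectorized call over the whole column
all_rows <- f(x)

all.equal(by_row, all_rows)
```

The per-row form is still the one to use when the function needs several columns of the same row at once and is not itself vectorized.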

By the way, the ?irr::agree help file states that the n x m ratings matrix/data frame is "n subjects, m raters". So the asker's data would be better laid out as:

annotations <- read.table(text = "Rater1    Rater2    Rater3
Subject1     a       b       c
Subject2     a       b       b
Subject3     a       b       c", header = TRUE, stringsAsFactors = FALSE)

annotations
#         Rater1 Rater2 Rater3
# Subject1      a      b      c
# Subject2      a      b      b
# Subject3      a      b      c
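If the data already exists in the rater-by-object layout from the question, it can be transposed rather than retyped. The following is a sketch under that assumption; plain equality stands in for the agreement function, and the names ratings and pairwise are illustrative.

```r
# The asker's original layout: one row per rater, one column per object
annotations <- read.table(text = "Obj1    Obj2    Obj3
Rater1     a       b       c
Rater2     a       b       b
Rater3     a       b       c", header = TRUE, stringsAsFactors = FALSE)

# Transpose so that rows are subjects (objects) and columns are raters
ratings <- as.data.frame(t(annotations), stringsAsFactors = FALSE)

# Proportion of subjects on which each pair of raters agrees
pairwise <- combn(colnames(ratings), 2,
                  FUN = function(p) mean(ratings[[p[1]]] == ratings[[p[2]]]))
names(pairwise) <- combn(colnames(ratings), 2, FUN = paste, collapse = "-")
pairwise
```

In this layout each pair of raters is a two-column subset with several subjects, which is the shape a function documented as "n subjects, m raters" expects.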

In addition, one more thing needs clarifying: whether the asker wants to loop over all possible annotation tables of this kind. If so, i.e.

annotations
#         Rater1 Rater2 Rater3
# Subject1      a      a      a
# Subject2      a      a      a
# Subject3      a      a      a

annotations
#         Rater1 Rater2 Rater3
# Subject1      a      a      b
# Subject2      a      a      a
# Subject3      a      a      a

annotations
#         Rater1 Rater2 Rater3
# Subject1      a      a      c
# Subject2      a      a      a
# Subject3      a      a      a

annotations
#         Rater1 Rater2 Rater3
# Subject1      a      b      a
# Subject2      a      a      a
# Subject3      a      a      a

# .... after consuming all Subject1 possibilities, this time consuming Subject2 possibilities,

annotations
#         Rater1 Rater2 Rater3
# Subject1      a      a      a
# Subject2      a      a      b
# Subject3      a      a      a

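then the possibilities for Subject3, and so on, collecting the agreement for every possible table, then the problem changes completely.

The irr::agree function is designed to be used on multiple rows. Observe, from its help file: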

data(video)
video
#   rater1 rater2 rater3 rater4
# 1       4      4      3      4
# 2       4      4      4      5
# ..............................
# 20      4      5      5      4

agree(video)     # Simple percentage agreement
# Percentage agreement (Tolerance=0)
# Subjects = 20; Raters = 4; %-agree = 35 

agree(video, 1)  # Extended percentage agreement
# Percentage agreement (Tolerance=1)
# Subjects = 20; Raters = 4; %-agree = 90

Whereas if the asker wants agreement row by row (only 1 subject!), %-agree can only be all-or-nothing, and it is 0 for the rows below:

agree(video[1,])
# Percentage agreement (Tolerance=0)
# Subjects = 1; Raters = 4; %-agree = 0

...

agree(video[20,])
# Percentage agreement (Tolerance=0)
# Subjects = 1; Raters = 4; %-agree = 0
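To make the single-subject behaviour concrete without relying on irr, here is a rough base R stand-in. pct_agree is a hypothetical helper, not the irr implementation; it assumes "simple percentage agreement" means the percentage of subjects on which all raters give the same rating, and the ratings data is made up.

```r
# Hypothetical helper (an assumption, not irr::agree itself): percentage of
# subjects (rows) on which all raters (columns) give the same rating
pct_agree <- function(ratings) {
  ratings <- as.matrix(ratings)
  unanimous <- apply(ratings, 1, function(r) length(unique(r)) == 1)
  100 * mean(unanimous)
}

# Made-up ratings: 3 subjects, 3 raters
ratings <- data.frame(rater1 = c(4, 4, 5),
                      rater2 = c(4, 4, 5),
                      rater3 = c(3, 4, 5))

pct_agree(ratings)        # two of the three subjects are unanimous
pct_agree(ratings[1, ])   # a single subject on which the raters differ
pct_agree(ratings[2, ])   # a single subject on which the raters agree
```

With one subject the result carries a single bit of information, which is why a per-row call says little about how well two raters agree overall.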

Answer 3

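A fast approach is to build two vectors of row numbers, one for each side of every row combination, use them to extract one matrix holding the rows on one side and another matrix holding the rows on the other side, and then apply a vectorized function to the two matrices: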

library(microbenchmark)  # provides microbenchmark(), used below

es=1:3
r=sapply(es,function(e){
  nrow=10^e
  ncol=8
  m=matrix(rnorm(ncol*nrow),nrow)

  b=microbenchmark(times=10,
    row_numbers_from_vectors={
      z=seq(nrow)
      i1=rep(z[-length(z)],times=rev(tail(z,-1))-1)
      i2=unlist(lapply(2:nrow,function(x)x:nrow),use.names=F)
      o=m[i1,]+m[i2,]
    },
    for_loops={
      o=matrix(nrow=nrow*(nrow-1)/2,ncol=ncol)
      n=1;for(i in 1:(nrow-1))for(j in(i+1):nrow){o[n,]=m[i,]+m[j,];n=n+1}
    },
    combn_direct={o=t(combn(nrow,2,function(x)m[x[1],]+m[x[2],]))},
    combn_apply={o=t(apply(combn(nrow,2),2,function(x)m[x[1],]+m[x[2],]))}
  )
  a=aggregate(b$time,list(b$expr),median)
  setNames(a[,2],gsub(" ","",a[,1]))/1e6
})

r2=apply(r,2,function(x)formatC(x,max(0,2-ceiling(log10(min(x,na.rm=T)))),format="f"))
r3=apply(rbind(paste0("1e",es),r2),2,function(x)formatC(x,max(nchar(x)),format="s"))
writeLines(apply(cbind(r3,c("",rownames(r))),1,paste,collapse=" "))

Median time in milliseconds:

  1e1   1e2  1e3 
0.037  0.60   43 row_numbers_from_vectors
0.094 10.26 1019 for_loops
0.124 12.09 1224 combn_direct
0.180 14.84 1580 combn_apply

This is a faster alternative to c=combn(nrow,2);i1=c[1,];i2=c[2,] (see "How to efficiently generate lower triangle indices of a symmetric matrix"):

z=seq(nrow)
i1=rep(z[-length(z)],times=rev(tail(z,-1))-1)
i2=unlist(lapply(2:nrow,function(x)x:nrow),use.names=F)
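Applied to the data from the question, the same index vectors give the pairwise agreement in a single vectorized comparison. This is a sketch, with plain equality standing in for the agreement function; the names a, n, cmb and agreement are illustrative.

```r
annotations <- read.table(text = "Obj1    Obj2    Obj3
Rater1     a       b       c
Rater2     a       b       b
Rater3     a       b       c", header = TRUE, stringsAsFactors = FALSE)

a <- as.matrix(annotations)
n <- nrow(a)

# Row numbers for the left and right side of every pair
z <- seq(n)
i1 <- rep(z[-length(z)], times = rev(tail(z, -1)) - 1)
i2 <- unlist(lapply(2:n, function(x) x:n), use.names = FALSE)

# Sanity check: the index vectors match the two rows of combn(n, 2)
cmb <- combn(n, 2)
stopifnot(all(i1 == cmb[1, ]), all(i2 == cmb[2, ]))

# Compare the two stacked matrices element-wise, then average per pair
agreement <- rowMeans(a[i1, , drop = FALSE] == a[i2, , drop = FALSE])
names(agreement) <- paste(rownames(a)[i1], rownames(a)[i2], sep = " - ")
agreement
```

This only works when the per-pair function can be written as operations on whole matrices; a function such as kappa2 that needs one pair at a time still has to go through combn or sapply.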

Answer 1: score 3, accepted, 2016-11-09 21:17:40

Answer 2: score 0, 2016-11-09 21:55:45

Answer 3: score 0, 2022-07-31 08:55:19