How to use which.min(i) in a loop to return the minimum value per set in each variable?

I'm fairly new to R programming and I'm trying to get the minimum value by group in each variable. I have more than 300 variables and am trying to run which.min() inside a loop. Here's a dummy dataset:

df <- data.frame("Set" = c(rep("A",3),rep("B",3),rep("C",3)),
                 "Pair" = rep(c("a","b","c"),3),
                 "y" = c(4,5,6,4,5,8,9,8,7), 
                 "x" = c(11,13,10,15,14,16,12,18,17),
                 "z" = c(19,20,21,22,23,24,25,26,27))

Data:

  Set Pair y  x  z
1   A    a 4 11 19
2   A    b 5 13 20
3   A    c 6 10 21
4   B    a 4 15 22
5   B    b 5 14 23
6   B    c 8 16 24
7   C    a 9 12 25
8   C    b 8 18 26
9   C    c 7 17 27

I'm trying:

library(data.table)
RES <- setDT(df[,c(1,2,3)])[ , .SD[which.min(y)], by = Set]
for (i in 2:3){
  df2 <- as.data.frame(df[,c(1,2,..i+2)])
  res2 <- setDT(df2)[ , .SD[which.min(i+2)], by = Set]
  RES <- cbind(RES,res2)
  rm(res2)
}

My desired output:

  Set Pair y Set.1 Pair.1  x Set.2 Pair.2  z
1   A    a 4     A      c 10     A      a 19
2   B    a 4     B      b 14     B      a 22
3   C    c 7     C      a 12     C      a 25

The problem is within which.min(), which does not accept i+2 or even i. How do I iterate through columns using which.min()? I tried other functions too, but this is the closest I got. I could just do res2 <- setDT(df)[, .SD[which.min(x)], by = Set], but I have many, many columns to go through. If you have another solution, I would be very happy to learn. Thank you!

Here's an approach that uses lapply to loop through the columns of interest, y:z (i.e., y, x, and z). If we could simplify and only worry about the minimum, this is what it would look like:

library(data.table)
dt = as.data.table(df)
dt[, lapply(.SD, min), by = Set, .SDcols = y:z]
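
As a cross-check on the per-group minima, the same numbers can be computed in base R. This is only a sketch, not part of the original answer: it rebuilds the dummy dataset from the question and uses aggregate(), with `mins` as an illustrative name.

```r
# Rebuild the dummy dataset from the question
df <- data.frame(Set  = rep(c("A", "B", "C"), each = 3),
                 Pair = rep(c("a", "b", "c"), 3),
                 y = c(4, 5, 6, 4, 5, 8, 9, 8, 7),
                 x = c(11, 13, 10, 15, 14, 16, 12, 18, 17),
                 z = c(19, 20, 21, 22, 23, 24, 25, 26, 27))

# Minimum of y, x and z within each Set (base R, no packages needed)
mins <- aggregate(cbind(y, x, z) ~ Set, data = df, FUN = min)
print(mins)
```

With 300+ columns the formula would have to be built programmatically, which is one reason the .SDcols form is more convenient here.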

However, you are interested in both the matching Pair and the minimum value of each column. To get both, we have lapply return the two values of interest as a list. Then, so that the results combine correctly into columns, we wrap the call in do.call('c', ...):
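
To see what the do.call('c', ...) step does, here is a small base R sketch with hand-written values (the `nested` and `flat` names are illustrative only). lapply yields one two-element list per column; concatenating them with c() produces a single flat list whose names combine the column name and the element name.

```r
# What lapply() would return for one group: a list of lists, one per column
nested <- list(y = list(pair = "a", val = 4),
               x = list(pair = "c", val = 10))

# c() concatenates the inner lists and prefixes each name with the outer name
flat <- do.call("c", nested)
print(names(flat))
```

data.table needs a flat list like this in j so that each element becomes its own output column; the call that follows applies the same flattening once per group.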

library(data.table)
dt = as.data.table(df)
dt[, do.call('c', 
             lapply(.SD,
                    function(x) {
                      wm = which.min(x)
                      list(pair = Pair[wm],val =  x[wm])
                      })),
   by = Set,
   .SDcols = y:z]

##    Set y.pair y.val x.pair x.val z.pair z.val
## 1:   A      a     4      c    10      a    19
## 2:   B      a     4      b    14      a    22
## 3:   C      c     7      a    12      a    25
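
For readers who prefer to keep an explicit loop, as in the question, here is a hedged sketch. It assumes the underlying issue is that which.min(i + 2) is handed a single number rather than a column, so it always returns 1 and picks the first row of each group. Addressing columns by name with .SD[[col]] avoids that. The names `value_cols`, `pieces` and `long` are illustrative and not from the original post.

```r
library(data.table)

df <- data.frame(Set  = rep(c("A", "B", "C"), each = 3),
                 Pair = rep(c("a", "b", "c"), 3),
                 y = c(4, 5, 6, 4, 5, 8, 9, 8, 7),
                 x = c(11, 13, 10, 15, 14, 16, 12, 18, 17),
                 z = c(19, 20, 21, 22, 23, 24, 25, 26, 27))
dt <- as.data.table(df)

# which.min() of a single number is always 1, whatever the number is
first_only <- which.min(4)

value_cols <- c("y", "x", "z")  # columns to scan, addressed by name

# One small table per column: the group minimum and the Pair it belongs to
pieces <- lapply(value_cols, function(col) {
  dt[, {
    idx <- which.min(.SD[[col]])  # position of the minimum within the group
    list(Pair = Pair[idx], value = .SD[[col]][idx])
  }, by = Set, .SDcols = col]
})
names(pieces) <- value_cols

# Stack the pieces into long format, tagging each row with its source column
long <- rbindlist(pieces, idcol = "variable")
print(long)
```

The long layout (variable, Set, Pair, value) differs from the wide layout requested in the question, but it avoids repeated column names and tends to be easier to filter when there are hundreds of variables.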
