
How to use min(i) in a loop to return the minimum value in group by column?

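I am very new to R programming, and I am trying to get the minimum value by group for each variable. I have more than 300 variables and am trying to run min() and group_by in a loop.

I want to create a column "MINVALUE" that holds the minimum value within each group.

For example,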

df$MINVALUE[1] == NA
df$MINVALUE[2] == min(df$SUM[1]:df$SUM[2], na.rm = T) -> desired value = 140
df$MINVALUE[3] == min(df$SUM[1]:df$SUM[3], na.rm = T) -> desired value = 120
df$MINVALUE[4] == min(df$SUM[1]:df$SUM[4], na.rm = T) -> desired value = 90
df$MINVALUE[5] == min(df$SUM[1]:df$SUM[5], na.rm = T) -> desired value = 90
df$MINVALUE[6] == NA -> If group changed, reset a loop from df$VISIT == "BL" in GROUP B
df$MINVALUE[7] == min(df$SUM[6]:df$SUM[7], na.rm = T) -> desired value = 40
df$MINVALUE[8] == min(df$SUM[6]:df$SUM[8], na.rm = T) -> desired value = 40
df$MINVALUE[9] == min(df$SUM[6]:df$SUM[9], na.rm = T) -> desired value = 40
df$MINVALUE[10] == min(df$SUM[6]:df$SUM[10], na.rm = T) -> desired value = 20
df$MINVALUE[11] == min(df$SUM[6]:df$SUM[11], na.rm = T) -> desired value = 19
df$MINVALUE[12] == min(df$SUM[6]:df$SUM[12], na.rm = T) -> desired value = 18
df$MINVALUE[13] == min(df$SUM[6]:df$SUM[13], na.rm = T) -> desired value = 10
df$MINVALUE[14] == NA -> If group changed, reset a loop from df$VISIT == "BL" in GROUP C
df$MINVALUE[15] == min(df$SUM[14]:df$SUM[15], na.rm = T) -> desired value = 100
.
.

Here is a dummy dataset:

df <- data.frame("ID" = c("A", "A", "A", "A", "A", 
                 "B", "B", "B", "B", "B", "B", "B", "B",
                 "C", "C", "C", "C", "C", "C", "C",
                 "D", "D", 
                 "E",
                 "F", "F", "F", "F"),
"VISIT" = c("BL", "V1", "V2", "V3", "V4", 
            "BL", "V1", "V2", "V3", "V4", "V5", "V6", "V7",
            "BL", "V1", "V2", "V3", "V4", "V5", "V6",
            "BL", "V1",
            "BL", 
            "BL", "V1", "V2", "V3"),
"SUM" = c(NA, 140, 120, 90, 100,
          NA, 40, 70, 50, 20, 19, 18, 10,
          NA, 100, 120, 130, 50, 20, 30,
          NA, 50, 
          NA, 
          NA, 50, 20, 10))

Below are the expected values for each group, which is the output I want:

df$MINVALUE <- c(NA, 140, 120, 90, 90,
                 NA, 40, 40, 40, 20, 19, 18, 10,
                 NA, 100, 100, 100, 50, 20, 20,
                 NA, 50,
                 NA, 
                 NA, 50, 20, 10)

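I am not familiar with R or with loop functions, so I cannot show you any code I have tried. If you have a solution, I would be glad to learn from it. Thank you!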

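We can use ave() to group by ID and apply cummin() within each group. Since cummin() has no na.rm= argument, we replace the NA's with the opposite of a minimum, i.e. Inf. In a second step, within the result of cummin(), we replace the Inf values with NA again.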

df <- transform(df, MINVALUE=ave(SUM, ID, FUN=\(x) {
  x <- cummin(replace(x, is.na(x), Inf))
  replace(x, is.infinite(x), NA)
  }))
df
#    ID VISIT SUM MINVALUE
# 1   A    BL  NA       NA
# 2   A    V1 140      140
# 3   A    V2 120      120
# 4   A    V3  90       90
# 5   A    V4 100       90
# 6   B    BL  NA       NA
# 7   B    V1  40       40
# 8   B    V2  70       40
# 9   B    V3  50       40
# 10  B    V4  20       20
# 11  B    V5  19       19
# 12  B    V6  18       18
# 13  B    V7  10       10
# 14  C    BL  NA       NA
# 15  C    V1 100      100
# 16  C    V2 120      100
# 17  C    V3 130      100
# 18  C    V4  50       50
# 19  C    V5  20       20
# 20  C    V6  30       20
# 21  D    BL  NA       NA
# 22  D    V1  50       50
# 23  E    BL  NA       NA
# 24  F    BL  NA       NA
# 25  F    V1  50       50
# 26  F    V2  20       20
# 27  F    V3  10       10
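
To see why the detour through Inf is needed, it may help to run the two steps on a single group. The sketch below uses the SUM values of group B from the dummy dataset; the variable names step1 and running_min are only illustrative and are not part of the answer above.

```r
# SUM values of group B from the dummy dataset
x <- c(NA, 40, 70, 50, 20, 19, 18, 10)

# Plain cummin() propagates the leading NA to every later element
cummin(x)
# [1] NA NA NA NA NA NA NA NA

# Step 1: swap NA for Inf, which can never win a minimum comparison
step1 <- cummin(replace(x, is.na(x), Inf))

# Step 2: positions that are still Inf have seen no real value yet,
# so they are set back to NA
running_min <- replace(step1, is.infinite(step1), NA)
running_min
# [1] NA 40 40 40 20 19 18 10
```

Note that with this approach an NA in the middle of a group does not become NA in the result; it simply carries the running minimum forward.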

Conclusion: thanks to cummin(), no loop is needed in R to get a cumulative minimum.
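
The question mentions more than 300 variables, while the code above handles only SUM. One possible way to extend it, as a rough base R sketch: wrap the two replace() steps in a helper and apply it to each column with lapply(). The helper name cummin_na, the column SUM2, and the "_MIN" suffix are assumptions made for illustration and do not come from the original post.

```r
# Hypothetical helper: running minimum that skips NA values
cummin_na <- function(x) {
  x <- cummin(replace(x, is.na(x), Inf))
  replace(x, is.infinite(x), NA)
}

# Small stand-in data; SUM2 is an invented second variable
d <- data.frame(
  ID   = c("A", "A", "A", "B", "B", "B"),
  SUM  = c(NA, 140, 120, NA, 40, 70),
  SUM2 = c(NA, 5, 7, NA, 9, 3)
)

# Columns to process: here, everything except the grouping column
vars <- setdiff(names(d), "ID")

# One running-minimum column per variable, computed within each ID
d[paste0(vars, "_MIN")] <- lapply(d[vars], function(col) {
  ave(col, d$ID, FUN = cummin_na)
})
d
```

With the real data, vars would need to list only the numeric columns of interest, since cummin() is not meaningful for columns such as VISIT.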

Note: R version 4.1.2 (2021-11-01)
