How to use min(i) in a loop to return the minimum value in group by column?
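I am new to R programming, and I am trying to get the minimum value per group for each variable. I have more than 300 variables and am trying to run min() and group_by in a loop.
I want to create a column "MINVALUE" that holds the running minimum within each group.
For example,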
df$MINVALUE[1] == NA
df$MINVALUE[2] == min(df$SUM[1]:df$SUM[2], na.rm = T) -> desired value = 140
df$MINVALUE[3] == min(df$SUM[1]:df$SUM[3], na.rm = T) -> desired value = 120
df$MINVALUE[4] == min(df$SUM[1]:df$SUM[4], na.rm = T) -> desired value = 90
df$MINVALUE[5] == min(df$SUM[1]:df$SUM[5], na.rm = T) -> desired value = 90
df$MINVALUE[6] == NA -> If group changed, reset a loop from df$VISIT == "BL" in GROUP B
df$MINVALUE[7] == min(df$SUM[6]:df$SUM[7], na.rm = T) -> desired value = 40
df$MINVALUE[8] == min(df$SUM[6]:df$SUM[8], na.rm = T) -> desired value = 40
df$MINVALUE[9] == min(df$SUM[6]:df$SUM[9], na.rm = T) -> desired value = 40
df$MINVALUE[10] == min(df$SUM[6]:df$SUM[10], na.rm = T) -> desired value = 20
df$MINVALUE[11] == min(df$SUM[6]:df$SUM[11], na.rm = T) -> desired value = 19
df$MINVALUE[12] == min(df$SUM[6]:df$SUM[12], na.rm = T) -> desired value = 18
df$MINVALUE[13] == min(df$SUM[6]:df$SUM[13], na.rm = T) -> desired value = 10
df$MINVALUE[14] == NA -> If group changed, reset a loop from df$VISIT == "BL" in GROUP C
df$MINVALUE[15] == min(df$SUM[14]:df$SUM[15], na.rm = T) -> desired value = 100
.
.
Here is a dummy dataset:
df <- data.frame("ID" = c("A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B", "B", "B",
"C", "C", "C", "C", "C", "C", "C",
"D", "D",
"E",
"F", "F", "F", "F"),
"VISIT" = c("BL", "V1", "V2", "V3", "V4",
"BL", "V1", "V2", "V3", "V4", "V5", "V6", "V7",
"BL", "V1", "V2", "V3", "V4", "V5", "V6",
"BL", "V1",
"BL",
"BL", "V1", "V2", "V3"),
"SUM" = c(NA, 140, 120, 90, 100,
NA, 40, 70, 50, 20, 19, 18, 10,
NA, 100, 120, 130, 50, 20, 30,
NA, 50,
NA,
NA, 50, 20, 10))
This is the output I want, with the expected values for each group:
df$MINVALUE <- c(NA, 140, 120, 90, 90,
NA, 40, 40, 40, 20, 19, 18, 10,
NA, 100, 100, 100, 50, 20, 20,
NA, 50,
NA,
NA, 50, 20, 10)
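I am not familiar with R or with loop functions, so I cannot show you any code I have tried. If you have a solution, I would be glad to learn from it. Thanks!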
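We can use ave() to group by ID and apply cummin() within each group. Since cummin() has no na.rm= argument, we first replace the NA's with the opposite of a minimum, i.e. Inf. In a second step, we replace the Inf values in the result of cummin() back with NA.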
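To see the two steps in isolation, here is a minimal sketch on a single vector, using the SUM values of group B from the question. The names x, step1 and step2 are only for illustration.

```r
# SUM values of group B from the dummy dataset
x <- c(NA, 40, 70, 50, 20, 19, 18, 10)

# cummin() propagates NA: once it meets an NA, all following results are NA
cummin(x)
# [1] NA NA NA NA NA NA NA NA

# Step 1: replace NA by Inf, which can never be the minimum
step1 <- cummin(replace(x, is.na(x), Inf))
step1
# [1] Inf  40  40  40  20  19  18  10

# Step 2: turn the remaining Inf values back into NA
step2 <- replace(step1, is.infinite(step1), NA)
step2
# [1] NA 40 40 40 20 19 18 10
```

Note that this trick assumes the data contain no genuine Inf values, since those would also be turned into NA in step 2.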
df <- transform(df, MINVALUE=ave(SUM, ID, FUN=\(x) {
x <- cummin(replace(x, is.na(x), Inf))
replace(x, is.infinite(x), NA)
}))
df
# ID VISIT SUM MINVALUE
# 1 A BL NA NA
# 2 A V1 140 140
# 3 A V2 120 120
# 4 A V3 90 90
# 5 A V4 100 90
# 6 B BL NA NA
# 7 B V1 40 40
# 8 B V2 70 40
# 9 B V3 50 40
# 10 B V4 20 20
# 11 B V5 19 19
# 12 B V6 18 18
# 13 B V7 10 10
# 14 C BL NA NA
# 15 C V1 100 100
# 16 C V2 120 100
# 17 C V3 130 100
# 18 C V4 50 50
# 19 C V5 20 20
# 20 C V6 30 20
# 21 D BL NA NA
# 22 D V1 50 50
# 23 E BL NA NA
# 24 F BL NA NA
# 25 F V1 50 50
# 26 F V2 20 20
# 27 F V3 10 10
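Since the question mentions more than 300 variables, the same idea could be applied to several columns at once. The following is only a sketch under assumptions: the helper name cummin_na, the small data frame df2, its second column SUM2 and the MIN_ prefix are all made up for illustration and are not part of the original data.

```r
# Hypothetical helper: cumulative minimum that skips NA values
cummin_na <- function(x) {
  x <- cummin(replace(x, is.na(x), Inf))
  replace(x, is.infinite(x), NA)
}

# Small illustrative data frame; SUM2 is an invented second variable
df2 <- data.frame(
  ID   = c("A", "A", "A", "B", "B", "B"),
  SUM  = c(NA, 140, 120, NA, 40, 70),
  SUM2 = c(5, NA, 3, 8, 9, 2)
)

# Columns to process; with many variables this vector could be built
# programmatically, e.g. from names(df2)
vars <- c("SUM", "SUM2")

# Apply the grouped cumulative minimum to each column and
# store the result in a new column named MIN_<variable>
df2[paste0("MIN_", vars)] <- lapply(
  df2[vars],
  function(x) ave(x, df2$ID, FUN = cummin_na)
)
df2
```

With this approach an NA in the middle of a group (as in SUM2 for ID "A") does not reset the result; the minimum seen so far is carried forward.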
Conclusion: thanks to cummin(), no loop is needed in R to get a cumulative minimum.
Note: R version 4.1.2 (2021-11-01)