How to use min(i) in a loop to return the minimum value in group by column?

I'm fairly new to R programming and I'm trying to get the minimum value by group in each variable. I have more than 300 variables and am trying to run min() and group_by inside a loop.

I want to make a column "MINVALUE" that holds, for each row, the minimum value seen so far within its group.

For example:

df$MINVALUE[1] == NA
df$MINVALUE[2] == min(df$SUM[1:2], na.rm = TRUE)   # desired value = 140
df$MINVALUE[3] == min(df$SUM[1:3], na.rm = TRUE)   # desired value = 120
df$MINVALUE[4] == min(df$SUM[1:4], na.rm = TRUE)   # desired value = 90
df$MINVALUE[5] == min(df$SUM[1:5], na.rm = TRUE)   # desired value = 90
df$MINVALUE[6] == NA   # group changed: restart from the row where df$VISIT == "BL" in group B
df$MINVALUE[7] == min(df$SUM[6:7], na.rm = TRUE)   # desired value = 40
df$MINVALUE[8] == min(df$SUM[6:8], na.rm = TRUE)   # desired value = 40
df$MINVALUE[9] == min(df$SUM[6:9], na.rm = TRUE)   # desired value = 40
df$MINVALUE[10] == min(df$SUM[6:10], na.rm = TRUE) # desired value = 20
df$MINVALUE[11] == min(df$SUM[6:11], na.rm = TRUE) # desired value = 19
df$MINVALUE[12] == min(df$SUM[6:12], na.rm = TRUE) # desired value = 18
df$MINVALUE[13] == min(df$SUM[6:13], na.rm = TRUE) # desired value = 10
df$MINVALUE[14] == NA   # group changed: restart from the row where df$VISIT == "BL" in group C
df$MINVALUE[15] == min(df$SUM[14:15], na.rm = TRUE) # desired value = 100
...

Here's a dummy dataset:

df <- data.frame("ID" = c("A", "A", "A", "A", "A", 
                 "B", "B", "B", "B", "B", "B", "B", "B",
                 "C", "C", "C", "C", "C", "C", "C",
                 "D", "D", 
                 "E",
                 "F", "F", "F", "F"),
"VISIT" = c("BL", "V1", "V2", "V3", "V4", 
            "BL", "V1", "V2", "V3", "V4", "V5", "V6", "V7",
            "BL", "V1", "V2", "V3", "V4", "V5", "V6",
            "BL", "V1",
            "BL", 
            "BL", "V1", "V2", "V3"),
"SUM" = c(NA, 140, 120, 90, 100,
          NA, 40, 70, 50, 20, 19, 18, 10,
          NA, 100, 120, 130, 50, 20, 30,
          NA, 50, 
          NA, 
          NA, 50, 20, 10))

Below is the expected value for each group:

[Image: df expected value]

Here's my desired output:

df$MINVALUE <- c(NA, 140, 120, 90, 90,
                 NA, 40, 40, 40, 20, 19, 18, 10,
                 NA, 100, 100, 100, 50, 20, 20,
                 NA, 50,
                 NA, 
                 NA, 50, 20, 10)

I'm not familiar with R or with loops, so I can't show you any code I have tried. If you have a solution, I would be very happy to learn. Thank you!

We can use ave() to group by ID, together with cummin(). Since cummin() has no na.rm= argument, we first replace the NAs with Inf, which can never be smaller than a real value and so never affects the running minimum. In a second step, we replace any Inf left in the result of cummin() with NA again.
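To see why the substitution is needed, here is a quick check on the SUM values of group A from the dummy dataset. It is only a sketch of the idea on a single vector; the variable names x and filled are made up for the illustration.

```r
# SUM values of group A, copied from the dummy dataset
x <- c(NA, 140, 120, 90, 100)

# Plain cummin(): the leading NA propagates to every later position
cummin(x)

# Substitute Inf for NA first, so the running minimum can start
filled <- cummin(replace(x, is.na(x), Inf))
filled

# Positions where no real value has been seen yet are still Inf;
# turn them back into NA
replace(filled, is.infinite(filled), NA)
```

The last call gives NA 140 120 90 90, which matches the desired MINVALUE for group A.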

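df <- transform(df, MINVALUE = ave(SUM, ID, FUN = \(x) {
  # cummin() has no na.rm=, so treat NA as Inf while accumulating
  x <- cummin(replace(x, is.na(x), Inf))
  # rows where no value has been seen yet are still Inf; set them back to NA
  replace(x, is.infinite(x), NA)
}))
df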
#    ID VISIT SUM MINVALUE
# 1   A    BL  NA       NA
# 2   A    V1 140      140
# 3   A    V2 120      120
# 4   A    V3  90       90
# 5   A    V4 100       90
# 6   B    BL  NA       NA
# 7   B    V1  40       40
# 8   B    V2  70       40
# 9   B    V3  50       40
# 10  B    V4  20       20
# 11  B    V5  19       19
# 12  B    V6  18       18
# 13  B    V7  10       10
# 14  C    BL  NA       NA
# 15  C    V1 100      100
# 16  C    V2 120      100
# 17  C    V3 130      100
# 18  C    V4  50       50
# 19  C    V5  20       20
# 20  C    V6  30       20
# 21  D    BL  NA       NA
# 22  D    V1  50       50
# 23  E    BL  NA       NA
# 24  F    BL  NA       NA
# 25  F    V1  50       50
# 26  F    V2  20       20
# 27  F    V3  10       10
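
The question mentions more than 300 variables. One way to extend the approach, sketched here under assumptions, is to wrap the two replace() steps in a helper and apply it to each column with lapply(). The helper name cummin_na, the small data frame d, the second variable SUM2 and the "_MIN" suffix are all hypothetical and only serve the illustration; the sketch also assumes the data contain no genuine Inf values, since those would be turned into NA.

```r
# Cumulative minimum that skips NAs (same idea as in the answer).
# Assumes x holds no genuine Inf/-Inf values.
cummin_na <- function(x) {
  x <- cummin(replace(x, is.na(x), Inf))
  replace(x, is.infinite(x), NA)
}

# Small stand-in data; SUM2 is a hypothetical second variable
d <- data.frame(
  ID   = c("A", "A", "A", "B", "B", "B"),
  SUM  = c(NA, 140, 120, NA, 40, 70),
  SUM2 = c(NA, 5, 7, NA, 3, 1)
)

# Columns to process; with real data this could be all numeric columns
vars <- c("SUM", "SUM2")

# One new "<name>_MIN" column per variable, computed within each ID
d[paste0(vars, "_MIN")] <- lapply(d[vars], function(col) {
  ave(col, d$ID, FUN = cummin_na)
})
d
```

Each new column restarts at the first row of every ID, just like MINVALUE above.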

Conclusion: Thanks to cummin(), no loop is needed in R to get cumulative minimums.

Note: R version 4.1.2 (2021-11-01)
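
For comparison, the row-by-row logic written out in the question could be spelled out as an explicit loop. This is only a sketch on a shortened copy of the dummy data (groups A and part of B); the names d and seen are made up for the illustration, and the loop is shown to make the equivalence visible, not as a recommended approach.

```r
# Shortened copy of the dummy data: group A and the start of group B
d <- data.frame(
  ID  = c("A", "A", "A", "A", "A", "B", "B", "B"),
  SUM = c(NA, 140, 120, 90, 100, NA, 40, 70)
)

d$MINVALUE <- NA_real_
for (i in seq_len(nrow(d))) {
  # SUM values of the current row's group, from the top of the data up to row i
  seen <- d$SUM[seq_len(i)][d$ID[seq_len(i)] == d$ID[i]]
  # Only take the minimum once at least one non-missing value has been seen
  if (any(!is.na(seen))) {
    d$MINVALUE[i] <- min(seen, na.rm = TRUE)
  }
}
d$MINVALUE
```

The loop rescans the earlier rows on every iteration, so the ave() and cummin() version is both shorter and cheaper on larger data.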
