I'm fairly new to R and I'm trying to get the minimum value by group for each variable. I have more than 300 variables and am trying to run min() and group_by() inside a loop.
I want to create a column "MINVALUE" that holds, for each row, the minimum of SUM over the rows seen so far within the same group.
For example,
df$MINVALUE[1] == NA
df$MINVALUE[2] == min(df$SUM[1:2], na.rm = TRUE) -> desired value = 140
df$MINVALUE[3] == min(df$SUM[1:3], na.rm = TRUE) -> desired value = 120
df$MINVALUE[4] == min(df$SUM[1:4], na.rm = TRUE) -> desired value = 90
df$MINVALUE[5] == min(df$SUM[1:5], na.rm = TRUE) -> desired value = 90
df$MINVALUE[6] == NA -> when the group changes, restart from the row where df$VISIT == "BL" in group B
df$MINVALUE[7] == min(df$SUM[6:7], na.rm = TRUE) -> desired value = 40
df$MINVALUE[8] == min(df$SUM[6:8], na.rm = TRUE) -> desired value = 40
df$MINVALUE[9] == min(df$SUM[6:9], na.rm = TRUE) -> desired value = 40
df$MINVALUE[10] == min(df$SUM[6:10], na.rm = TRUE) -> desired value = 20
df$MINVALUE[11] == min(df$SUM[6:11], na.rm = TRUE) -> desired value = 19
df$MINVALUE[12] == min(df$SUM[6:12], na.rm = TRUE) -> desired value = 18
df$MINVALUE[13] == min(df$SUM[6:13], na.rm = TRUE) -> desired value = 10
df$MINVALUE[14] == NA -> when the group changes, restart from the row where df$VISIT == "BL" in group C
df$MINVALUE[15] == min(df$SUM[14:15], na.rm = TRUE) -> desired value = 100
.
.
Here's a dummy dataset:
df <- data.frame("ID" = c("A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B", "B", "B",
"C", "C", "C", "C", "C", "C", "C",
"D", "D",
"E",
"F", "F", "F", "F"),
"VISIT" = c("BL", "V1", "V2", "V3", "V4",
"BL", "V1", "V2", "V3", "V4", "V5", "V6", "V7",
"BL", "V1", "V2", "V3", "V4", "V5", "V6",
"BL", "V1",
"BL",
"BL", "V1", "V2", "V3"),
"SUM" = c(NA, 140, 120, 90, 100,
NA, 40, 70, 50, 20, 19, 18, 10,
NA, 100, 120, 130, 50, 20, 30,
NA, 50,
NA,
NA, 50, 20, 10))
Below are the expected values for each group; this is my desired output:
df$MINVALUE <- c(NA, 140, 120, 90, 90,
NA, 40, 40, 40, 20, 19, 18, 10,
NA, 100, 100, 100, 50, 20, 20,
NA, 50,
NA,
NA, 50, 20, 10)
I'm not familiar with R or with loops, so I don't have any attempted code to show. If you have a solution, I would be very happy to learn. Thank you!
We may use ave() to group by ID, and cummin(). Since there is no na.rm= argument in cummin(), we replace the NAs with the opposite of a minimum, i.e. Inf. In a second step, we replace Inf in the result of cummin() back to NA.
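df <- transform(df, MINVALUE=ave(SUM, ID, FUN=\(x) {
  # cummin() has no na.rm=, so temporarily turn NA into Inf
  x <- cummin(replace(x, is.na(x), Inf))
  # positions still at Inf have seen no value yet; turn them back into NA
  replace(x, is.infinite(x), NA)
}))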
df
#    ID VISIT SUM MINVALUE
# 1   A    BL  NA       NA
# 2   A    V1 140      140
# 3   A    V2 120      120
# 4   A    V3  90       90
# 5   A    V4 100       90
# 6   B    BL  NA       NA
# 7   B    V1  40       40
# 8   B    V2  70       40
# 9   B    V3  50       40
# 10  B    V4  20       20
# 11  B    V5  19       19
# 12  B    V6  18       18
# 13  B    V7  10       10
# 14  C    BL  NA       NA
# 15  C    V1 100      100
# 16  C    V2 120      100
# 17  C    V3 130      100
# 18  C    V4  50       50
# 19  C    V5  20       20
# 20  C    V6  30       20
# 21  D    BL  NA       NA
# 22  D    V1  50       50
# 23  E    BL  NA       NA
# 24  F    BL  NA       NA
# 25  F    V1  50       50
# 26  F    V2  20       20
# 27  F    V3  10       10
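To see why the NA-to-Inf replacement is needed, here is a minimal sketch on a single vector (the values are those of group A; the names x and y are only for illustration):

```r
x <- c(NA, 140, 120, 90, 100)

# cummin() propagates NA: once an NA is seen, all later results are NA too
cummin(x)
# [1] NA NA NA NA NA

# Replace NA with Inf first, take the running minimum, then restore NA
y <- cummin(replace(x, is.na(x), Inf))
y <- replace(y, is.infinite(y), NA)
y
# [1]  NA 140 120  90  90
```

Note that with this approach an NA in the middle of a group does not produce NA; that row simply carries the running minimum forward. Genuine Inf values in the data would also be turned into NA, which is assumed not to matter here.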
Conclusion: Thanks to cummin(), in R there's no loop needed to get cumulative minimums.
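Since the question mentions more than 300 variables, the same idea could probably be applied to many columns at once. The following is only a sketch under assumptions: the data frame dat, the columns SUM1 and SUM2, the helper cummin_na() and the MIN_ prefix are all made up for illustration and do not come from the question.

```r
# Hypothetical example data: two measurement columns instead of 300
dat <- data.frame(
  ID   = c("A", "A", "A", "B", "B", "B"),
  SUM1 = c(NA, 140, 120, NA, 40, 70),
  SUM2 = c(NA, 5, 8, NA, 9, 3)
)

# Running minimum that tolerates NA (same NA -> Inf -> NA trick as in the answer)
cummin_na <- function(x) {
  x <- cummin(replace(x, is.na(x), Inf))
  replace(x, is.infinite(x), NA)
}

# Columns to process; here every column except the grouping column
vars <- setdiff(names(dat), "ID")

# Apply the grouped running minimum to each column and store the
# results in new columns named MIN_<variable>
dat[paste0("MIN_", vars)] <- lapply(
  dat[vars],
  function(col) ave(col, dat$ID, FUN = cummin_na)
)
dat
```

lapply() still iterates over the columns, but the per-group work within each column is done by ave() and cummin(), so no explicit row loop is needed.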
Note: R version 4.1.2 (2021-11-01)