I have a data frame with the following simplified structure:
df <- data.frame(Id = c(1,1,1,2,2,2,3,3,3,4,4,4),
                 value = c(500,500,500,250,250,250,300,300,300,400,400,400))
and I am trying to get the following desired output:
df$maxByGroup <- c(500,0,0,250,0,0,300,0,0,400,0,0)
I have tried this:
df$Id <- as.factor(df$Id)
newDf <- df %>%
  group_by(Id) %>%
  summarise(maxByGroup = sum(max(value)))
but this just returns the overall maximum of 500.
I have looked at other solutions that get the max value easily enough, but I cannot find one that keeps the max value and returns 0 for the other rows within each group.
Most importantly, I want to keep the data structure intact: the first observation within each group should hold the group maximum, and the rest should be recoded as zero. Any help would be very much appreciated.
You can try this:
library(dplyr)
df %>%
  group_by(Id) %>%
  mutate(maxByGroup = (which.max(value) == seq_along(value)) * value) %>%
  ungroup()
which gives
      Id value maxByGroup
   <dbl> <dbl>      <dbl>
 1     1   500        500
 2     1   500          0
 3     1   500          0
 4     2   250        250
 5     2   250          0
 6     2   250          0
 7     3   300        300
 8     3   300          0
 9     3   300          0
10     4   400        400
11     4   400          0
12     4   400          0
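One caveat worth noting: `which.max()` marks the row where the maximum occurs, which is the first row here only because every value within a group is identical. If the values within a group differed, a variant along these lines should put the group maximum on the first row instead. This is a minimal sketch, and `df2` with its values is made up purely for illustration:

```r
library(dplyr)

# hypothetical data where the maximum is not on the first row of each group
df2 <- data.frame(Id = c(1, 1, 1, 2, 2, 2),
                  value = c(100, 500, 300, 50, 250, 250))

res <- df2 %>%
  group_by(Id) %>%
  # first row of each group gets the group maximum, all other rows get 0
  mutate(maxByGroup = if_else(row_number() == 1, max(value), 0)) %>%
  ungroup()
```

Because this uses `mutate()` rather than `summarise()`, the number and order of rows stay as they were in the input.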
You could also sort by value in descending order, then set the first row in each group to its value and the rest to 0. Note that this reorders the rows:
library(dplyr)
data.frame(Id = c(1,1,1,2,2,2,3,3,3,4,4,4),
           value = c(500,500,500,250,250,250,300,300,300,400,400,400)) %>%
  group_by(Id) %>%
  arrange(desc(value)) %>%
  mutate(
    maxByGroup = if_else(row_number() == 1, value, 0)
  )
#> # A tibble: 12 x 3
#> # Groups: Id [4]
#> Id value maxByGroup
#> <dbl> <dbl> <dbl>
#> 1 1 500 500
#> 2 1 500 0
#> 3 1 500 0
#> 4 4 400 400
#> 5 4 400 0
#> 6 4 400 0
#> 7 3 300 300
#> 8 3 300 0
#> 9 3 300 0
#> 10 2 250 250
#> 11 2 250 0
#> 12 2 250 0
Created on 2022-01-31 by the reprex package (v2.0.0)
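In the output above the groups come back in the order 1, 4, 3, 2, because `arrange()` ignores the grouping by default. If the groups should stay in `Id` order, passing `.by_group = TRUE` to `arrange()` is one option. A minimal sketch, where the name `res_sorted` is just an illustrative choice:

```r
library(dplyr)

df <- data.frame(Id = c(1,1,1,2,2,2,3,3,3,4,4,4),
                 value = c(500,500,500,250,250,250,300,300,300,400,400,400))

res_sorted <- df %>%
  group_by(Id) %>%
  # .by_group = TRUE sorts by Id first, then by descending value within each Id
  arrange(desc(value), .by_group = TRUE) %>%
  mutate(maxByGroup = if_else(row_number() == 1, value, 0)) %>%
  ungroup()
```

Rows are still sorted within each group, so this only preserves the original row order when that order does not matter inside a group.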
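A data.table approach:
library(data.table)
# convert df to a data.table by reference so that := can be used
setDT(df)
# calculate max and intra-group row id
df[, `:=` (max_value = max(value)
           , dummy_row_id = 1:.N
           )
   , Id
   ]
# set rows other than the first in each group to 0, then drop the helper column
df[dummy_row_id > 1, max_value := 0][, dummy_row_id := NULL]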
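For comparison, the same result can probably be reached in base R without any extra packages, using `ave()` to compute per-group values. This is only a sketch; `df_base`, `pos` and `grp_max` are names chosen here for illustration:

```r
df_base <- data.frame(Id = c(1,1,1,2,2,2,3,3,3,4,4,4),
                      value = c(500,500,500,250,250,250,300,300,300,400,400,400))

# position of each row within its Id group (1, 2, 3, 1, 2, 3, ...)
pos <- ave(seq_along(df_base$Id), df_base$Id, FUN = seq_along)

# group maximum, repeated on every row of the group
grp_max <- ave(df_base$value, df_base$Id, FUN = max)

# keep the maximum on the first row of each group, 0 elsewhere
df_base$maxByGroup <- ifelse(pos == 1, grp_max, 0)
```

Like the `mutate()` approach, this leaves the row order untouched and puts the maximum on the first row of each group even when the values within a group differ.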