
How to find the maximum value within each group and then recode all other values in the group as zero?

I have a data frame with the following simplified structure:

df <- data.frame(Id = c(1,1,1,2,2,2,3,3,3,4,4,4),
                 value = c(500,500,500,250,250,250,300,300,300,400,400,400))

and I am trying to get the following desired output:

df$maxByGroup <- c(500,0,0,250,0,0,300,0,0,400,0,0)

I have tried this:

library(dplyr)

df$Id <- as.factor(df$Id)

newDf <- df %>%
  group_by(Id) %>%
  summarise(maxByGroup = sum(max(value)))

but I only get a single value of 500 back.

I have looked at other solutions that get the maximum value easily enough, but I cannot find one that returns the maximum and sets the other values within each group to 0.

The most important aspect of the desired output is that I want to keep the data structure (one row per original observation), with the first observation in each group holding the maximum and the remaining rows recoded as zero. Any help would be very much appreciated.

You can try this:

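library(dplyr)

df %>%
  group_by(Id) %>%
  # which.max() gives the position of the first maximum in the group;
  # only that row keeps its value, every other row becomes 0
  mutate(maxByGroup = (which.max(value) == seq_along(value)) * value) %>%
  ungroup()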

which gives

      Id value maxByGroup
   <dbl> <dbl>      <dbl>
 1     1   500        500
 2     1   500          0
 3     1   500          0
 4     2   250        250
 5     2   250          0
 6     2   250          0
 7     3   300        300
 8     3   300          0
 9     3   300          0
10     4   400        400
11     4   400          0
12     4   400          0
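One detail worth checking: `which.max()` returns the position of the first maximum, so this approach keeps the value on the row where the maximum occurs. That is the first row of the group only when the maximum sits there, as in this example, where every value within a group is identical. A quick base R check on a hypothetical vector `v` whose maximum is in the middle:

```r
# Hypothetical group whose maximum is not in the first row
v <- c(250, 400, 300)

# Same expression as in the answer: TRUE only at the first position of the maximum
kept <- (which.max(v) == seq_along(v)) * v
print(kept)  # returns 0 400 0

# If the maximum should always sit on the first row of the group instead,
# compare against position 1 and multiply by the group maximum
first_row <- (seq_along(v) == 1) * max(v)
print(first_row)  # returns 400 0 0
```

Inside the grouped `mutate()`, the second form would correspond to `(row_number() == 1) * max(value)`, which matches the "first observation holds the maximum" requirement even when the maximum is elsewhere in the group.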

You could also just sort by value within each group, then set the first row of each Id to the value and the rest to 0:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
data.frame(Id = c(1,1,1,2,2,2,3,3,3,4,4,4),
           value = c(500,500,500,250,250,250,300,300,300,400,400,400)) %>% 
    group_by(Id) %>%
    arrange(desc(value)) %>%
    mutate(
        maxByGroup = if_else(row_number() == 1, value, 0)
    )
#> # A tibble: 12 x 3
#> # Groups:   Id [4]
#>       Id value maxByGroup
#>    <dbl> <dbl>      <dbl>
#>  1     1   500        500
#>  2     1   500          0
#>  3     1   500          0
#>  4     4   400        400
#>  5     4   400          0
#>  6     4   400          0
#>  7     3   300        300
#>  8     3   300          0
#>  9     3   300          0
#> 10     2   250        250
#> 11     2   250          0
#> 12     2   250          0

Created on 2022-01-31 by the reprex package (v2.0.0)
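Note that `arrange()` sorts the whole data frame and ignores grouping unless `.by_group = TRUE`, which is why the groups in the output above come back ordered 1, 4, 3, 2 instead of in their original order. If the original row order matters, one option is to remember each row's position before sorting and restore it afterwards. A minimal base R sketch of that sort-then-take-first idea, where the helper column `row` is hypothetical and added only for this illustration:

```r
df <- data.frame(Id = c(1,1,1,2,2,2,3,3,3,4,4,4),
                 value = c(500,500,500,250,250,250,300,300,300,400,400,400))

# Remember the original position of every row (hypothetical helper column)
df$row <- seq_len(nrow(df))

# Sort by Id, then by value in descending order, so each group's maximum comes first
sorted <- df[order(df$Id, -df$value), ]

# The first row of each Id in sorted order keeps its value; all others get 0
sorted$maxByGroup <- ifelse(!duplicated(sorted$Id), sorted$value, 0)

# Restore the original row order and drop the helper column
result <- sorted[order(sorted$row), c("Id", "value", "maxByGroup")]
rownames(result) <- NULL
print(result)
```

Because `order()` leaves ties in their original relative order, the maximum lands on the first row of each group in this example. With data where a group's maximum is not on its first row, the value stays on the row where the maximum occurs.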

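A data.table approach:

library(data.table)

# convert the data frame to a data.table by reference
setDT(df)

# calculate the group maximum and the row number within each group

df[, `:=` (max_value = max(value)
           , dummy_row_id = 1:.N
           )
   , Id
   ]


# set every row other than the first in its group to 0, then drop the helper column

df[dummy_row_id > 1, max_value := 0][, dummy_row_id := NULL]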
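Without data.table, the same two steps (group maximum, then 0 for every row after the first within its group) can be sketched in base R with `ave()`, which applies a function within groups and returns a vector aligned with the original rows, so the row order is untouched. The column name `maxByGroup` follows the question; `group_max` and `row_in_group` are hypothetical names used only here:

```r
df <- data.frame(Id = c(1,1,1,2,2,2,3,3,3,4,4,4),
                 value = c(500,500,500,250,250,250,300,300,300,400,400,400))

# Maximum of value within each Id, repeated on every row of the group
group_max <- ave(df$value, df$Id, FUN = max)

# Position of each row inside its Id group: 1, 2, 3, 1, 2, 3, ...
row_in_group <- ave(seq_along(df$Id), df$Id, FUN = seq_along)

# Keep the maximum on the first row of each group and set the rest to 0
df$maxByGroup <- ifelse(row_in_group == 1, group_max, 0)
print(df)
```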
