简体   繁体   中英

R — Identify values within group_by and mutate

The attempted problem is to group by column A and create a flag for every row in the group if any of its values in column B are over 100. The example table looks like this-

Column A Column B
NYC 95
NYC 98
BOS 88
BOS 101
BOS 67
MIA 90

And for the resulting df to look like this-

Column A Column B fg
NYC 95
NYC 98
BOS 88 1
BOS 101 1
BOS 67 1
MIA 90

Even if one of the values is over 100, I would like the fg column to return 1 for all the rows.

The attempted code was as follows-

df %>% group_by(Column A) %>% mutate(fg = ifelse(Column B >= 100,1, ''))

but this seems to error out. What other method could work?

You can try -

library(dplyr)

df %>% 
  group_by(ColumnA) %>% 
  mutate(fg = as.integer(any(ColumnB >= 100))) %>%
  ungroup

#  ColumnA ColumnB    fg
#  <chr>     <int> <int>
#1 NYC          95     0
#2 NYC          98     0
#3 BOS          88     1
#4 BOS         101     1
#5 BOS          67     1
#6 MIA          90     0

The same can be written in base R and data.table as well -

#Data.table
library(data.table)
setDT(df)[, fg := as.integer(any(ColumnB >= 100)), ColumnA]

#Base R
df <- transform(df, fg = as.integer(ave(ColumnB >= 100, ColumnA, FUN = any)))

You logic for the syntax is completely fine. However, what cause the trouble is that the structure of dataframe should not has string with space in between as variable eg "Column B", which should be rename as "Column.B" or "Column_B"

This work perfectly fine after fixing the column name

library(dplyr)
df %>% group_by(Column.A) %>% mutate(fg = ifelse(Column.B >= 100 ,1, 0)) 


  Column.A Column.B    fg
  <chr>       <dbl> <dbl>
1 NYC            95     0
2 NYC            98     0
3 BOS            88     0
4 BOS           101     1
5 BOS            67     0
6 MIA            90     0

We could use case_when with any

library(dplyr)
df %>% 
  group_by(Column.A) %>% 
  dplyr::mutate(flag_over100 = case_when(any(Column.B > 100, na.rm = TRUE) ~ 1,
                                  TRUE ~ 0))

Output:

  Column.A Column.B flag_over100
  <chr>       <int>        <dbl>
1 NYC            95            0
2 NYC            98            0
3 BOS            88            1
4 BOS           101            1
5 BOS            67            1
6 MIA            90            0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM