
Using mutate in dplyr with conditions

Edit: Reverting this back to the original text, which the responses below are based around. Thank you all for your help and apologies for changing the question after everyone so graciously helped me.

I have a data frame which lists individuals, how many drinks they have had, what position they are in line, and whether they are eligible for a new drink.

dat <- data.frame(person = c("bill", "hank", "susy", "cliff", "betty"),
           total = c(3, 4, 5, 7, 8),
           position = c(1, 5, 3, 2, 4),
           eligible = c(0, 0, 1, 1, 1))

The goal: for anyone who is eligible for a new drink, add to their total the total number of drinks of the person one place behind them in line (e.g., to the total of the person in position 4, add the total of the person in position 5). Anyone who is not eligible keeps their old total. The desired output is as follows:

person   total   position   eligible   new_total
bill     3       1          0          3    
hank     4       5          0          4
susy     5       3          1          13   
cliff    7       2          1          12   
betty    8       4          1          12   

Does anyone know how I could do this using R and dplyr?

Thanks!

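You can use mutate and ifelse. It helps to sort the data frame by position first, so that lead() picks up the person one place behind in line.

library(dplyr)

dat <- dat %>%
      arrange(position) %>%
      # lead(total) is the total of the next person in line
      mutate(new_total = ifelse(eligible, total+lead(total), total)) %>%
      # restores the original row order only because the input is already sorted by total
      arrange(total)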
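
One caveat with lead(): for the last person in line it returns NA by default, which is harmless in the question's data only because that person (hank) is not eligible. The sketch below uses a hypothetical variant of the data, dat2, in which hank is eligible, to show the NA and how the default argument of lead() avoids it. The column names new_total_na and new_total_0 are made up for the illustration.

```r
library(dplyr)

# Hypothetical variant of the question's data: hank (last in line) is eligible
dat2 <- data.frame(person = c("bill", "hank", "susy", "cliff", "betty"),
                   total = c(3, 4, 5, 7, 8),
                   position = c(1, 5, 3, 2, 4),
                   eligible = c(0, 1, 1, 1, 1))

res <- dat2 %>%
  arrange(position) %>%
  mutate(
    # lead() has no next row for the last person, so this yields NA for hank
    new_total_na = ifelse(eligible == 1, total + lead(total), total),
    # default = 0 treats "nobody behind" as zero extra drinks
    new_total_0  = ifelse(eligible == 1, total + lead(total, default = 0), total)
  )

res
```

Whether "nobody behind" should count as zero or stay NA depends on what the totals are used for, so the default is a choice rather than a fix.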

Another option is to record the original row order in a column 'rn', arrange by 'position', create 'new_total' by adding the lead of 'total' to 'total' where 'eligible' is 1, and then restore the original order with 'rn' before dropping that column.

library(dplyr)
dat %>% 
  mutate(rn = row_number())  %>%
  arrange(position) %>%  
  mutate(new_total = case_when(as.logical(eligible) ~
                  total + lead(total), TRUE ~ total)) %>% 
  arrange(rn) %>%
  select(-rn)
#   person total position eligible new_total
#1   bill     3        1        0         3
#2   hank     4        5        0         4
#3   susy     5        3        1        13
#4  cliff     7        2        1        12
#5  betty     8        4        1        12

Or using data.table

library(data.table)
setDT(dat)[order(position), new_total := total + shift(total, type = 'lead')
        ][eligible == 0, new_total := total][]
#   person total position eligible new_total
#1:   bill     3        1        0         3
#2:   hank     4        5        0         4
#3:   susy     5        3        1        13
#4:  cliff     7        2        1        12
#5:  betty     8        4        1        12

Eligible is already 0/1, so you can use that to your benefit by just multiplying the total for the next person by the eligibility (or, alternatively, setting any true/false condition there if it's not that simple):

dat %>% 
  arrange(position) %>% 
  mutate(new_total = total + eligible * lead(total, default = 0))
#   person total position eligible new_total
#1   bill     3        1        0         3
#2  cliff     7        2        1        12
#3   susy     5        3        1        13
#4  betty     8        4        1        12
#5   hank     4        5        0         4
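
As a sketch of the "any true/false condition" variant mentioned above: a logical vector is coerced to 0/1 when multiplied, so a compound condition can stand in for eligible. The rule used here (eligible and fewer than 8 drinks so far) is purely hypothetical and chosen only to make the result differ from the original.

```r
library(dplyr)

dat <- data.frame(person = c("bill", "hank", "susy", "cliff", "betty"),
                  total = c(3, 4, 5, 7, 8),
                  position = c(1, 5, 3, 2, 4),
                  eligible = c(0, 0, 1, 1, 1))

res <- dat %>%
  arrange(position) %>%
  # Hypothetical rule: TRUE counts as 1, FALSE as 0 in the multiplication
  mutate(new_total = total + (eligible == 1 & total < 8) * lead(total, default = 0))

res
```

Under this made-up rule betty keeps her total of 8, because she already has 8 drinks, while cliff and susy still receive the next person's total.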

Just for fun, I timed the three solutions (although, with such a small dataset, the timings may not be representative):

Unit: milliseconds
         expr      min       lq      mean   median        uq      max neval
          iod 2.485992 2.694608  3.535079 2.921297  3.347454 28.47935   100
        brian 3.700652 4.037115  4.759614 4.268713  4.973099 16.12168   100
 arkun(dplyr) 8.173740 9.117087 10.194020 9.715270 10.730906 17.32028   100
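
The "Unit: milliseconds" header and the neval column match the output format of the microbenchmark package, so a comparison of this kind could probably be reproduced along the following lines. This is only a sketch, assuming microbenchmark is installed; the labels ifelse_lead, case_when_lead and multiply are made up here and are not meant to map onto the names in the table above.

```r
library(dplyr)
library(microbenchmark)

dat <- data.frame(person = c("bill", "hank", "susy", "cliff", "betty"),
                  total = c(3, 4, 5, 7, 8),
                  position = c(1, 5, 3, 2, 4),
                  eligible = c(0, 0, 1, 1, 1))

# Time the three dplyr variants from the answers, 100 runs each
bm <- microbenchmark(
  ifelse_lead = dat %>%
    arrange(position) %>%
    mutate(new_total = ifelse(eligible, total + lead(total), total)),
  case_when_lead = dat %>%
    arrange(position) %>%
    mutate(new_total = case_when(as.logical(eligible) ~ total + lead(total),
                                 TRUE ~ total)),
  multiply = dat %>%
    arrange(position) %>%
    mutate(new_total = total + eligible * lead(total, default = 0)),
  times = 100
)

print(bm)
```

Timings vary between machines and package versions, so the numbers will not match the table exactly.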


 