How to perform operations across rows in dplyr

I am trying to figure out how to create a summary statistic in dplyr that combines information from different rows.

Subject   BinLab      mean.RT 
s001      Deviant_RT  533.8115
s001      Standard_RT 508.2450
s002      Deviant_RT  465.5538
s002      Standard_RT 425.0351

Basically, I want to create a data frame that is grouped by subject and gives me the difference between the mean.RT for Deviant_RT and the mean.RT for Standard_RT.

This is what I have tried:

RTDataDifferenceWave <- RTData %>%
  group_by(Subject) %>%
  summarise(DiffRT = Deviant_RT-StandardRT)

I'm stuck on how to create this new dependent variable "DiffRT" which, again, is the difference between the Deviant_RT and Standard_RT. Would prefer an answer in dplyr but open to other solutions.

One way is to switch to a wide-data format:

RTDataDifferenceWave <- RTData %>% group_by(Subject) %>% 
  tidyr::spread(BinLab, mean.RT) %>% 
  mutate(DiffRT = Deviant_RT-Standard_RT)
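
For reference, here is a self-contained sketch of the same wide-format idea, assuming tidyr 1.0.0 or later, where pivot_wider() is the newer replacement for spread(). The RTData tibble below is rebuilt from the four rows shown in the question; the real data may contain more subjects.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
RTData <- tibble(
  Subject = c("s001", "s001", "s002", "s002"),
  BinLab  = c("Deviant_RT", "Standard_RT", "Deviant_RT", "Standard_RT"),
  mean.RT = c(533.8115, 508.2450, 465.5538, 425.0351)
)

# One row per Subject, one column per BinLab value,
# then a plain column-wise subtraction
RTDataDifferenceWave <- RTData %>%
  pivot_wider(names_from = BinLab, values_from = mean.RT) %>%
  mutate(DiffRT = Deviant_RT - Standard_RT)
```

After reshaping, each subject occupies a single row, so no group_by() is needed for the subtraction.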

Take into account that Deviant_RT and Standard_RT are not columns, but instead are values of BinLab. In this case you can predefine the sign of mean.RT in each row using the value of BinLab, and then sum the values, like so:

RTDataDifferenceWave <- RTData %>%
  mutate(mean.RT_signed = mean.RT * ifelse(BinLab == 'Deviant_RT', 1, -1)) %>%
  group_by(Subject) %>%
  summarise(DiffRT = sum(mean.RT_signed))

Notice this assumes that BinLab can only be one of Deviant_RT or Standard_RT. If it can take other values, you could change the mutate to this:

  mutate(mean.RT_signed = mean.RT * ifelse(BinLab == 'Deviant_RT', 1, ifelse(BinLab == 'Standard_RT', -1, 0)))
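
As a runnable sketch of the signed-sum approach with more than two labels, the nested ifelse() can also be written with dplyr's case_when(), which some people find easier to read. The "Other_RT" row and the rt_sign column are hypothetical, added only to show that unrelated labels contribute zero to the sum; the remaining rows come from the question.

```r
library(dplyr)

# Question's data plus one hypothetical extra row ("Other_RT")
RTData <- tibble(
  Subject = c("s001", "s001", "s001", "s002", "s002"),
  BinLab  = c("Deviant_RT", "Standard_RT", "Other_RT",
              "Deviant_RT", "Standard_RT"),
  mean.RT = c(533.8115, 508.2450, 999, 465.5538, 425.0351)
)

RTDataDifferenceWave <- RTData %>%
  mutate(
    # +1 for Deviant_RT, -1 for Standard_RT, 0 for anything else
    rt_sign = case_when(
      BinLab == "Deviant_RT"  ~ 1,
      BinLab == "Standard_RT" ~ -1,
      TRUE                    ~ 0
    ),
    mean.RT_signed = mean.RT * rt_sign
  ) %>%
  group_by(Subject) %>%
  summarise(DiffRT = sum(mean.RT_signed))
```

Note that this still assumes each subject has exactly one Deviant_RT row and one Standard_RT row; with repeated rows the sum would combine all of them.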
