简体   繁体   中英

Group by multiple variables and summarise dplyr

I'm trying to average CO2 concentration data every 30 seconds, for each of my sensors:

    head(df)
# A tibble: 6 x 7
# Groups: BinnedTime [1]

  Sensor Date       Time   calCO2 DeviceTime          cuts   BinnedTime         
  <fctr> <date>     <time>  <dbl> <dttm>              <fctr> <chr>              
1 N1     2019-02-12 13:24     400 2019-02-12 13:24:02 (0,10] 2019-02-12 13:24:02
2 N1     2019-02-12 13:24     400 2019-02-12 13:24:02 (0,10] 2019-02-12 13:24:02
3 N1     2019-02-12 13:24     400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:03
4 N2     2019-02-12 13:24     400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:02
5 N3     2019-02-12 13:24     400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:02
6 N3     2019-02-12 13:24     400 2019-02-12 13:24:05 (0,10] 2019-02-12 13:24:04

I use:

df %>%
  group_by(Sensor)%>%
  group_by(BinnedTime = cut(DeviceTime, breaks="30 sec")) %>%
  summarize(Concentration = mean(calCO2))

But it doesn't group by Sensor first, it ignores them and calculates the average over the BinnedTime instead. Any thoughts would be welcomed.

I've read about .dots=c("Sensor","BinnedTime") but this doesn't work.

Note, I haven't created dummy data so you can see exactly what mine looks like, as there seem to be some subtleties with time and date that I can't quite get my head around.

So to summarize the comments by @kath with some improvements to address your follow-on question:

df %>%
    group_by(Sensor, BinnedTime = cut(DeviceTime, breaks="30 sec")) %>%
        mutate(Concentration = mean(calCO2)) %>%
    ungroup()

The above will maintain all columns, but duplicate the Concentration calculation for each row of the df. An alternative that would allow you to both roll up and retain more columns of interest is to simply add them to the summarize operation, as illustrated below.

    df %>%
    group_by(Sensor, BinnedTime = cut(DeviceTime, breaks="30 sec")) %>%
        summarize(Concentration = mean(calCO2),
                   Date = min(Date),
                   Time = min(Time),
                   StartDeviceTime = min(DeviceTime),
                   EndDeviceTime = max(DeviceTime)) 

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM