Is there a way to aggregate by multiple groups in R?

I have a camera trap dataset with columns for file name, SiteID, Species, Count, Date, Time, etc. I am trying to create a record table that holds the MAXIMUM count per independent detection, for each species detected at each camera station. For example, if the independence interval is set to 30 minutes and there is one detection of 2 deer and another of 13 deer within the same 30-minute interval, I want the 13 to be used instead of the 2.

Original data:

File     SiteID     Date            Time       Species     Count
Can_001  YVR01      03-May-2018     21:34:25   Squirrel    3
Can_001  YVR01      03-May-2018     21:34:58   Squirrel    3
Can_001  YVR01      03-May-2018     21:36:25   Squirrel    1

So far, I have tried to group first by SiteID, then by Species, then by Date, and then to create a column of 30-minute time intervals. From there I need to work out how to get the maximum Count value within each interval; these will be the detections I use.

species_group <- group_by(y4, SiteID) %>%
  group_by(Species) %>%
  group_by(Date) %>%
  group_by(Interval_Time = floor_date(DateTimeOriginalp, "30 minutes"))

I was able to get to the stage where the 30-minute interval column was created, but after that point no summarise(), aggregate(), tapply(), etc. call seems to work, because none of them lets me pull up the Interval_Time column I created. The new Interval_Time column is in dttm format and shows up when I view and print the species_group data frame. What I need to do now is get the MAX count of each species within these intervals. This is what I tried (i.e. outside of the pipe):

speciesgroup3 <- aggregate(species_group$Count, by=list(species_group$Interval_Time), max)

This returned a data frame of just two columns, Interval_Time and the maximum count, which isn't useful, as I need the data separated first by site and then by species.
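For reference, aggregate() groups by several variables at once when all of them are passed in the `by` list. A minimal sketch, assuming a small made-up data frame standing in for `species_group` (the values are hypothetical):

```r
# Hypothetical toy data standing in for species_group (values are made up)
species_group <- data.frame(
  SiteID        = c("YVR01", "YVR01", "YVR01", "YVR02"),
  Species       = c("Deer", "Deer", "Squirrel", "Deer"),
  Interval_Time = as.POSIXct(rep("2018-05-03 21:30:00", 4), tz = "UTC"),
  Count         = c(2, 13, 3, 4)
)

# Passing every grouping variable in `by` keeps sites and species apart;
# naming the list elements gives the output columns readable names
res <- aggregate(species_group$Count,
                 by = list(SiteID        = species_group$SiteID,
                           Species       = species_group$Species,
                           Interval_Time = species_group$Interval_Time),
                 FUN = max)
res
```

With this vector-plus-`by` form the maximum ends up in a column named `x`; the formula interface (`Count ~ SiteID + Species + Interval_Time`) names it `Count` instead.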

For the life of me, I can't figure out why I can't call Interval_Time as a column within the pipe above. Any help would be greatly appreciated!
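One thing worth checking in the pipe above: in dplyr, each group_by() call replaces the existing grouping by default, so chaining group_by(SiteID) %>% group_by(Species) %>% ... leaves only the last grouping variable (here Interval_Time) in effect. Listing all variables in a single group_by() call, or passing `.add = TRUE` (dplyr 1.0.0 and later), keeps them. A minimal sketch on made-up data:

```r
library(dplyr)

# Made-up detections, for illustration only
toy <- data.frame(
  SiteID  = c("YVR01", "YVR01", "YVR02"),
  Species = c("Deer", "Squirrel", "Deer"),
  Count   = c(2, 13, 4)
)

# Chained calls: each group_by() replaces the previous grouping
chained <- toy %>% group_by(SiteID) %>% group_by(Species)
group_vars(chained)    # only "Species" remains

# A single call with all variables keeps every grouping level
combined <- toy %>% group_by(SiteID, Species)
group_vars(combined)   # "SiteID" "Species"

# Alternatively (dplyr >= 1.0.0), add to the existing grouping explicitly
added <- toy %>% group_by(SiteID) %>% group_by(Species, .add = TRUE)
group_vars(added)      # "SiteID" "Species"
```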

Maybe you could use:

aggregate(Count ~ SiteID + Species + Interval_Time, data = species_group, FUN = max)

For a similar problem, see for instance https://stats.stackexchange.com/questions/169056/aggregate-all-data-by-date-and-id

This should be close to what you are looking for, using the dplyr functions included in the tidyverse:

library(tidyverse)
library(lubridate)
df <- read.table(text = "
File     SiteID     Date            Time       Species     Count
Can_001  YVR01      03-May-2018     21:34:25   Squirrel    3
Can_001  YVR01      03-May-2018     21:34:58   Squirrel    3
Can_001  YVR01      03-May-2018     22:01:25   Squirrel    1
Can_001  YVR01      03-May-2018     21:34:58   Deer        5
Can_001  YVR01      03-May-2018     21:36:25   Deer        7
", header = TRUE)

# Use mutate() and lubridate::dmy_hms() to derive a proper date-time
# column from the text date and time (the dates are day-month-year)
df2 <- df %>%
  mutate(DateTime = dmy_hms(paste(Date, Time)),
         period = floor_date(DateTime, "30 mins")) %>%
  select(-Date, -Time)

#      File SiteID  Species Count            DateTime              period
# 1 Can_001  YVR01 Squirrel     3 2018-05-03 21:34:25 2018-05-03 21:30:00
# 2 Can_001  YVR01 Squirrel     3 2018-05-03 21:34:58 2018-05-03 21:30:00
# 3 Can_001  YVR01 Squirrel     1 2018-05-03 22:01:25 2018-05-03 22:00:00
# 4 Can_001  YVR01     Deer     5 2018-05-03 21:34:58 2018-05-03 21:30:00
# 5 Can_001  YVR01     Deer     7 2018-05-03 21:36:25 2018-05-03 21:30:00

# Summarize down to the period level, applying max() within each group
df2 %>%
  group_by(SiteID, Species, period) %>%
  summarize(n = max(Count))

# Groups:   SiteID, Species [?]
#   SiteID Species  period                  n
#   <fct>  <fct>    <dttm>              <int>
# 1 YVR01  Deer     2018-05-03 21:30:00     7
# 2 YVR01  Squirrel 2018-05-03 21:30:00     3
# 3 YVR01  Squirrel 2018-05-03 22:00:00     1
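Note that floor_date() produces fixed, clock-aligned bins, so two detections only a couple of minutes apart (e.g. 21:59 and 22:01) still land in different periods. If "independent" is instead meant as "more than 30 minutes after the previous detection of the same species at the same site" (an assumption about the intended definition, not something stated in the question), one possible sketch numbers the events with cumsum() over the gaps. The data and the 30-minute threshold below are hypothetical, and `.groups = "drop"` needs dplyr 1.0.0 or later:

```r
library(dplyr)
library(lubridate)

# Hypothetical detections: 21:59 and 22:01 are 2 minutes apart but fall
# into different clock-aligned 30-minute bins
det <- data.frame(
  SiteID   = "YVR01",
  Species  = "Deer",
  DateTime = ymd_hms(c("2018-05-03 21:59:00", "2018-05-03 22:01:00",
                       "2018-05-03 23:00:00")),
  Count    = c(2, 13, 4)
)

# Fixed clock bins, as with floor_date() above: three separate periods
bins <- det %>%
  mutate(period = floor_date(DateTime, "30 mins")) %>%
  group_by(SiteID, Species, period) %>%
  summarise(n = max(Count), .groups = "drop")

# Gap-based events: a new event starts at the first detection of each
# site/species, or whenever the gap to the previous detection exceeds 30 min
events <- det %>%
  arrange(SiteID, Species, DateTime) %>%
  group_by(SiteID, Species) %>%
  mutate(gap_min = as.numeric(difftime(DateTime, lag(DateTime), units = "mins")),
         event   = cumsum(is.na(gap_min) | gap_min > 30)) %>%
  group_by(SiteID, Species, event) %>%
  summarise(n = max(Count), .groups = "drop")
```

Here the clock bins keep the 2 and the 13 apart, while the gap-based version merges them into one event whose maximum is 13.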

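Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.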

 