How to add a variable to `df2` holding the number of rows for specific levels of a variable in `df1`, using `dplyr` or `data.table`

Question

I have a dataframe `df1` that summarizes detections of a fish species over time, recorded with acoustic transmitters (attached to the fish) and acoustic receivers (placed in the area). Each transmitter has two sensors, one measuring activity and the other measuring the fish's depth. A transmitter can send only one kind of data (either activity or depth) at a time, and consecutive signals are at least several minutes apart. In the end, we get a dataframe with the time of each detection (`DateTime`), the receiver that detected the individual (`Receiver`), the transmitter that was detected (`Transmitter`) and the type of information the transmitter sent (`Sensor`). Below is a reproducible example:

df1<-data.frame(DateTime=c("2016-08-01 12:04:07","2016-08-01 12:06:07","2016-08-01 13:12:12","2016-08-01 14:04:07","2016-08-01 15:01:45","2016-08-01 15:34:07","2016-08-01 16:25:16","2016-08-01 16:29:16","2016-08-01 16:33:16","2016-08-01 16:54:16","2016-08-01 16:58:16","2016-08-01 17:13:16","2016-08-01 17:21:16","2016-08-01 17:23:42","2016-08-01 17:27:16","2016-08-01 17:28:16","2016-08-01 17:29:28","2016-08-01 17:42:08"),
                Receiver=c( "V6", "V7", "V6", "V6", "V7", "V7", "V6", "V6", "V6", "V7", "V7", "V7", "V6", "V6", "V6", "V9", "V7", "V4" ),
                Transmitter=c(16 , 17, 16, 16, 17, 16, 17, 16, 16, 16, 17, 16, 16, 17, 17, 17, 16, 17),
                Sensor=c("Activity","Depth","Activity","Activity","Depth","Activity","Activity","Depth","Activity","Activity","Activity","Depth","Activity","Activity","Depth","Activity","Activity","Activity"))
df1$DateTime<- as.POSIXct(df1$DateTime, format= "%Y-%m-%d %H:%M:%S", tz= "UTC")

df1

              DateTime Receiver Transmitter   Sensor
1  2016-08-01 12:04:07       V6          16 Activity
2  2016-08-01 12:06:07       V7          17    Depth
3  2016-08-01 13:12:12       V6          16 Activity
4  2016-08-01 14:04:07       V6          16 Activity
5  2016-08-01 15:01:45       V7          17    Depth
.            .                .           .       .
.            .                .           .       .

What I want is to create a dataframe `df2` in which this information is arranged differently. I want to use hourly intervals in which each hour covers the half hour before and the half hour after it (`RoundTime`). For every `RoundTime` and each transmitter (`Transmitter`), I want the number of detections (`Num_det`), the number of different receivers that detected it (`Num_Rec`), the codes of those receivers (`Which_Rec`), the number of detections with Activity info (`n_Activity`) and the number of detections with Depth info (`n_Depth`). I would expect this:

df2
             RoundTime Transmitter Num_det n_Activity n_Depth Num_Rec Which_Rec
1  2016-08-01 12:00:00          16       1          1       0       1        V6
2  2016-08-01 12:00:00          17       1          0       1       1        V7
3  2016-08-01 13:00:00          16       1          1       0       1        V6
4  2016-08-01 13:00:00          17       0          0       0      NA      <NA>
5  2016-08-01 14:00:00          16       1          1       0       1        V6
6  2016-08-01 14:00:00          17       0          0       0      NA      <NA>
7  2016-08-01 15:00:00          16       0          0       0      NA      <NA>
8  2016-08-01 15:00:00          17       1          0       1       1        V7
9  2016-08-01 16:00:00          16       2          1       1       2     V6 V7
10 2016-08-01 16:00:00          17       1          1       0       1        V6
11 2016-08-01 17:00:00          16       5          4       1       2     V6 V7
12 2016-08-01 17:00:00          17       4          3       1       3  V6 V7 V9
13 2016-08-01 18:00:00          16       0          0       0      NA      <NA>
14 2016-08-01 18:00:00          17       1          1       0       1        V4

So far I have built `df2` with all the variables except `n_Activity` and `n_Depth`. Here are the code and the result:

library(lubridate)
library(tidyverse)
df2<-df1 %>% 
   # grouped by rounding the date by hour, Transmitter column
   group_by(RoundTime = round_date(DateTime, "hour"), Transmitter) %>% 
   # get the Num_det as number of rows, add more groups
   group_by(Num_det = n(), 
            which_Rec = toString(sort(unique(Receiver))), .add = TRUE) %>%  # `add` was renamed `.add` in dplyr 1.0
   # get the number of distinct elements of Receiver
   summarise(Num_Rec = n_distinct(Receiver)) %>% 
   ungroup %>% 
   # expand the data to fill the missing combinations 
   complete(RoundTime, Transmitter, fill = list(Num_det = 0))%>% 
   select(RoundTime, Transmitter, Num_det, Num_Rec, which_Rec)

df2
# A tibble: 14 x 5
   RoundTime               Transmitter Num_det Num_Rec which_Rec 
   <dttm>                        <dbl>   <dbl>   <int> <chr>     
 1 2016-08-01 12:00:00.000          16       1       1 V6        
 2 2016-08-01 12:00:00.000          17       1       1 V7        
 3 2016-08-01 13:00:00.000          16       1       1 V6        
 4 2016-08-01 13:00:00.000          17       0      NA NA        
 5 2016-08-01 14:00:00.000          16       1       1 V6        
 6 2016-08-01 14:00:00.000          17       0      NA NA        
 7 2016-08-01 15:00:00.000          16       0      NA NA        
 8 2016-08-01 15:00:00.000          17       1       1 V7        
 9 2016-08-01 16:00:00.000          16       2       2 V6, V7    
10 2016-08-01 16:00:00.000          17       1       1 V6        
11 2016-08-01 17:00:00.000          16       5       2 V6, V7    
12 2016-08-01 17:00:00.000          17       4       3 V6, V7, V9
13 2016-08-01 18:00:00.000          16       0      NA NA        
14 2016-08-01 18:00:00.000          17       1       1 V4

Does anyone know what code I should add to the above to create the variables `n_Activity` and `n_Depth`? A solution with the `data.table` package would be even better, since my real dataframe has millions of rows and `data.table` is more efficient.

Answer 1

I think all you need to do is count the number of "Activity" and "Depth" values per group within your current code. I don't know why you have two `group_by` calls there; a single `summarise` can compute every column:

library(dplyr)
library(lubridate)

df1 %>% 
  group_by(RoundTime = round_date(DateTime, "hour"), Transmitter) %>% 
  summarise(Num_det = n(), 
            which_Rec = toString(sort(unique(Receiver))),
            Num_Rec = n_distinct(Receiver), 
            n_Activity = sum(Sensor == "Activity"), 
            n_Depth = sum(Sensor == "Depth")) %>%
   ungroup %>% 
   tidyr::complete(RoundTime, Transmitter, 
           fill = list(Num_det = 0, n_Activity = 0, n_Depth = 0))


# A tibble: 14 x 7
#   RoundTime           Transmitter Num_det which_Rec  Num_Rec n_Activity n_Depth
#   <dttm>                    <dbl>   <dbl> <chr>        <int>      <dbl>   <dbl>
# 1 2016-08-01 12:00:00          16       1 V6               1          1       0
# 2 2016-08-01 12:00:00          17       1 V7               1          0       1
# 3 2016-08-01 13:00:00          16       1 V6               1          1       0
# 4 2016-08-01 13:00:00          17       0 NA              NA          0       0
# 5 2016-08-01 14:00:00          16       1 V6               1          1       0
# 6 2016-08-01 14:00:00          17       0 NA              NA          0       0
# 7 2016-08-01 15:00:00          16       0 NA              NA          0       0
# 8 2016-08-01 15:00:00          17       1 V7               1          0       1
# 9 2016-08-01 16:00:00          16       2 V6, V7           2          1       1
#10 2016-08-01 16:00:00          17       1 V6               1          1       0
#11 2016-08-01 17:00:00          16       5 V6, V7           2          4       1
#12 2016-08-01 17:00:00          17       4 V6, V7, V9       3          3       1
#13 2016-08-01 18:00:00          16       0 NA              NA          0       0
#14 2016-08-01 18:00:00          17       1 V4               1          1       0
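
Since the question also asks about `data.table`, here is a minimal sketch of how the same summary might be written with it. It is an illustration, not a benchmarked solution. The object names `dt`, `res` and `all_combos` are made up for this example, `CJ()` plus a join stands in for `tidyr::complete()`, and the receivers are pasted with a space separator to follow the layout of the expected output in the question.

```r
library(data.table)
library(lubridate)

# Sample data copied from the question
df1 <- data.frame(
  DateTime = c("2016-08-01 12:04:07", "2016-08-01 12:06:07", "2016-08-01 13:12:12",
               "2016-08-01 14:04:07", "2016-08-01 15:01:45", "2016-08-01 15:34:07",
               "2016-08-01 16:25:16", "2016-08-01 16:29:16", "2016-08-01 16:33:16",
               "2016-08-01 16:54:16", "2016-08-01 16:58:16", "2016-08-01 17:13:16",
               "2016-08-01 17:21:16", "2016-08-01 17:23:42", "2016-08-01 17:27:16",
               "2016-08-01 17:28:16", "2016-08-01 17:29:28", "2016-08-01 17:42:08"),
  Receiver = c("V6", "V7", "V6", "V6", "V7", "V7", "V6", "V6", "V6",
               "V7", "V7", "V7", "V6", "V6", "V6", "V9", "V7", "V4"),
  Transmitter = c(16, 17, 16, 16, 17, 16, 17, 16, 16, 16, 17, 16, 16, 17, 17, 17, 16, 17),
  Sensor = c("Activity", "Depth", "Activity", "Activity", "Depth", "Activity",
             "Activity", "Depth", "Activity", "Activity", "Activity", "Depth",
             "Activity", "Activity", "Depth", "Activity", "Activity", "Activity")
)
df1$DateTime <- as.POSIXct(df1$DateTime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

dt <- as.data.table(df1)

# Round each detection to the nearest hour
dt[, RoundTime := round_date(DateTime, "hour")]

# One row per observed RoundTime/Transmitter pair
res <- dt[, .(Num_det    = .N,
              n_Activity = sum(Sensor == "Activity"),
              n_Depth    = sum(Sensor == "Depth"),
              Num_Rec    = uniqueN(Receiver),
              Which_Rec  = paste(sort(unique(Receiver)), collapse = " ")),
          by = .(RoundTime, Transmitter)]

# Build every combination of observed hours and transmitters,
# then join so that missing combinations appear as NA rows
all_combos <- CJ(RoundTime = unique(dt$RoundTime),
                 Transmitter = unique(dt$Transmitter))
res <- res[all_combos, on = .(RoundTime, Transmitter)]

# Counts for combinations with no detections become 0;
# Num_Rec and Which_Rec stay NA, as in the expected output
res[is.na(Num_det), `:=`(Num_det = 0L, n_Activity = 0L, n_Depth = 0L)]

res
```

Like `complete()`, this fills only the hours that occur somewhere in the data. If an hour with no detections at all should also appear, one option would be to pass a full hourly sequence to `CJ()` instead of `unique(dt$RoundTime)`.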

1 answer. Answer 1: accepted, 2020-01-08 09:46:33