How to add a variable in `df2` that specifies the number of rows for a specific level of a variable from `df1` using `dplyr` or `data.table`
I have a data frame `df1` that summarizes detections of a fish species over time, obtained with acoustic transmitters (attached to the fish) and acoustic receivers (placed in the area). Each transmitter has two sensors, one measuring activity and the other measuring the depth of the fish. A transmitter can send only one kind of data (either activity or depth) at a time, and it sends a signal at intervals of at least several minutes. In the end, we get a data frame with the detection time (`DateTime`), the receiver that detected the individual (`Receiver`), the transmitter that was detected (`Transmitter`), and the type of information the transmitter sent (`Sensor`). Below is a reproducible example:
df1<-data.frame(DateTime=c("2016-08-01 12:04:07","2016-08-01 12:06:07","2016-08-01 13:12:12","2016-08-01 14:04:07","2016-08-01 15:01:45","2016-08-01 15:34:07","2016-08-01 16:25:16","2016-08-01 16:29:16","2016-08-01 16:33:16","2016-08-01 16:54:16","2016-08-01 16:58:16","2016-08-01 17:13:16","2016-08-01 17:21:16","2016-08-01 17:23:42","2016-08-01 17:27:16","2016-08-01 17:28:16","2016-08-01 17:29:28","2016-08-01 17:42:08"),
Receiver=c( "V6", "V7", "V6", "V6", "V7", "V7", "V6", "V6", "V6", "V7", "V7", "V7", "V6", "V6", "V6", "V9", "V7", "V4" ),
Transmitter=c(16 , 17, 16, 16, 17, 16, 17, 16, 16, 16, 17, 16, 16, 17, 17, 17, 16, 17),
Sensor=c("Activity","Depth","Activity","Activity","Depth","Activity","Activity","Depth","Activity","Activity","Activity","Depth","Activity","Activity","Depth","Activity","Activity","Activity"))
df1$DateTime<- as.POSIXct(df1$DateTime, format= "%Y-%m-%d %H:%M:%S", tz= "UTC")
df1
DateTime Receiver Transmitter Sensor
1 2016-08-01 12:04:07 V6 16 Activity
2 2016-08-01 12:06:07 V7 17 Depth
3 2016-08-01 13:12:12 V6 16 Activity
4 2016-08-01 14:04:07 V6 16 Activity
5 2016-08-01 15:01:45 V7 17 Depth
. . . . .
. . . . .
What I want is to create a data frame `df2` in which this information is arranged differently. I want to use hourly intervals in which each hour covers the half hour before and the half hour after it (`RoundTime`). For every `RoundTime` and each transmitter (`Transmitter`), I want the number of detections (`Num_det`), the number of different receivers that detected it (`Num_Rec`), the codes of those receivers (`Which_Rec`), the number of detections with `Activity` info (`n_Activity`), and the number of detections with `Depth` info (`n_Depth`). I would expect this:
df2
RoundTime Transmitter Num_det n_Activity n_Depth Num_Rec Which_Rec
1 2016-08-01 12:00:00 16 1 1 0 1 V6
2 2016-08-01 12:00:00 17 1 0 1 1 V7
3 2016-08-01 13:00:00 16 1 1 0 1 V6
4 2016-08-01 13:00:00 17 0 0 0 NA <NA>
5 2016-08-01 14:00:00 16 1 1 0 1 V6
6 2016-08-01 14:00:00 17 0 0 0 NA <NA>
7 2016-08-01 15:00:00 16 0 0 0 NA <NA>
8 2016-08-01 15:00:00 17 1 0 1 1 V7
9 2016-08-01 16:00:00 16 2 1 1 2 V6 V7
10 2016-08-01 16:00:00 17 1 1 0 1 V6
11 2016-08-01 17:00:00 16 5 4 1 2 V6 V7
12 2016-08-01 17:00:00 17 4 3 1 3 V6 V7 V9
13 2016-08-01 18:00:00 16 0 0 0 NA <NA>
14 2016-08-01 18:00:00 17 1 1 0 1 V4
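To make the `RoundTime` definition concrete, here is a minimal sketch of how `round_date()` from `lubridate` assigns detections to the nearest full hour. The three timestamps are taken from `df1`; the variable names `x` and `rounded` are only for illustration.

```r
library(lubridate)

# Three detection times from df1, parsed as UTC like in the example above
x <- as.POSIXct(c("2016-08-01 15:34:07",
                  "2016-08-01 16:29:16",
                  "2016-08-01 16:33:16"),
                format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# round_date() snaps each timestamp to the nearest full hour, so the
# 16:00 interval covers 15:30 up to (but not including) 16:30
rounded <- round_date(x, "hour")
format(rounded, "%H:%M")
# "16:00" "16:00" "17:00"
```

This is why the detections at 15:34:07 and 16:29:16 both count towards the 16:00 row, while 16:33:16 already belongs to 17:00.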
So far I have `df2` with all the variables except `n_Activity` and `n_Depth`. Here are the code and the result:
library(lubridate)
library(tidyverse)
df2<-df1 %>%
# grouped by rounding the date by hour, Transmitter column
group_by(RoundTime = round_date(DateTime, "hour"), Transmitter) %>%
# get the Num_det as number of rows, add more groups
group_by(Num_det = n(),
which_Rec = toString(sort(unique(Receiver))), .add = TRUE) %>% # `.add` replaces the deprecated `add` in dplyr >= 1.0.0
# get the number of distinct elements of Receiver
summarise(Num_Rec = n_distinct(Receiver)) %>%
ungroup %>%
# expand the data to fill the missing combinations
complete(RoundTime, Transmitter, fill = list(Num_det = 0))%>%
select(RoundTime, Transmitter, Num_det, Num_Rec, which_Rec)
df2
# A tibble: 14 x 5
RoundTime Transmitter Num_det Num_Rec which_Rec
<dttm> <dbl> <dbl> <int> <chr>
1 2016-08-01 12:00:00.000 16 1 1 V6
2 2016-08-01 12:00:00.000 17 1 1 V7
3 2016-08-01 13:00:00.000 16 1 1 V6
4 2016-08-01 13:00:00.000 17 0 NA NA
5 2016-08-01 14:00:00.000 16 1 1 V6
6 2016-08-01 14:00:00.000 17 0 NA NA
7 2016-08-01 15:00:00.000 16 0 NA NA
8 2016-08-01 15:00:00.000 17 1 1 V7
9 2016-08-01 16:00:00.000 16 2 2 V6, V7
10 2016-08-01 16:00:00.000 17 1 1 V6
11 2016-08-01 17:00:00.000 16 5 2 V6, V7
12 2016-08-01 17:00:00.000 17 4 3 V6, V7, V9
13 2016-08-01 18:00:00.000 16 0 NA NA
14 2016-08-01 18:00:00.000 17 1 1 V4
Does anyone know what code I should add to the code above in order to create the variables `n_Activity` and `n_Depth`? A solution with the `data.table` package would be even better, since my real data frame has millions of rows and `data.table` is more efficient.
I guess all you need to do is count the number of "Activity" and "Depth" rows per group in your current code. I don't know why you have two `group_by` calls there; a single `group_by` followed by one `summarise` is enough.
library(dplyr)
library(lubridate)
df1 %>%
group_by(RoundTime = round_date(DateTime, "hour"), Transmitter) %>%
summarise(Num_det = n(),
which_Rec = toString(sort(unique(Receiver))),
Num_Rec = n_distinct(Receiver),
n_Activity = sum(Sensor == "Activity"),
n_Depth = sum(Sensor == "Depth")) %>%
ungroup %>%
tidyr::complete(RoundTime, Transmitter,
fill = list(Num_det = 0, n_Activity = 0, n_Depth = 0))
# A tibble: 14 x 7
# RoundTime Transmitter Num_det which_Rec Num_Rec n_Activity n_Depth
# <dttm> <dbl> <dbl> <chr> <int> <dbl> <dbl>
# 1 2016-08-01 12:00:00 16 1 V6 1 1 0
# 2 2016-08-01 12:00:00 17 1 V7 1 0 1
# 3 2016-08-01 13:00:00 16 1 V6 1 1 0
# 4 2016-08-01 13:00:00 17 0 NA NA 0 0
# 5 2016-08-01 14:00:00 16 1 V6 1 1 0
# 6 2016-08-01 14:00:00 17 0 NA NA 0 0
# 7 2016-08-01 15:00:00 16 0 NA NA 0 0
# 8 2016-08-01 15:00:00 17 1 V7 1 0 1
# 9 2016-08-01 16:00:00 16 2 V6, V7 2 1 1
#10 2016-08-01 16:00:00 17 1 V6 1 1 0
#11 2016-08-01 17:00:00 16 5 V6, V7 2 4 1
#12 2016-08-01 17:00:00 17 4 V6, V7, V9 3 3 1
#13 2016-08-01 18:00:00 16 0 NA NA 0 0
#14 2016-08-01 18:00:00 17 1 V4 1 1 0
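Since the question also asks about `data.table`, the same aggregation could be sketched roughly as follows. This is not part of the original answer, only one possible translation of the `dplyr` pipeline: `df1` is rebuilt compactly from the question's data, the names `dt`, `grid` and `res` are illustrative, and `CJ()` plus a join is used as a stand-in for `tidyr::complete()`.

```r
library(data.table)
library(lubridate)

# Rebuild df1 from the question (same 18 detections)
df1 <- data.frame(
  DateTime = paste("2016-08-01",
                   c("12:04:07", "12:06:07", "13:12:12", "14:04:07", "15:01:45",
                     "15:34:07", "16:25:16", "16:29:16", "16:33:16", "16:54:16",
                     "16:58:16", "17:13:16", "17:21:16", "17:23:42", "17:27:16",
                     "17:28:16", "17:29:28", "17:42:08")),
  Receiver = c("V6", "V7", "V6", "V6", "V7", "V7", "V6", "V6", "V6",
               "V7", "V7", "V7", "V6", "V6", "V6", "V9", "V7", "V4"),
  Transmitter = c(16, 17, 16, 16, 17, 16, 17, 16, 16,
                  16, 17, 16, 16, 17, 17, 17, 16, 17),
  Sensor = c("Activity", "Depth", "Activity", "Activity", "Depth", "Activity",
             "Activity", "Depth", "Activity", "Activity", "Activity", "Depth",
             "Activity", "Activity", "Depth", "Activity", "Activity", "Activity")
)
df1$DateTime <- as.POSIXct(df1$DateTime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

dt <- as.data.table(df1)

# One row per (RoundTime, Transmitter) pair that has at least one detection
res <- dt[, .(Num_det    = .N,
              n_Activity = sum(Sensor == "Activity"),
              n_Depth    = sum(Sensor == "Depth"),
              Num_Rec    = uniqueN(Receiver),
              Which_Rec  = toString(sort(unique(Receiver)))),
          by = .(RoundTime = round_date(DateTime, "hour"), Transmitter)]

# All hour/transmitter combinations, including those without detections
grid <- CJ(RoundTime   = unique(res$RoundTime),
           Transmitter = unique(res$Transmitter))

# Join onto the full grid; unmatched combinations come back as NA
res <- res[grid, on = .(RoundTime, Transmitter)]

# Replace the NA counts with 0, leaving Num_Rec and Which_Rec as NA
res[is.na(Num_det), `:=`(Num_det = 0L, n_Activity = 0L, n_Depth = 0L)]
res
```

On the example data this should give the same 14 rows as the `dplyr` version. Whether it is actually faster on millions of rows is something to benchmark on the real data rather than assume.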