Selecting duplicates of a column using the frequency and occurring time of other columns in R
I am using R and have a data set with the columns ID, boarding_number, and time. The ID and boarding_number columns contain duplicates, and every row has a time. Some boarding_number values are linked to more than one ID. This is a snapshot of the data:
ID boarding_number time
1 1234 2020-05-05 12:28:36
2 7891 2020-05-12 11:21:36
3 5432 2020-04-17 10:22:26
4 1234 2020-06-11 10:11:36
1 1234 2020-05-18 09:28:36
5 7744 2020-08-10 09:11:11
2 7891 2020-05-29 18:21:41
6 7891 2020-06-01 11:21:36
7 1234 2020-06-12 10:11:46
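To make the snapshot reproducible, it can be rebuilt as a data frame along these lines. This is only a sketch: the name df matches the answer's code, but the UTC time zone is an assumption, and the last timestamp is taken to be 10:11:46.

```r
# Rebuild the sample data as a data frame (base R only).
df <- data.frame(
  ID = c(1L, 2L, 3L, 4L, 1L, 5L, 2L, 6L, 7L),
  boarding_number = c(1234L, 7891L, 5432L, 1234L, 1234L,
                      7744L, 7891L, 7891L, 1234L),
  time = as.POSIXct(
    c("2020-05-05 12:28:36", "2020-05-12 11:21:36", "2020-04-17 10:22:26",
      "2020-06-11 10:11:36", "2020-05-18 09:28:36", "2020-08-10 09:11:11",
      "2020-05-29 18:21:41", "2020-06-01 11:21:36",
      "2020-06-12 10:11:46"),  # last value is an assumed reading of the data
    format = "%Y-%m-%d %H:%M:%S",
    tz = "UTC"                 # time zone is an assumption
  )
)

# Count the distinct IDs linked to each boarding_number.
ids_per_boarding <- tapply(df$ID, df$boarding_number,
                           function(x) length(unique(x)))
print(ids_per_boarding)
```

Only boarding numbers 1234 and 7891 are linked to more than one ID here, so those are the two groups the result should contain.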
I would like to select the boarding_number values that have been linked to at least two (>= 2) unique IDs. For each of them, I want the ID and the time_second_occurance of the second occurrence, meaning the first time the boarding_number is linked to an ID different from the first ID it was linked to. I would also like to report number_occurances for each boarding_number (in the expected result, this is the number of distinct IDs linked to it).
An outcome like the following would be my goal:
ID boarding_number time_second_occurance number_occurances
4 1234 2020-06-11 10:11:36 3
6 7891 2020-06-01 11:21:36 2
Using dplyr:
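library(dplyr)

df %>%
  group_by(boarding_number) %>%
  # count the distinct IDs linked to each boarding_number
  mutate(number_occuarances = n_distinct(ID)) %>%
  # keep only boarding_numbers linked to at least two distinct IDs
  filter(n_distinct(ID) >= 2) %>%
  # keep the first row of each ID, in the data's existing row order
  filter(!duplicated(ID)) %>%
  # the second remaining row is the first ID that differs from the first one
  slice(2)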
# A tibble: 2 x 5
# Groups:   boarding_number [2]
     ID boarding_number time       hour     number_occuarances
  <int>           <int> <chr>      <chr>                 <int>
1     4            1234 2020-06-11 10:11:36                  3
2     6            7891 2020-06-01 11:21:36                  2
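The pipeline above relies on the rows already being in a usable order. A variant that sorts by time first and returns the column names requested in the question might look like the sketch below. The helper name second_id_per_boarding is hypothetical, and it assumes time is a single column that sorts chronologically (a POSIXct value or an ISO-formatted "YYYY-MM-DD HH:MM:SS" string), not the separate date and hour columns shown in the output above.

```r
library(dplyr)

# Hypothetical helper, not part of the original answer.
# Assumes df has columns ID, boarding_number and a sortable time column.
second_id_per_boarding <- function(df) {
  df %>%
    # order rows chronologically within each boarding_number
    arrange(boarding_number, time) %>%
    group_by(boarding_number) %>%
    # number of distinct IDs linked to this boarding_number
    mutate(number_occurances = n_distinct(ID)) %>%
    # keep boarding_numbers linked to at least two distinct IDs
    filter(number_occurances >= 2) %>%
    # keep the earliest row of each ID within the group
    filter(!duplicated(ID)) %>%
    # second distinct ID = first ID that differs from the earliest one
    slice(2) %>%
    ungroup() %>%
    select(ID, boarding_number,
           time_second_occurance = time,
           number_occurances)
}
```

Called on the sample data, second_id_per_boarding(df) should return one row for boarding number 1234 (ID 4) and one for 7891 (ID 6), matching the goal table in the question. Sorting explicitly costs little and removes the dependence on how the data happened to be read in.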