Select duplicates of a column using the frequency and time of occurrence of other columns in R

Question

I am using R and I have a data set with ID, boarding_number, and time. There are duplicates in the ID and boarding_number columns, and each row has a time. Some of the boarding_number values are linked to different IDs. This is a snapshot of the data:

ID  boarding_number  time              
 1     1234         2020-05-05 12:28:36
 2     7891         2020-05-12 11:21:36
 3     5432         2020-04-17 10:22:26
 4     1234         2020-06-11 10:11:36
 1     1234         2020-05-18 09:28:36
 5     7744         2020-08-10 09:11:11
 2     7891         2020-05-29 18:21:41
 6     7891         2020-06-01 11:21:36
 7     1234         2020-06-12 10:111:46

So, I would like to select those boarding_numbers that have been linked to more than one (>=2) unique ID. I also want to select the ID and time_second_occurance corresponding to the second occurrence, that is, the first time a boarding_number is linked to a different ID than the first ID it was linked to. I would also like to report the number_occurances of the boarding_number.

An outcome as follows would be my goal:

 ID   boarding_number  time_second_occurance      number_occurances 
 4     1234            2020-06-11 10:11:36                3
 6     7891            2020-06-01 11:21:36                2

Answer 1

Using dplyr

library(dplyr)

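df %>% 
  group_by(boarding_number) %>% 
  # count the distinct IDs linked to each boarding_number
  mutate(number_occuarances = n_distinct(ID)) %>% 
  # keep only boarding_numbers linked to at least two distinct IDs
  filter(n_distinct(ID) >= 2) %>% 
  # keep the first row of each ID, then take the second distinct ID (in row order)
  filter(!duplicated(ID)) %>% 
  slice(2)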

# A tibble: 2 x 5
# Groups:   boarding_number [2]
     ID boarding_number time       hour     number_occuarances
  <int>           <int> <chr>      <chr>                 <int>
1     4            1234 2020-06-11 10:11:36                  3
2     6            7891 2020-06-01 11:21:36                  2
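
The pipeline above relies on the row order of df. As a minimal, self-contained sketch of the same idea, the variant below sorts by time first, so that "second occurrence" means the chronologically first row whose ID differs from the first ID, and it renames the columns to match the requested output. It makes several assumptions that are not in the original post: time is a single POSIXct column (the printed output above shows it split into time and hour), the last timestamp in the snapshot, whose minutes field is malformed, is taken to be 10:11:46, and the object name result is illustrative.

```r
library(dplyr)

# Sample data rebuilt from the question's snapshot.
# Assumption: the last timestamp's malformed minutes field is read as "10:11:46".
df <- data.frame(
  ID = c(1, 2, 3, 4, 1, 5, 2, 6, 7),
  boarding_number = c(1234, 7891, 5432, 1234, 1234, 7744, 7891, 7891, 1234),
  time = as.POSIXct(c(
    "2020-05-05 12:28:36", "2020-05-12 11:21:36", "2020-04-17 10:22:26",
    "2020-06-11 10:11:36", "2020-05-18 09:28:36", "2020-08-10 09:11:11",
    "2020-05-29 18:21:41", "2020-06-01 11:21:36", "2020-06-12 10:11:46"
  ), tz = "UTC")
)

result <- df %>%
  # sort so that "first" and "second" refer to chronological order
  arrange(boarding_number, time) %>%
  group_by(boarding_number) %>%
  # number of distinct IDs linked to each boarding_number
  mutate(number_occurances = n_distinct(ID)) %>%
  # keep boarding_numbers linked to at least two distinct IDs
  filter(number_occurances >= 2) %>%
  # drop rows of the first ID, then keep the earliest remaining row
  filter(ID != first(ID)) %>%
  slice(1) %>%
  ungroup() %>%
  select(ID, boarding_number, time_second_occurance = time, number_occurances)

print(result)
```

On the sample data this should return the same two rows as the answer above (ID 4 for boarding_number 1234 and ID 6 for 7891). The column names keep the spelling used in the question so the result lines up with the requested table.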

Accepted answer, 2020-07-14 23:10:07
