Selecting duplicates of a column using the frequency and occurring time of other columns in R

I am using R and have a data set with the columns ID, boarding_number, and time. The ID and boarding_number columns contain duplicates, and every row has a time. Some boarding_number values are linked to more than one ID. This is a snapshot of the data:

ID  boarding_number  time              
 1     1234         2020-05-05 12:28:36
 2     7891         2020-05-12 11:21:36
 3     5432         2020-04-17 10:22:26
 4     1234         2020-06-11 10:11:36
 1     1234         2020-05-18 09:28:36
 5     7744         2020-08-10 09:11:11
 2     7891         2020-05-29 18:21:41
 6     7891         2020-06-01 11:21:36
 7     1234         2020-06-12 10:11:46

I would like to select the boarding_number values that have been linked to more than one (>= 2) unique ID. For each of them, I want the ID and time_second_occurance of the second occurrence, that is, the first time the boarding_number is linked to an ID different from the first ID it was linked to. I would also like to report number_occurances for each boarding_number.

An outcome like the following would be my goal:

 ID   boarding_number  time_second_occurance      number_occurances 
 4     1234            2020-06-11 10:11:36                3
 6     7891            2020-06-01 11:21:36                2

Using dplyr:

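library(dplyr)

df %>% 
  group_by(boarding_number) %>% 
  # count the distinct IDs linked to each boarding_number
  mutate(number_occurances = n_distinct(ID)) %>% 
  # keep boarding_numbers linked to at least two distinct IDs
  filter(number_occurances >= 2) %>% 
  # keep the first row of each ID, then take the second distinct ID
  filter(!duplicated(ID)) %>% 
  slice(2)

# A tibble: 2 x 5
# Groups:   boarding_number [2]
     ID boarding_number time       hour     number_occurances
  <int>           <int> <chr>      <chr>                <int>
1     4            1234 2020-06-11 10:11:36                 3
2     6            7891 2020-06-01 11:21:36                 2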
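
The pipeline above picks the second distinct ID in row order, so it matches the "second occurrence" from the question only if the rows of each boarding_number are already in a suitable order. The following is a minimal sketch that sorts by time first and renames the columns to the names requested in the question. It rests on a few assumptions that are not in the original post: df is rebuilt here from the snapshot, time is held as a single POSIXct column parsed as UTC, and the last timestamp is read as 10:11:46.

```r
library(dplyr)

# Snapshot from the question. Assumption: the last timestamp is 10:11:46,
# and all timestamps are parsed as UTC.
df <- data.frame(
  ID = c(1, 2, 3, 4, 1, 5, 2, 6, 7),
  boarding_number = c(1234, 7891, 5432, 1234, 1234, 7744, 7891, 7891, 1234),
  time = as.POSIXct(
    c("2020-05-05 12:28:36", "2020-05-12 11:21:36", "2020-04-17 10:22:26",
      "2020-06-11 10:11:36", "2020-05-18 09:28:36", "2020-08-10 09:11:11",
      "2020-05-29 18:21:41", "2020-06-01 11:21:36", "2020-06-12 10:11:46"),
    tz = "UTC"
  )
)

result <- df %>%
  # chronological order within each boarding_number
  arrange(boarding_number, time) %>%
  group_by(boarding_number) %>%
  # number of distinct IDs linked to this boarding_number
  mutate(number_occurances = n_distinct(ID)) %>%
  # keep groups with >= 2 distinct IDs and only the first row of each ID
  filter(number_occurances >= 2, !duplicated(ID)) %>%
  # the second distinct ID is the first one that differs from the first ID
  slice(2) %>%
  ungroup() %>%
  select(ID, boarding_number, time_second_occurance = time, number_occurances)

print(result)
```

On the snapshot this selects ID 4 for boarding_number 1234 and ID 6 for boarding_number 7891, the same rows as the goal table in the question. If time is stored as separate date and hour character columns, as in the tibble printed above, sorting by both columns would be needed instead.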
