
How do I subset a data frame by row names that do not meet a condition?

I have a data frame that consists of names and dates. I wish to subset the data frame by names that do not appear in 3 consecutive years. Here is my data frame:

data <- data.frame(
  Name = c("Dex", "Dex", "Rex", "Rex", "Rex", "Lex", "Lex", "Nex", "Nex", "Nex"),
  Year = c(2000, 2001, 2000, 2001, 2002, 2001, 2002, 2000, 2001, 2002)
)

Name Year
Dex  2000
Dex  2001
Rex  2000
Rex  2001
Rex  2002
Lex  2001
Lex  2002
Nex  2000
Nex  2001
Nex  2002

This is the desired output:

Name Year
Dex  2000
Dex  2001
Lex  2001
Lex  2002

Is there a way to subset data according to conditions that are not met?

In the example, the 'Year' values for every unique 'Name' are consecutive. So an easier option is to group by 'Name' and filter to keep the groups in which the number of distinct 'Year' values ( n_distinct ) is less than 3, or in which the number of rows ( n() ) is less than 3:

library(dplyr)
data %>%
   group_by(Name) %>% 
   filter(n_distinct(Year) < 3)
   #or the number of rows
   # filter(n() < 3)
# A tibble: 4 x 2
# Groups:   Name [2]
#  Name   Year
#  <fct> <dbl>
#1 Dex    2000
#2 Dex    2001
#3 Lex    2001
#4 Lex    2002
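
If dplyr is not available, the same count-based filter can be sketched in base R. This is only an illustration under the same assumption as above (the years within each 'Name' are consecutive); the variable names n_years and result are mine, not part of the original answer.

```r
# Same example data as in the question
data <- data.frame(
  Name = c("Dex", "Dex", "Rex", "Rex", "Rex", "Lex", "Lex", "Nex", "Nex", "Nex"),
  Year = c(2000, 2001, 2000, 2001, 2002, 2001, 2002, 2000, 2001, 2002)
)

# For every row, count the distinct years recorded for that row's Name
n_years <- ave(data$Year, data$Name, FUN = function(y) length(unique(y)))

# Keep only the rows whose Name has fewer than 3 distinct years
result <- data[n_years < 3, ]
print(result)
```

Here ave() plays the role of group_by(): it applies the function per 'Name' and returns a vector aligned with the original rows, which can be used directly as a row filter.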

As a general case, after grouping by 'Name', we take the difference ( diff ) of adjacent 'Year' values, check whether it is equal to 1 (i.e. a 1-year difference), pass that to run-length encoding ( rle ), and filter to keep the 'Name' groups in which the maximum ( max ) length of a run of consecutive years is less than 3:

data %>%
   group_by(Name) %>% 
   filter(with(rle(c(TRUE, diff(Year)) == 1), max(lengths[values])) < 3)
# A tibble: 4 x 2
# Groups:   Name [2]
#  Name   Year
#  <fct> <dbl>
#1 Dex    2000
#2 Dex    2001
#3 Lex    2001
#4 Lex    2002
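
One edge case worth checking, as far as I can tell: in the rle expression above, the leading TRUE is counted only toward the first run, so a run of consecutive years that starts after a gap is measured by its number of differences, which is one less than its number of years. For example, the years 2000, 2002, 2003, 2004 give a maximum run length of 2, even though 2002 to 2004 are three consecutive years. Below is a minimal sketch of one alternative that counts the years in each run directly. The helper longest_run and the data frame data2 are hypothetical, made up for illustration.

```r
# Hypothetical helper: length of the longest run of consecutive years in y
longest_run <- function(y) {
  y <- sort(unique(y))                     # ignore duplicates and ordering
  run_id <- cumsum(c(TRUE, diff(y) != 1))  # start a new run wherever the gap is not 1
  max(table(run_id))                       # number of years in the largest run
}

# Made-up data: "Gap" skips 2001, then has 2002, 2003, 2004 in a row
data2 <- data.frame(
  Name = c("Dex", "Dex", "Gap", "Gap", "Gap", "Gap"),
  Year = c(2000, 2001, 2000, 2002, 2003, 2004)
)

# Longest run per Name, aligned with the rows of data2
run_len <- ave(data2$Year, data2$Name, FUN = longest_run)

# Keep only names that never appear in 3 consecutive years
result2 <- data2[run_len < 3, ]
print(result2)
```

With dplyr, the same helper can be used as filter(longest_run(Year) < 3) after group_by(Name).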


 