

Keeping only non-breaking groups

I have a kind of time series sorted in increasing order of year. Some observations (ids) have no data in the first year(s) but record data regularly once that no-data period ends. However, some of the groups that start recording data then break off again.

To make this clear, I made up a dummy data frame that represents the situation:

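library(dplyr)

set.seed(1453)

data.frame(id = rep(10:15, 4)) %>%
  group_by(id) %>%
  mutate(year = 2012:2015) %>%              # four years per id
  arrange(year, .by_group = TRUE) %>%       # sort by id, then by year
  mutate(some_column = sample(c(NA, 1), size = 4, replace = TRUE))  # each value randomly 1 or NA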

The data looks like this:

id  year    some_column

10  2012    1
10  2013    NA
10  2014    1
10  2015    NA
11  2012    NA
11  2013    1
11  2014    1
11  2015    NA
12  2012    NA
12  2013    1
12  2014    NA
12  2015    1
13  2012    1
13  2013    NA
13  2014    1
13  2015    1
14  2012    NA
14  2013    NA
14  2014    NA
14  2015    1
15  2012    NA
15  2013    1
15  2014    1
15  2015    1

I want to keep only these observations:

id  year    some_column

14  2012    NA
14  2013    NA
14  2014    NA
14  2015    1
15  2012    NA
15  2013    1
15  2014    1
15  2015    1
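Put precisely, the rule seems to be: within each `id`, once `some_column` has its first non-missing value, it must never be `NA` again (groups 14 and 15 satisfy this; groups 10 to 13 do not). Below is a minimal base-R sketch of that rule, assuming this reading of the question. The helper `keeps_recording` is hypothetical, and `df` is typed in by hand from the table above.

```r
# Hypothetical helper: TRUE when x has no NA after its first non-NA value.
keeps_recording <- function(x) {
  started <- cumsum(!is.na(x)) > 0  # FALSE before the first observation, TRUE from it onward
  !any(is.na(x[started]))           # any NA after that point counts as a break
}

# The question's example data, typed in from the table above
df <- data.frame(
  id   = rep(10:15, each = 4),
  year = rep(2012:2015, times = 6),
  some_column = c(1, NA, 1, NA,   NA, 1, 1, NA,   NA, 1, NA, 1,
                  1, NA, 1, 1,    NA, NA, NA, 1,  NA, 1, 1, 1)
)

# ave() recycles each group's single TRUE/FALSE to every row of that group;
# the result comes back numeric (0/1), so convert it to logical before subsetting.
keep <- as.logical(ave(df$some_column, df$id, FUN = keeps_recording))
result <- df[keep, ]
print(result)
```

Under this sketch a group that is entirely `NA` is kept, because it never starts recording. The question does not say whether that case should be kept.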

Perhaps this works:

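library(dplyr)
library(data.table)  # for rleid()

df1 %>% 
  group_by(id) %>% 
  # rleid() numbers each run of identical values; keep groups with at most
  # two runs (e.g. NA, NA, NA, 1 or NA, 1, 1, 1)
  filter(n_distinct(rleid(some_column)) <= 2) %>%
  ungroup()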
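How the filter works: `data.table::rleid()` assigns a new id every time `some_column` changes, so `n_distinct(rleid(some_column)) <= 2` keeps groups with at most two runs, such as a run of `NA` followed by a run of `1`. Because `some_column` holds only `1` or `NA` here, counting runs of `is.na(some_column)` with base R's `rle()` should give the same count. This equivalence is an assumption that fails if the column holds varying numbers, because `rleid()` also splits on value changes. Base `rle()` applied to `x` itself would start a new run at every `NA`, which is why the sketch uses `is.na()`. The helper `n_runs` and the `edge` pattern are hypothetical.

```r
# Hypothetical helper: number of runs of missing / non-missing values.
n_runs <- function(x) length(rle(is.na(x))$lengths)

groups <- list(
  g11  = c(NA, 1, 1, NA),   # group 11: starts recording, then stops (3 runs)
  g14  = c(NA, NA, NA, 1),  # group 14 (2 runs)
  g15  = c(NA, 1, 1, 1),    # group 15 (2 runs)
  edge = c(1, 1, NA, NA)    # made-up pattern: records first, then stops for good
)
sapply(groups, n_runs)
# returns c(g11 = 3, g14 = 2, g15 = 2, edge = 2)
```

The made-up `edge` pattern also has two runs. A "two runs or fewer" test would therefore keep a group that records at first and then stops, which the question seems to count as a break. A check for "no `NA` after the first non-missing value" would reject that group. On the example data, both criteria select the same ids, 14 and 15.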

-output

# A tibble: 8 x 3
     id  year some_column
  <int> <int>       <int>
1    14  2012          NA
2    14  2013          NA
3    14  2014          NA
4    14  2015           1
5    15  2012          NA
6    15  2013           1
7    15  2014           1
8    15  2015           1

data

df1 <- structure(list(id = c(10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 
12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 15L, 
15L, 15L, 15L), year = c(2012L, 2013L, 2014L, 2015L, 2012L, 2013L, 
2014L, 2015L, 2012L, 2013L, 2014L, 2015L, 2012L, 2013L, 2014L, 
2015L, 2012L, 2013L, 2014L, 2015L, 2012L, 2013L, 2014L, 2015L
), some_column = c(1L, NA, 1L, NA, NA, 1L, 1L, NA, NA, 1L, NA, 
1L, 1L, NA, 1L, 1L, NA, NA, NA, 1L, NA, 1L, 1L, 1L)),
 class = "data.frame", row.names = c(NA, 
-24L))
