Keeping only non-breaking groups
I have a time series sorted in increasing order of year. Some groups have no data in the first year(s) but record data regularly once they start. Other groups start recording data and then stop again.
To illustrate, I made up a dummy data frame that represents this situation:
library(dplyr)

set.seed(1453)
data.frame(id = rep(10:15, 4)) %>%
  group_by(id) %>%
  mutate(year = 2012:2015) %>%          # four consecutive years per id
  arrange(year, .by_group = TRUE) %>%
  mutate(some_column = sample(c(NA, 1), size = 4, replace = TRUE))
The data looks like this:
id year some_column
10 2012 1
10 2013 NA
10 2014 1
10 2015 NA
11 2012 NA
11 2013 1
11 2014 1
11 2015 NA
12 2012 NA
12 2013 1
12 2014 NA
12 2015 1
13 2012 1
13 2013 NA
13 2014 1
13 2015 1
14 2012 NA
14 2013 NA
14 2014 NA
14 2015 1
15 2012 NA
15 2013 1
15 2014 1
15 2015 1
I want to keep only these observations:
id year some_column
14 2012 NA
14 2013 NA
14 2014 NA
14 2015 1
15 2012 NA
15 2013 1
15 2014 1
15 2015 1
Perhaps this works:
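library(dplyr)
library(data.table)  # for rleid()

df1 %>%
  group_by(id) %>%
  # rleid() numbers consecutive runs of equal values; keep groups with at most two runs
  filter(n_distinct(rleid(some_column)) <= 2) %>%
  ungroup()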
Output:
# A tibble: 8 x 3
id year some_column
<int> <int> <int>
1 14 2012 NA
2 14 2013 NA
3 14 2014 NA
4 14 2015 1
5 15 2012 NA
6 15 2013 1
7 15 2014 1
8 15 2015 1
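To see why the "at most two runs" condition picks ids 14 and 15, it can help to count the runs per group directly. The following is only a base R sketch of the same idea, not the answer's code: `count_runs` is a hypothetical helper name, and it assumes that only the NA / non-NA pattern matters (unlike `rleid()`, it ignores changes between different non-missing values).

```r
# Same values as the thread's df1, rebuilt compactly so the snippet runs on its own.
df1 <- data.frame(
  id = rep(10:15, each = 4),
  year = rep(2012:2015, times = 6),
  some_column = c(1, NA, 1, NA,   NA, 1, 1, NA,   NA, 1, NA, 1,
                  1, NA, 1, 1,    NA, NA, NA, 1,  NA, 1, 1, 1)
)

# count_runs() is a hypothetical helper: the number of alternating
# NA / non-NA stretches in x. rle() is applied to is.na(x) because
# base rle() treats every NA as a separate run.
count_runs <- function(x) length(rle(is.na(x))$lengths)

runs_per_id <- tapply(df1$some_column, df1$id, count_runs)
print(runs_per_id)

# Keep rows whose group has at most two runs.
keep_row <- as.vector(runs_per_id[as.character(df1$id)] <= 2)
kept <- df1[keep_row, ]
print(unique(kept$id))
```

Ids 10 to 13 each switch between NA and data three or four times, while ids 14 and 15 switch only once, which is why only those two groups survive the filter.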
Data:
df1 <- structure(list(id = c(10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L,
12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 14L, 14L, 14L, 14L, 15L,
15L, 15L, 15L), year = c(2012L, 2013L, 2014L, 2015L, 2012L, 2013L,
2014L, 2015L, 2012L, 2013L, 2014L, 2015L, 2012L, 2013L, 2014L,
2015L, 2012L, 2013L, 2014L, 2015L, 2012L, 2013L, 2014L, 2015L
), some_column = c(1L, NA, 1L, NA, NA, 1L, 1L, NA, NA, 1L, NA,
1L, 1L, NA, 1L, 1L, NA, NA, NA, 1L, NA, 1L, 1L, 1L)),
class = "data.frame", row.names = c(NA,
-24L))
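One caveat with counting runs: a group that records data first and then stops for good (for example 1, 1, NA, NA) also has only two runs, so the `rleid()` filter would keep it. If such a group should count as "breaking", a stricter check is possible. The sketch below is an assumption about the intended rule, not part of the original answer: it drops a group as soon as any NA appears after the first non-missing value. The group with id 16 is hypothetical and added only to show the difference.

```r
library(dplyr)

# Same values as df1 above, rebuilt compactly so the snippet runs on its own.
df1 <- data.frame(
  id = rep(10:15, each = 4),
  year = rep(2012:2015, times = 6),
  some_column = c(1, NA, 1, NA,   NA, 1, 1, NA,   NA, 1, NA, 1,
                  1, NA, 1, 1,    NA, NA, NA, 1,  NA, 1, 1, 1)
)

# Hypothetical extra group: records first, then stops for good.
extra <- data.frame(id = 16L, year = 2012:2015, some_column = c(1, 1, NA, NA))

strict <- bind_rows(df1, extra) %>%
  group_by(id) %>%
  # cumsum(!is.na(x)) > 0 is TRUE from the first recorded value onward,
  # so the condition flags any NA that comes after recording has started.
  filter(!any(is.na(some_column) & cumsum(!is.na(some_column)) > 0)) %>%
  ungroup()

print(unique(strict$id))
```

On the example data this keeps the same ids 14 and 15 as the `rleid()` approach; the two only differ for groups like the hypothetical id 16.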