Select those groups which are available for certain years
I have a data.table as follows:
library(data.table)

datazzz <- data.table(group = c(rep("a", times = 3),
                                rep("b", times = 4),
                                rep("c", times = 4),
                                rep("k", times = 2),
                                rep("f", times = 4)),
                      year = c(2017:2019, 2016:2019, 2016:2019, 2018, 2019,
                               2017:2020),
                      values = runif(17))
datazzz
group year values
1: a 2017 0.14710475
2: a 2018 0.23493958
3: a 2019 0.97570157
4: b 2016 0.82078366
5: b 2017 0.92685531
6: b 2018 0.64406726
7: b 2019 0.17611851
8: c 2016 0.96894329
9: c 2017 0.97501190
10: c 2018 0.49732578
11: c 2019 0.90125133
12: k 2018 0.14836372
13: k 2019 0.01368339
14: f 2017 0.84735620
15: f 2018 0.71688780
16: f 2019 0.62894310
17: f 2020 0.73526859
I want to select only those groups that have all of the years from 2016 through 2019. Hence, my resulting data.table would look like:
group year values
1: b 2016 0.82078366
2: b 2017 0.92685531
3: b 2018 0.64406726
4: b 2019 0.17611851
5: c 2016 0.96894329
6: c 2017 0.97501190
7: c 2018 0.49732578
8: c 2019 0.90125133
The subsetting condition is that all of the required years are present in the group. We can evaluate that condition per group and return the matching row indices through .I. The grouped result stores those indices in a column that data.table names V1 by default, and we then use V1 to select the rows.
datazzz[datazzz[, .I[all(2016:2019 %in% unique(year))], by = .(group)]$V1]
group year values
1: b 2016 0.86527048
2: b 2017 0.46478348
3: b 2018 0.94761731
4: b 2019 0.05005278
5: c 2016 0.73977484
6: c 2017 0.23698556
7: c 2018 0.29560906
8: c 2019 0.61450736
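The one-liner above can be split into its two steps to make the role of .I and V1 easier to see. This is only a sketch: the table `dt`, the fixed `values` column, and the name `required_years` are illustrative assumptions, chosen so the result is reproducible without runif().

```r
library(data.table)

# Same group/year layout as the question; values are fixed integers (assumption)
# so that the result does not depend on random numbers.
dt <- data.table(
  group  = rep(c("a", "b", "c", "k", "f"), times = c(3, 4, 4, 2, 4)),
  year   = c(2017:2019, 2016:2019, 2016:2019, 2018:2019, 2017:2020),
  values = seq_len(17)
)

# Hypothetical name for the years every kept group must cover.
required_years <- 2016:2019

# Step 1: per group, return that group's row numbers (.I) only when all
# required years are present; otherwise the group contributes no rows.
idx <- dt[, .I[all(required_years %in% year)], by = group]
# idx has two columns: `group` and the auto-named `V1` (the row indices).

# Step 2: subset the original table by those row indices.
result <- dt[idx$V1]
```

With this layout only groups b and c cover every year in `required_years`, so `result` contains their eight rows.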
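We could do the following. Note that it compares only the first and last year of each group, so it assumes the years within a group have no gaps: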
library(data.table)
setDT(datazzz)[, if (min(year) == 2016 && max(year) == 2019) .SD, by = group]
group year values
1: b 2016 0.2321175
2: b 2017 0.2776979
3: b 2018 0.5695105
4: b 2019 0.7224908
5: c 2016 0.1904413
6: c 2017 0.4608467
7: c 2018 0.8258316
8: c 2019 0.7198854
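The range check and the membership check agree on the question's data, but they can differ when a group skips a year. The sketch below uses a hypothetical table `gap_dt` (not part of the original question) in which group "g" spans 2016 to 2019 but lacks 2018.

```r
library(data.table)

# Hypothetical data: group "g" starts in 2016 and ends in 2019 but is missing 2018.
gap_dt <- data.table(
  group = c(rep("b", 4), rep("g", 3)),
  year  = c(2016:2019, 2016, 2017, 2019)
)

# Range check: only the endpoints are compared, so "g" is kept too.
by_range <- gap_dt[, if (min(year) == 2016 && max(year) == 2019) .SD, by = group]

# Membership check: every year must be present, so only "b" is kept.
by_membership <- gap_dt[, if (all(2016:2019 %in% year)) .SD, by = group]
```

If gaps are possible in the real data, the membership check is the safer choice; the range check is adequate only when each group's years are known to be consecutive.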