简体   繁体   English

Select 某些年份可用的那些组

[英]Select those groups which are available for certain years

I have a data.table as following:-我有一个data.table如下:-

datazzz <- data.table(group = c(rep("a", times = 3),
                                rep("b", times = 4),
                                rep("c", times = 4),
                                rep("k", times = 2),
                                rep("f", times = 4)),
                      year = c(2017:2019, 2016:2019, 2016:2019, 2018, 2019,
                               2017:2020),
                      values = runif(17))

datazzz

    group year     values
 1:     a 2017 0.14710475
 2:     a 2018 0.23493958
 3:     a 2019 0.97570157
 4:     b 2016 0.82078366
 5:     b 2017 0.92685531
 6:     b 2018 0.64406726
 7:     b 2019 0.17611851
 8:     c 2016 0.96894329
 9:     c 2017 0.97501190
10:     c 2018 0.49732578
11:     c 2019 0.90125133
12:     k 2018 0.14836372
13:     k 2019 0.01368339
14:     f 2017 0.84735620
15:     f 2018 0.71688780
16:     f 2019 0.62894310
17:     f 2020 0.73526859

I want to select only those groups who have the year s from 2016 till 2019. And hence, my resulting data.table would look like我只想 select 那些year从 2016 年到 2019 年的组。因此,我得到data.table看起来像

    group year     values
 1:     b 2016 0.82078366
 2:     b 2017 0.92685531
 3:     b 2018 0.64406726
 4:     b 2019 0.17611851
 5:     c 2016 0.96894329
 6:     c 2017 0.97501190
 7:     c 2018 0.49732578
 8:     c 2019 0.90125133

The subsetting condition is that all years are present in the group.子集条件是所有年份都出现在组中。 We can construct a variable V1 with that condition and select rows based on that by passing row indices through .I .我们可以通过将行索引传递给.I来构造具有该条件的变量V1和基于该条件的 select 行。

datazzz[datazzz[, .I[all(2016:2019 %in% unique(year))], by = .(group)]$V1]

   group year     values
1:     b 2016 0.86527048
2:     b 2017 0.46478348
3:     b 2018 0.94761731
4:     b 2019 0.05005278
5:     c 2016 0.73977484
6:     c 2017 0.23698556
7:     c 2018 0.29560906
8:     c 2019 0.61450736

We could do:我们可以这样做:

library(data.table)
setDT(datazzz)[, if(min(year) == 2016 & max(year)==2019) .SD, by = group]


   group year    values
1:     b 2016 0.2321175
2:     b 2017 0.2776979
3:     b 2018 0.5695105
4:     b 2019 0.7224908
5:     c 2016 0.1904413
6:     c 2017 0.4608467
7:     c 2018 0.8258316
8:     c 2019 0.7198854

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM