
Subsetting data based on multiple stratified fields and criteria

My data frame has multiple factors. I would like to subset the data so that it excludes only the rows that belong to a specific level of one factor within a specific level of another factor.

I've tried the following two approaches, but only one of them worked, and I'm not sure why. I would appreciate it if someone could explain.

This is a simplified example, where f1 and f2 are the factors:

# Example data: f1 is a year, f2 is a level (1 to 4) within each year
df <- data.frame(f1   = rep(c(2019, 2018, 2017), each = 4),
                 f2   = rep(1:4, 3),
                 data = 0:11)
print(df)

Output:

     f1 f2 data
1  2019  1    0
2  2019  2    1
3  2019  3    2
4  2019  4    3
5  2018  1    4
6  2018  2    5
7  2018  3    6
8  2018  4    7
9  2017  1    8
10 2017  2    9
11 2017  3   10
12 2017  4   11

In this case I want to exclude only the rows where f2 is at level "1" and f1 is "2019"; every other row should be kept.

Method 1:

# Keeps a row only if f1 is not 2019 AND f2 is not 1
subs.df <- subset(df, f1 != 2019 & f2 != 1)
print(subs.df)
     f1 f2 data
6  2018  2    5
7  2018  3    6
8  2018  4    7
10 2017  2    9
11 2017  3   10
12 2017  4   11
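To see why Method 1 drops too much, it can help to print the logical vector it builds for each row. A minimal diagnostic sketch (the data frame is rebuilt so the block runs on its own; `checks` and `keep_m1` are illustrative names, not part of the question):

```r
df <- data.frame(f1   = rep(c(2019, 2018, 2017), each = 4),
                 f2   = rep(1:4, 3),
                 data = 0:11)

# Evaluate each half of Method 1's condition row by row
checks <- data.frame(f1 = df$f1, f2 = df$f2,
                     not_2019 = df$f1 != 2019,
                     not_1    = df$f2 != 1)

# Method 1 keeps a row only when BOTH halves are TRUE
checks$keep_m1 <- checks$not_2019 & checks$not_1
print(checks)

# Rows dropped by Method 1: every 2019 row plus every row with f2 == 1
print(df[!checks$keep_m1, ])
```

The dropped rows include (2019, 2) and (2018, 1), which were meant to be kept.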

Method 2:

# Drops a row only if f1 is 2019 AND f2 is 1
subs.df <- subset(df, !(f1 %in% 2019 & f2 %in% 1))
print(subs.df)
     f1 f2 data
2  2019  2    1
3  2019  3    2
4  2019  4    3
5  2018  1    4
6  2018  2    5
7  2018  3    6
8  2018  4    7
9  2017  1    8
10 2017  2    9
11 2017  3   10
12 2017  4   11

This one worked!

Why doesn't method 1 work while method 2 does? What is the difference between them?

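This is a logic issue: the negation of (A and B) is (not A) or (not B), not (not A) and (not B) (De Morgan's laws). With A = (f1 == 2019) and B = (f2 == 1), Method 1 computes (not A) and (not B), which keeps a row only when it is neither from 2019 nor has f2 equal to 1, so it drops all of 2019 and every row with f2 == 1. Method 2 negates the whole (A and B), which is what you want.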
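The law can be checked directly on all four combinations of two logical values. A small sketch, where A and B are generic logicals standing in for f1 == 2019 and f2 == 1 (the column names are illustrative):

```r
# All four combinations of two logical values A and B
grid <- expand.grid(A = c(TRUE, FALSE), B = c(TRUE, FALSE))

# De Morgan's law: !(A & B) is the same as (!A | !B)
grid$not_A_and_B  <- !(grid$A & grid$B)
grid$notA_or_notB <- !grid$A | !grid$B

# Method 1's pattern, (!A & !B), is a different condition: it equals !(A | B)
grid$notA_and_notB <- !grid$A & !grid$B
print(grid)

stopifnot(identical(grid$not_A_and_B, grid$notA_or_notB))
```

The two mixed rows of the table (exactly one of A, B is TRUE) are where (!A & !B) differs from !(A & B); in the question's data these correspond to rows such as (2019, 2) and (2018, 1), which Method 1 drops.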

In Method 1 you just have to replace & with | (or):

# Keeps a row if f1 is not 2019 OR f2 is not 1
subs.df <- subset(df, f1 != 2019 | f2 != 1)
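As a sanity check, the corrected Method 1 and Method 2 can be compared directly, along with plain bracket indexing as a third equivalent form. A minimal sketch on the question's example data (`m2`, `with_or`, and `with_brackets` are illustrative names):

```r
df <- data.frame(f1   = rep(c(2019, 2018, 2017), each = 4),
                 f2   = rep(1:4, 3),
                 data = 0:11)

# Method 2: negate the combined condition
m2 <- subset(df, !(f1 %in% 2019 & f2 %in% 1))

# Corrected Method 1: the De Morgan form using |
with_or <- subset(df, f1 != 2019 | f2 != 1)

# Base bracket indexing with the same negated condition
with_brackets <- df[!(df$f1 == 2019 & df$f2 == 1), ]

stopifnot(identical(m2, with_or),
          identical(rownames(with_brackets), rownames(with_or)))
nrow(with_or)  # 11: only the (2019, 1) row is dropped
```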

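Disclaimer: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.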

 
© 2020-2024 STACKOOM.COM