R: Subsetting data frame by factor
Assume we have the following data frame:
foo
k h=1 h=2 h=3
1 3 3 6 9
2 2 2 5 8
3 1 1 4 7
with
str(check)
'data.frame': 3 obs. of 4 variables:
$ k : Factor w/ 3 levels "3","2","1": 1 2 3
$ h=1: int 3 2 1
$ h=2: int 6 5 4
$ h=3: int 9 8 7
How can I subset my data frame based on the factor k? For instance, to get only the row for k=3, or all rows with k<3.
I tried subset(foo, k=3), but it doesn't work.
I also tried converting the column k to numeric, but then my data.frame loses its order. It's important that the data stays in descending order with regard to k (so 3, 2, 1).
Bracket notation should be able to subset on factors without any problems:
# Returns all rows of foo where k == '3'
foo[foo$k == '3',]
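For a reproducible version, here is a minimal sketch that rebuilds foo from the question and applies the same selection. How the data was originally created is not shown, so the data.frame() call with check.names = FALSE and the explicit factor levels are assumptions chosen to match the str() output.

```r
# Rebuild the example data; check.names = FALSE keeps the "h=1" style names.
# The factor levels are given explicitly to match the str() output.
foo <- data.frame(
  k = factor(c("3", "2", "1"), levels = c("3", "2", "1")),
  `h=1` = c(3L, 2L, 1L),
  `h=2` = c(6L, 5L, 4L),
  `h=3` = c(9L, 8L, 7L),
  check.names = FALSE
)

# Keep only the row whose factor level is "3"
k3 <- foo[foo$k == "3", ]

# The same selection with subset(); note the double equals sign
k3_alt <- subset(foo, k == "3")
```

Both k3 and k3_alt should contain the single row for k = 3.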
Two possible problems with what you did before:
1) subset(foo, k=3) should be subset(foo, k==3). Don't confuse the equality operator (==) with the assignment operator (=).
2) Since you're comparing with the actual level of your factor, you should check for equality with the character '3' instead of the numeric 3. You can see in the output from str() that k's levels are "3","2","1", with quotes, whereas the integers for the other variables are shown without quotes: 3 2 1.
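The question also asks for all rows with k < 3, which an equality test against a level does not cover. One possible sketch, assuming the levels are always numeric strings as in the str() output: convert the factor to numbers through as.character() only inside the condition, so the data frame itself and its row order are left unchanged. A reduced copy of foo is rebuilt here (an assumption about its construction) so the snippet runs on its own.

```r
# Reduced copy of the example data, kept in descending order of k
foo <- data.frame(
  k = factor(c("3", "2", "1"), levels = c("3", "2", "1")),
  `h=1` = c(3L, 2L, 1L),
  check.names = FALSE
)

# as.numeric(foo$k) alone would return the internal level codes (1, 2, 3),
# not the values 3, 2, 1, so go through as.character() first
k_num <- as.numeric(as.character(foo$k))

# Rows with k < 3; foo is not modified, so its order is preserved
below3 <- foo[k_num < 3, ]
```

Because the conversion is stored in a separate vector, the column k stays a factor and the rows keep their 3, 2, 1 ordering.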