R: Subsetting data frame by factor

Assume we have the following data frame:

foo
  k h=1 h=2 h=3
1 3   3   6   9
2 2   2   5   8
3 1   1   4   7

with the following structure:

str(foo)
'data.frame':   3 obs. of  4 variables:
 $ k  : Factor w/ 3 levels "3","2","1": 1 2 3
 $ h=1: int  3 2 1
 $ h=2: int  6 5 4
 $ h=3: int  9 8 7

How can I subset my data frame based on the factor k? For instance, to get only the row for k = 3, or all rows with k < 3. I tried subset(foo, k=3), but it doesn't work. I also tried converting the column k to numeric, but then my data.frame loses its order. It's important that the data stays in descending order with regard to k (so 3, 2, 1).

Bracket notation should be able to subset on factors without any problems:

# Returns all rows of foo where k == '3'
foo[foo$k == '3',]
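For a reproducible version of the bracket approach, here is a minimal sketch. The construction of foo is an assumption reconstructed from the str() output in the question (the names row_k3 and rows_12 are made up for illustration); the level order "3","2","1" is set explicitly to match it.

```r
# Assumed reconstruction of foo from the str() output in the question.
# check.names = FALSE keeps the non-syntactic column names "h=1", "h=2", "h=3".
foo <- data.frame(
  k     = factor(c("3", "2", "1"), levels = c("3", "2", "1")),
  `h=1` = c(3L, 2L, 1L),
  `h=2` = c(6L, 5L, 4L),
  `h=3` = c(9L, 8L, 7L),
  check.names = FALSE
)

# Single level: compare against the level label as a character string.
row_k3 <- foo[foo$k == "3", ]

# Several levels at once: %in% also matches on the level labels.
rows_12 <- foo[foo$k %in% c("1", "2"), ]

print(row_k3)
print(rows_12)
```

Logical row indexing keeps the rows in their original order, so rows_12 still lists k = 2 before k = 1.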

Two possible problems with what you did before:

1) subset(foo, k=3) should be subset(foo, k==3). Don't confuse the equality operator ( == ) with the assignment operator ( = ); inside a function call, k=3 is read as a named argument rather than as a comparison.

2) Since you're comparing with the actual level of your factor, you should check for equality with the character '3' instead of the numeric 3. You can see in the output from str() that k's levels are "3","2","1", with quotes, whereas the integers for the other variables are shown without quotes: 3 2 1.
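The two points above, plus the k < 3 case from the question, can be sketched as follows. The construction of foo is an assumption based on the str() output, and k_num, exact3 and below3 are hypothetical names. One thing to keep in mind: as.numeric() applied directly to a factor returns the internal integer codes rather than the labels, which may be related to the ordering problem described in the question. Converting through as.character() first recovers the labels as numbers, and doing so in a separate vector leaves the data frame itself untouched.

```r
# Assumed reconstruction of foo from the str() output in the question.
foo <- data.frame(
  k     = factor(c("3", "2", "1"), levels = c("3", "2", "1")),
  `h=1` = c(3L, 2L, 1L),
  `h=2` = c(6L, 5L, 4L),
  `h=3` = c(9L, 8L, 7L),
  check.names = FALSE
)

# Point 1 and 2: use == and compare against the level label.
exact3 <- subset(foo, k == "3")

# as.numeric() on a factor gives the internal codes, not the labels.
as.numeric(foo$k)                         # 1 2 3

# Going through as.character() gives the labels as numbers.
k_num <- as.numeric(as.character(foo$k))  # 3 2 1

# Use the numeric helper vector only for the comparison; foo$k stays a factor
# and the rows keep their descending order.
below3 <- foo[k_num < 3, ]

print(exact3)
print(below3)
```

Because k_num is used only as a filter, the column k in foo and in the result keeps its original levels and row order.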

