Filter R data.frame by multiple columns
I have an R data frame that contains many observations and looks like:
df <- data.frame(obs1=c(7.1,8.3,9.8),
obs2=c(5.2,8.8,4.1),
obs3=c(9.6,8.1,7.7),
obs4=c(7.2,8.1,9.4),
obs5=c(NA,5.4,9.0),
hi1=c(9.6,8.8,9.8),
hi2=c(7.2,8.3,9.4))
I simplified, as obs goes out to obs25. hi1 and hi2 contain the highest and next highest values in each row. I need to get all the rows with
obs* > x
but less than hi1 or hi2. In other words, all the rows that have values above a threshold that were not among the 2 highest values. Thanks!
Sorry for not being more clear. For example, if the threshold is set at 8 and the above data frame is used, the result would be rows 2 and 3:
in row 2, obs3 and obs4 are > 8, but less than the 2 highest
in row 3, obs5 is > 8, but less than the 2 highest
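Read literally, the criterion is: keep a row if at least one obs value is above the threshold and below hi2 (the lower of the two highest values). The following is only a minimal sketch of that reading, not a definitive solution. The names obs, hit and keep are made up for illustration, and treating NA as "does not qualify" is an assumption.

```r
df <- data.frame(obs1 = c(7.1, 8.3, 9.8),
                 obs2 = c(5.2, 8.8, 4.1),
                 obs3 = c(9.6, 8.1, 7.7),
                 obs4 = c(7.2, 8.1, 9.4),
                 obs5 = c(NA, 5.4, 9.0),
                 hi1  = c(9.6, 8.8, 9.8),
                 hi2  = c(7.2, 8.3, 9.4))
x <- 8  # threshold from the example above

# Keep only the obs* columns, as a numeric matrix
obs <- as.matrix(df[grepl("^obs", names(df))])

# df$hi2 has one value per row, so it is recycled down each column:
# element [i, j] is compared with hi2[i]
hit <- obs > x & obs < df$hi2

# A row qualifies if any obs value qualifies; NA is assumed not to qualify
keep <- rowSums(hit, na.rm = TRUE) > 0

df[keep, ]
```

With the sample data this keeps rows 2 and 3, matching the expected result described above.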
Note that there are no rows meeting the criteria you seem to describe (in this example):
df <- data.frame(obs1=c(7.1,8.3,9.8),
obs2=c(5.2,8.8,4.1),
obs3=c(9.6,8.1,7.7),
obs4=c(7.2,8.1,9.4),
obs5=c(NA,5.4,9.0),
hi1=c(9.6,8.8,9.8),
hi2=c(7.2,8.3,9.4))
x <- 5
# rows whose minimum obs value is greater than x (columns 6:7 are hi1 and hi2)
df[which(apply(df[, -c(6:7)], 1, min) > x), ]
# rows whose maximum obs value is less than hi2
df[which(apply(df[, -c(6:7)], 1, max) < df$hi2), ]
# rows which satisfy both
df[intersect(which(apply(df[, -c(6:7)], 1, min) > x), which(apply(df[, -c(6:7)], 1, max) < df$hi2)), ]
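One detail worth noting: row 1 has an NA in obs5, so min and max return NA for that row and which() silently drops it. Below is a hedged variant that ignores NA values instead. Whether NA should be ignored is an assumption about the data, and the names obs, row_min, row_max and both are illustrative only.

```r
df <- data.frame(obs1 = c(7.1, 8.3, 9.8),
                 obs2 = c(5.2, 8.8, 4.1),
                 obs3 = c(9.6, 8.1, 7.7),
                 obs4 = c(7.2, 8.1, 9.4),
                 obs5 = c(NA, 5.4, 9.0),
                 hi1  = c(9.6, 8.8, 9.8),
                 hi2  = c(7.2, 8.3, 9.4))
x <- 5

# Select the obs* columns by name rather than by position
obs <- df[, grepl("^obs", names(df))]

# Row-wise min and max, skipping NA values (assumption: NA means "no reading")
row_min <- apply(obs, 1, min, na.rm = TRUE)
row_max <- apply(obs, 1, max, na.rm = TRUE)

# Rows where every obs value is above x and every obs value is below hi2
both <- row_min > x & row_max < df$hi2

df[both, ]
```

Even with NA values ignored, no row satisfies both conditions here, because the row maximum equals hi1 and is therefore never below hi2.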
I want to explore one possibility here: finding the columns whose values are, row by row, less than both of the two highest columns and greater than a threshold, where the threshold is also a vector compared element by element. The example below adds a few observations so we can actually see some results come through:
df <- data.frame(obs1=c(7.1,8.3,9.8),
obs2=c(5.2,8.8,4.1),
obs3=c(9.6,8.1,7.7),
obs4=c(7.2,8.1,9.4),
obs5=c(NA,5.4,9.0),
obs6=c(6.6,7.3,8.8),
obs7=c(1.1,6.7,9.0),
obs8=c(8.8,8.4,9.6),
obs9=c(6.0,7.8,8.3),
hi1=c(9.6,8.8,9.8),
hi2=c(7.2,8.3,9.4))
x is our threshold, which is a vector:
x <- c(5.0,5.0,5.0)
Now we apply to each column a comparison to the lower of the two hi columns (hi2), combined with a comparison to the threshold. The resulting logical vector is then multiplied together, so that a 1 is only reported if all elements are TRUE. e is a logical vector of the columns we want to show.
e <- as.logical(sapply(df, function(y) prod(ifelse(y < df$hi2 & y > x,TRUE,FALSE))>0))
Subset our df by column:
dfnew <- df[,which(e)]
So if I look at the end result:
dfnew
  obs6 obs9
1  6.6  6.0
2  7.3  7.8
3  8.8  8.3
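The product trick can also be expressed with all(), which may read more directly. This is a sketch of an equivalent selection, not the original author's code. Using vapply and isTRUE is my own choice, and isTRUE treats a column containing NA as not qualifying, which matches what which(e) does in the version above.

```r
df <- data.frame(obs1 = c(7.1, 8.3, 9.8),
                 obs2 = c(5.2, 8.8, 4.1),
                 obs3 = c(9.6, 8.1, 7.7),
                 obs4 = c(7.2, 8.1, 9.4),
                 obs5 = c(NA, 5.4, 9.0),
                 obs6 = c(6.6, 7.3, 8.8),
                 obs7 = c(1.1, 6.7, 9.0),
                 obs8 = c(8.8, 8.4, 9.6),
                 obs9 = c(6.0, 7.8, 8.3),
                 hi1  = c(9.6, 8.8, 9.8),
                 hi2  = c(7.2, 8.3, 9.4))
x <- c(5.0, 5.0, 5.0)  # per-row threshold

# TRUE for a column only if every row is below hi2 and above the threshold;
# isTRUE() turns an NA result (e.g. obs5) into FALSE
e <- vapply(df, function(y) isTRUE(all(y < df$hi2 & y > x)), logical(1))

# drop = FALSE keeps a data frame even if only one column is selected
dfnew <- df[, e, drop = FALSE]
dfnew
```

With this data it selects obs6 and obs9, the same columns as the version above.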