
Filter R data.frame by multiple columns

I have an R data frame that contains many observations and looks like this:

df <- data.frame(obs1=c(7.1,8.3,9.8), 
                 obs2=c(5.2,8.8,4.1), 
                 obs3=c(9.6,8.1,7.7), 
                 obs4=c(7.2,8.1,9.4), 
                 obs5=c(NA,5.4,9.0), 
                 hi1=c(9.6,8.8,9.8), 
                 hi2=c(7.2,8.3,9.4))

I simplified the example; the real data goes out to obs25. hi1 and hi2 contain the highest and next-highest values in each row. I need to get all the rows with an obs* value > x but less than hi1 or hi2. In other words, all the rows that have a value above a threshold that is not one of the 2 highest values. Thanks!

Sorry for not being clearer. For example, if the threshold is set at 8 and the above data frame is used, the result would be rows 2 and 3:

in row 2, obs3 and obs4 are > 8, but less than the 2 highest

in row 3, obs5 is > 8, but less than the 2 highest
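
One possible reading of the requirement above, sketched in base R: keep a row if at least one obs* value is above the threshold and strictly below hi2 (so it cannot be one of the two highest values). Selecting the observation columns with `grep("^obs", ...)`, treating NA as "does not qualify", and the names `obs`, `hit` and `keep` are assumptions made for this illustration, not part of the original post.

```r
df <- data.frame(obs1 = c(7.1, 8.3, 9.8),
                 obs2 = c(5.2, 8.8, 4.1),
                 obs3 = c(9.6, 8.1, 7.7),
                 obs4 = c(7.2, 8.1, 9.4),
                 obs5 = c(NA, 5.4, 9.0),
                 hi1  = c(9.6, 8.8, 9.8),
                 hi2  = c(7.2, 8.3, 9.4))

x <- 8  # threshold from the example in the question

# Matrix of the observation columns only (assumes they all start with "obs")
obs <- as.matrix(df[grep("^obs", names(df))])

# TRUE where a value is above the threshold and below the row's second-highest
# value; df$hi2 is recycled down each column, so row i is compared to hi2[i]
hit <- obs > x & obs < df$hi2

# Keep rows with at least one qualifying value; NA counts as not qualifying
keep <- rowSums(hit, na.rm = TRUE) > 0

df[keep, ]  # rows 2 and 3
```

With the real data the same code should extend to obs25 unchanged, as long as the column naming pattern holds.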

Note that there are no rows meeting the criteria you seem to describe (in this example):

df <- data.frame(obs1=c(7.1,8.3,9.8), 
                 obs2=c(5.2,8.8,4.1), 
                 obs3=c(9.6,8.1,7.7), 
                 obs4=c(7.2,8.1,9.4), 
                 obs5=c(NA,5.4,9.0), 
                 hi1=c(9.6,8.8,9.8), 
                 hi2=c(7.2,8.3,9.4))

x <- 5

# rows whose minimum obs value is greater than x
df[which(apply(df[, -c(6:7)], 1, min) > x), ]

# rows whose maximum obs value is less than hi2
df[which(apply(df[, -c(6:7)], 1, max) < df$hi2), ]

# rows which satisfy both conditions
df[intersect(which(apply(df[, -c(6:7)], 1, min) > x),
             which(apply(df[, -c(6:7)], 1, max) < df$hi2)), ]
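
As a side note, `min` and `max` return NA for row 1 in the code above because obs5 is NA there, and `which` then silently drops that row. A hedged variant that ignores missing values instead might look like the sketch below; using `na.rm = TRUE`, selecting columns by name, and the variable names are assumptions for illustration, not part of the answer.

```r
df <- data.frame(obs1 = c(7.1, 8.3, 9.8),
                 obs2 = c(5.2, 8.8, 4.1),
                 obs3 = c(9.6, 8.1, 7.7),
                 obs4 = c(7.2, 8.1, 9.4),
                 obs5 = c(NA, 5.4, 9.0),
                 hi1  = c(9.6, 8.8, 9.8),
                 hi2  = c(7.2, 8.3, 9.4))

x <- 5

# Pick the observation columns by name rather than by position
obs_cols <- grep("^obs", names(df))

# Row-wise extremes, skipping missing values
row_min <- apply(df[, obs_cols], 1, min, na.rm = TRUE)
row_max <- apply(df[, obs_cols], 1, max, na.rm = TRUE)

min_ok <- row_min > x       # every non-missing obs is above the threshold
max_ok <- row_max < df$hi2  # every non-missing obs is below hi2

df[min_ok, ]           # rows 1 and 2
df[min_ok & max_ok, ]  # no rows, as the answer notes
```

The second condition can never hold here, since the row maximum is hi1 by construction and hi1 is at least hi2.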

I want to explore one possibility here: find the columns whose values are less than both of the two highest columns, compared element by element within each row, and greater than a threshold, which is also a vector compared element by element. The example below adds a few observations so we can actually see some results come through:

df <- data.frame(obs1=c(7.1,8.3,9.8), 
               obs2=c(5.2,8.8,4.1), 
               obs3=c(9.6,8.1,7.7), 
               obs4=c(7.2,8.1,9.4), 
               obs5=c(NA,5.4,9.0),
               obs6=c(6.6,7.3,8.8),
               obs7=c(1.1,6.7,9.0),
               obs8=c(8.8,8.4,9.6),
               obs9=c(6.0,7.8,8.3),
               hi1=c(9.6,8.8,9.8), 
               hi2=c(7.2,8.3,9.4))

x is our threshold, which is a vector:

x <- c(5.0,5.0,5.0)

Now we apply to each column a comparison with the lower of the two hi columns (hi2), combined with a comparison with the threshold. The resulting logical vector is then reduced with a product, so that a 1 is reported only if all elements are TRUE. e is a logical vector marking the columns we want to show.

e <- as.logical(sapply(df, function(y) prod(ifelse(y < df$hi2 & y > x,TRUE,FALSE))>0))

Subset our df by column:

dfnew <- df[,which(e)]

So if I look at the end result:

dfnew
  obs6 obs9
1  6.6  6.0
2  7.3  7.8
3  8.8  8.3
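
For comparison, the same column selection can be sketched with `all()` in place of the product. This is only an alternative formulation under stated assumptions, not the answer's own code: the helper name `keep_col` is hypothetical, and wrapping the test in `isTRUE` to treat a column containing NA (obs5 here) as not qualifying is my choice.

```r
df <- data.frame(obs1 = c(7.1, 8.3, 9.8),
                 obs2 = c(5.2, 8.8, 4.1),
                 obs3 = c(9.6, 8.1, 7.7),
                 obs4 = c(7.2, 8.1, 9.4),
                 obs5 = c(NA, 5.4, 9.0),
                 obs6 = c(6.6, 7.3, 8.8),
                 obs7 = c(1.1, 6.7, 9.0),
                 obs8 = c(8.8, 8.4, 9.6),
                 obs9 = c(6.0, 7.8, 8.3),
                 hi1  = c(9.6, 8.8, 9.8),
                 hi2  = c(7.2, 8.3, 9.4))

x <- c(5.0, 5.0, 5.0)  # per-row threshold

# Hypothetical helper: TRUE only if every element of the column lies strictly
# between the threshold and hi2 for its row; an NA result becomes FALSE
keep_col <- function(y) isTRUE(all(y > x & y < df$hi2))

# One logical per column of df
e <- vapply(df, keep_col, logical(1))

# drop = FALSE keeps a data frame even if only one column qualifies
dfnew <- df[, e, drop = FALSE]
names(dfnew)  # "obs6" "obs9"
```

Using `vapply` with `logical(1)` makes the expected return type explicit, which avoids the `as.logical` coercion step.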
