Filter an R data.frame by multiple columns

Question

I have an R data frame that contains many observations and looks like this:

df <- data.frame(obs1=c(7.1,8.3,9.8), 
                 obs2=c(5.2,8.8,4.1), 
                 obs3=c(9.6,8.1,7.7), 
                 obs4=c(7.2,8.1,9.4), 
                 obs5=c(NA,5.4,9.0), 
                 hi1=c(9.6,8.8,9.8), 
                 hi2=c(7.2,8.3,9.4))

I simplified; the obs columns actually go out to obs25. hi1 and hi2 contain the highest and next-highest values in each row. I need to get all the rows with obs* > x but less than hi1 or hi2. In other words, all the rows that have values above a threshold that were not among the 2 highest values. Thanks!

Sorry for not being clearer. For example, if the threshold is set at 8 and the above data frame is used, the result would be rows 2 and 3:

in row 2, obs3 and obs4 are > 8, but less than the 2 highest

in row 3, obs5 is > 8, but less than the 2 highest

Answer 1

Note that there are no rows meeting the criteria you seem to describe (in this example):

df <- data.frame(obs1=c(7.1,8.3,9.8), 
                 obs2=c(5.2,8.8,4.1), 
                 obs3=c(9.6,8.1,7.7), 
                 obs4=c(7.2,8.1,9.4), 
                 obs5=c(NA,5.4,9.0), 
                 hi1=c(9.6,8.8,9.8), 
                 hi2=c(7.2,8.3,9.4))

x <- 5

# rows whose minimum obs value is greater than x
# (a row containing NA yields NA here and is dropped by which())
df[which(apply(df[, -c(6:7)], 1, min) > x), ]

# rows whose maximum obs value is less than hi2
df[which(apply(df[, -c(6:7)], 1, max) < df$hi2), ]

# rows which satisfy both
df[intersect(which(apply(df[, -c(6:7)], 1, min) > x),
             which(apply(df[, -c(6:7)], 1, max) < df$hi2)), ]
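The worked example in the question (threshold 8, expected rows 2 and 3) suggests a different reading: keep a row if at least one obs value is above x and below hi2, rather than requiring every value to qualify. A minimal sketch of that reading follows. It assumes the observation columns can be picked out by the name pattern "^obs" and that NA entries should simply not count as matches; the names obs, hit and keep are introduced here for illustration only.

```r
df <- data.frame(obs1 = c(7.1, 8.3, 9.8),
                 obs2 = c(5.2, 8.8, 4.1),
                 obs3 = c(9.6, 8.1, 7.7),
                 obs4 = c(7.2, 8.1, 9.4),
                 obs5 = c(NA, 5.4, 9.0),
                 hi1  = c(9.6, 8.8, 9.8),
                 hi2  = c(7.2, 8.3, 9.4))

x <- 8

# select obs1 ... obsN by name (assumption: all observation columns start with "obs")
obs <- as.matrix(df[, grep("^obs", names(df)), drop = FALSE])

# df$hi2 has one element per row, so it recycles down each column
# and every cell is compared against its own row's hi2
hit <- obs > x & obs < df$hi2

# keep rows with at least one qualifying cell; NA cells are ignored
keep <- rowSums(hit, na.rm = TRUE) > 0

df[keep, ]
```

Because the columns are selected by name, the same code should carry over to the full obs1 through obs25 layout without changing column indices.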

Answer 2

I want to explore one possibility here: finding the columns whose values are, row by row, less than both of the two highest columns and greater than a threshold, where the threshold is also a vector compared element by element. The example below adds a few observations so we can actually see some results come through:

df <- data.frame(obs1=c(7.1,8.3,9.8), 
               obs2=c(5.2,8.8,4.1), 
               obs3=c(9.6,8.1,7.7), 
               obs4=c(7.2,8.1,9.4), 
               obs5=c(NA,5.4,9.0),
               obs6=c(6.6,7.3,8.8),
               obs7=c(1.1,6.7,9.0),
               obs8=c(8.8,8.4,9.6),
               obs9=c(6.0,7.8,8.3),
               hi1=c(9.6,8.8,9.8), 
               hi2=c(7.2,8.3,9.4))

x is our threshold, which is a vector

x <- c(5.0,5.0,5.0)

Now we apply to each column a comparison against the lower of the two hi columns (hi2), combined with a comparison against the threshold. The resulting logical vector is then multiplied together, so that a 1 is only reported if all elements are TRUE. e is a logical vector marking the columns we want to show.

e <- as.logical(sapply(df, function(y) prod(ifelse(y < df$hi2 & y > x,TRUE,FALSE))>0))

Subset our df by column

dfnew <- df[,which(e)]

So if I look at the end result:

dfnew
  obs6 obs9
1  6.6  6.0
2  7.3  7.8
3  8.8  8.3
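The product-of-logicals step in this answer can also be expressed with all(). The sketch below is an alternative spelling of the same column test, not the answerer's code; keep_col is a name introduced here, and wrapping in isTRUE() is an assumption about how a column containing NA (obs5) should be treated, namely as not selected.

```r
df <- data.frame(obs1 = c(7.1, 8.3, 9.8),
                 obs2 = c(5.2, 8.8, 4.1),
                 obs3 = c(9.6, 8.1, 7.7),
                 obs4 = c(7.2, 8.1, 9.4),
                 obs5 = c(NA, 5.4, 9.0),
                 obs6 = c(6.6, 7.3, 8.8),
                 obs7 = c(1.1, 6.7, 9.0),
                 obs8 = c(8.8, 8.4, 9.6),
                 obs9 = c(6.0, 7.8, 8.3),
                 hi1  = c(9.6, 8.8, 9.8),
                 hi2  = c(7.2, 8.3, 9.4))

x <- c(5.0, 5.0, 5.0)

# TRUE only when every element of the column is below hi2 and above x;
# isTRUE() turns an NA result (column with missing values) into FALSE
keep_col <- vapply(df,
                   function(y) isTRUE(all(y < df$hi2 & y > x)),
                   logical(1))

# drop = FALSE keeps a data frame even if only one column qualifies
dfnew <- df[, keep_col, drop = FALSE]
names(dfnew)
```

vapply() with logical(1) guarantees one logical value per column, which avoids the as.logical() conversion and the which() call needed to skip NA entries.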
