Subset dataframe such that all values in each row are less than a certain value
I have a dataframe with a dimension column and 4 value columns. How can I subset the dataframe so that all 4 value columns for each record are less than a given x? I know I could do this manually using subset and specifying the condition for each column, but is there a way to do it using maybe an apply function? Below is a sample dataframe. For example, let's say x is 0.7. In that case I would want to eliminate any row where any column of that row is more than 0.7.
   zips ABC DEF GHI JKL
1     1 0.8 0.6 0.1 0.6
2     2 0.1 0.3 0.8 1.0
3     3 0.5 0.1 0.4 0.8
4     4 0.6 0.4 0.2 0.3
5     5 1.0 0.8 0.6 0.5
6     6 0.2 0.7 0.3 0.4
7     7 0.3 1.0 1.0 0.2
8     8 0.7 0.9 0.5 0.1
9     9 0.9 0.5 0.9 0.7
10   10 0.4 0.2 0.7 0.9
The following expression seems to work, but could someone explain the logic here?
Variance_Percentile[!rowSums(Variance_Percentile[-1] > 0.7), ]
  zips ABC DEF GHI JKL
4    4 0.6 0.4 0.2 0.3
6    6 0.2 0.7 0.3 0.4
You can use the negated rowSums() for the subset:
df[!rowSums(df[-1] > 0.7), ]
# zips ABC DEF GHI JKL
# 4 4 0.6 0.4 0.2 0.3
# 6 6 0.2 0.7 0.3 0.4
df[-1] > 0.7 gives us a logical matrix telling us which values in df[-1] (every column except the first) are greater than 0.7.
rowSums() sums across those rows (each TRUE value counts as 1, each FALSE as 0), so the result is the number of values above 0.7 in each row.
! converts those sums to logical and negates them, so any row sum that is zero (FALSE) becomes TRUE. In other words, if the rowSums() result is zero, we want that row.
Another way to get the same logical vector would be to do
rowSums(df[-1] > 0.7) == 0
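To see each step on its own, here is a small sketch that rebuilds the sample data and stores every intermediate result. The names over, n_over, keep, result and keep_apply are only illustrative and do not come from the original post. The last line is one possible apply()-based equivalent, since the question asked about that; it assumes there are no NA values in the value columns.

```r
# Sample data from the question (df is the name used in the answer above)
df <- data.frame(
  zips = 1:10,
  ABC = c(0.8, 0.1, 0.5, 0.6, 1.0, 0.2, 0.3, 0.7, 0.9, 0.4),
  DEF = c(0.6, 0.3, 0.1, 0.4, 0.8, 0.7, 1.0, 0.9, 0.5, 0.2),
  GHI = c(0.1, 0.8, 0.4, 0.2, 0.6, 0.3, 1.0, 0.5, 0.9, 0.7),
  JKL = c(0.6, 1.0, 0.8, 0.3, 0.5, 0.4, 0.2, 0.1, 0.7, 0.9)
)
x <- 0.7

over   <- df[-1] > x      # logical matrix; the zips column is dropped
n_over <- rowSums(over)   # number of values above x in each row
keep   <- !n_over         # TRUE only where that count is zero
result <- df[keep, ]      # rows 4 and 6 for this data

# Possible apply()-based alternative: keep rows where every value is <= x
keep_apply <- apply(df[-1] <= x, 1, all)
```

Note that the comparison is strict (> 0.7), which is why row 6, containing exactly 0.7, is kept.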