Filter rows with dplyr/magrittr based on entire row
You can filter rows in dplyr with filter, but the condition is usually based on specific columns of each row, for example:
d <- data.frame(x=c(1,2,NA),y=c(3,NA,NA),z=c(NA,4,5))
d %>% filter(!is.na(y))
I want to filter rows by the share of NA values in the whole row, keeping only rows where fewer than 50% of the values are NA, something like:
d %>% filter(mean(is.na(EACHROW)) < 0.5 )
How do I do this in a dplyr/magrittr pipeline?
You could use rowSums or rowMeans for that. An example with the provided data:
> d
x y z
1 1 3 NA
2 2 NA 4
3 NA NA 5
# with rowSums:
d %>% filter(rowSums(is.na(.))/ncol(.) < 0.5)
# with rowMeans:
d %>% filter(rowMeans(is.na(.)) < 0.5)
Both give:
x y z
1 1 3 NA
2 2 NA 4
As you can see, row 3 is removed from the data because two of its three values are NA.
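To see why the condition works, it can help to look at the intermediate values. The following is a minimal sketch in base R using the data from the question; the variable names na_mask, na_frac and keep are only illustrative and do not come from the original answer.

```r
# Rebuild the example data from the question.
d <- data.frame(x = c(1, 2, NA), y = c(3, NA, NA), z = c(NA, 4, 5))

# is.na() on a data frame returns a logical matrix with the same shape.
na_mask <- is.na(d)

# rowMeans() counts TRUE as 1 and FALSE as 0, so this is the fraction of
# NA values per row: 1/3 for rows 1 and 2, and 2/3 for row 3.
na_frac <- rowMeans(na_mask)

# Rows with less than half of their values missing are kept.
keep <- na_frac < 0.5
```

Inside the pipe, the dot in is.na(.) stands for the data frame on the left-hand side of %>%, so filter() receives this same logical vector.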
In base R, you could just do:
d[rowMeans(is.na(d)) < 0.5,]
to get the same result.
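If the threshold needs to vary, the base R version could be wrapped in a small function. This is only a sketch: the function name keep_rows_by_na_frac and its max_na_frac parameter are hypothetical and not part of dplyr or base R.

```r
# Hypothetical helper: keep rows whose fraction of NA values is below max_na_frac.
keep_rows_by_na_frac <- function(df, max_na_frac = 0.5) {
  stopifnot(is.data.frame(df))
  # drop = FALSE keeps the result a data frame even when df has a single column.
  df[rowMeans(is.na(df)) < max_na_frac, , drop = FALSE]
}

# Usage with the data from the question.
d <- data.frame(x = c(1, 2, NA), y = c(3, NA, NA), z = c(NA, 4, 5))
result <- keep_rows_by_na_frac(d)
```

The drop = FALSE argument is a deliberate choice here: without it, subsetting a one-column data frame by rows returns a plain vector rather than a data frame.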