Filter rows based on the whole row using dplyr / magrittr

Question

One can filter rows with dplyr's filter(), but the condition is usually based on specific columns of each row, for example:

library(dplyr)  # provides filter() and re-exports the %>% pipe
d <- data.frame(x = c(1, 2, NA), y = c(3, NA, NA), z = c(NA, 4, 5))
d %>% filter(!is.na(y))

I want to filter rows by whether the proportion of NA values in the row is greater than 50%, for example:

d %>% filter(mean(is.na(EACHROW)) < 0.5)  # pseudocode: EACHROW stands for the current row

How do I do this in a dplyr/magrittr pipeline?

Answer 1

You could use rowSums or rowMeans for that. An example with the provided data:

> d
   x  y  z
1  1  3 NA
2  2 NA  4
3 NA NA  5

# with rowSums:
d %>% filter(rowSums(is.na(.))/ncol(.) < 0.5)

# with rowMeans:
d %>% filter(rowMeans(is.na(.)) < 0.5)

which both give:

  x  y  z
1 1  3 NA
2 2 NA  4

As you can see, row 3 (two NAs out of three values) is removed from the data.
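To see why this works, it can help to look at the intermediate values. The sketch below uses only base R and the example data from the question: is.na() applied to a data frame returns a logical matrix, and rowMeans() counts TRUE as 1 and FALSE as 0, so it yields the share of NAs in each row. The variable names na_mat, na_share and keep are just illustrative.

```r
d <- data.frame(x = c(1, 2, NA), y = c(3, NA, NA), z = c(NA, 4, 5))

# is.na() on a data frame gives a logical matrix: one row per data row,
# one column per variable
na_mat <- is.na(d)
print(na_mat)

# rowMeans() averages TRUE/FALSE as 1/0, i.e. the proportion of NAs per row:
# rows 1 and 2 have one NA out of three values, row 3 has two
na_share <- rowMeans(na_mat)
print(na_share)

# This logical vector (one value per row) is what filter() receives
keep <- na_share < 0.5
print(keep)

# Indexing with it gives the same rows as the filter() calls
print(d[keep, ])
```

The key point is that the condition has to be a vector with one value per row; mean(is.na(d)) would instead collapse the whole data frame into a single number.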

In base R, you could just do:

d[rowMeans(is.na(d)) < 0.5,]

to get the same result.
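The `.` placeholder in the answer relies on the magrittr pipe %>%. With R's native pipe |> (R 4.1 or later) there is no `.` inside filter(). A possible alternative, assuming dplyr 1.1.0 or later, is pick(), which returns the selected columns of the current data as a data frame inside a data-masking verb such as filter(). This is a sketch of that variant, not part of the original answer.

```r
library(dplyr)  # pick() needs dplyr >= 1.1.0; |> needs R >= 4.1

d <- data.frame(x = c(1, 2, NA), y = c(3, NA, NA), z = c(NA, 4, 5))

# pick(everything()) stands in for `.`: all columns of the data being filtered
res <- d |>
  filter(rowMeans(is.na(pick(everything()))) < 0.5)

print(res)
```

Replacing everything() with specific columns, e.g. pick(x, y), would restrict the NA count to those columns only.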

Answer 1 (accepted; score 7; 2016-01-07 10:05:27)