Removing columns with dplyr / magrittr in R

Question

How can I remove a column with dplyr/magrittr in R?

Here I want to delete the columns that contain more than 50% NAs (this does not work, of course):

delNAcols <- function(x){ ifelse( mean(is.na(x))>0.5, NULL, x ) }
d <- data.frame(x=c(1,2,NA),y=c(NA,NA,4))
d %>% mutate_each(funs(delNAcols))

Solution

Both answers (from user3949008 and akrun) are good.

If the processing happens at the beginning of the flow, one can combine both answers. This gives the best balance of brevity and magrittr style, in other words, the best readability:

d %>%
  sapply(function(x) mean(is.na(x)) < 0.5) %>% 
  extract(d,. )

Because d is referenced a second time, this does not work if the step comes later in the flow. In that case user3949008's answer can be used after a small change (and a small readability improvement):

d %>% select_(.dots = names(.)[which(sapply(., function(x) mean(is.na(x)) < 0.5))])
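A variant that stays inside the pipe can also be sketched with magrittr alone. It relies on a documented magrittr rule: when the dot appears only inside a nested call, the left-hand side is still passed as the first argument. The extra transform() step is only there to show that the data no longer has to be the original d. If tidyr is attached, extract may be masked, in which case magrittr::extract is the safer spelling.

```r
library(magrittr)

d <- data.frame(x = c(1, 2, NA), y = c(NA, NA, 4))

result <- d %>%
  transform(z = NA) %>%   # illustrative earlier pipeline step: adds an all-NA column
  # The dot occurs only inside sapply(), so the piped data frame is also
  # passed as the first argument of extract(), i.e. data[logical_vector].
  extract(sapply(., function(x) mean(is.na(x)) < 0.5))

names(result)  # "x"
```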

If one wants the whole thing to be even more concise, one can write either of the following definitions:

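# Variant 1: index the data frame with magrittr's extract()
select_each <- function(df, fun) { df %>% sapply(fun) %>% extract(df, .) }
# Variant 2: same behavior via select_(); define one variant or the other
select_each <- function(df, fun) { df %>% select_(.dots = names(.)[which(sapply(., fun))]) }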

d %>%
  select_each( function(x) mean(is.na(x)) < 0.5 )

Both select_each functions are equivalent in functionality. However, I benchmarked them, and the first one is three times as fast.

Answer 1

We can use base R:

Filter(function(x) mean(is.na(x)) <= 0.5, d)
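Filter() works here because a data frame is a list of columns, so the predicate is applied to each column in turn. As a rough sketch of the same idea without an anonymous function, the NA share per column can be computed with colMeans() and used as a logical column index. The names na_share and kept are only illustrative.

```r
d <- data.frame(x = c(1, 2, NA), y = c(NA, NA, 4))

# Share of missing values per column (is.na() on a data frame gives a logical matrix)
na_share <- colMeans(is.na(d))

# Keep the columns with at most 50% NAs; single-bracket indexing selects columns
kept <- d[na_share <= 0.5]

names(kept)  # "x"
```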

Or, if we need to use the pipe flow:

library(dplyr)
library(magrittr)
d %>%
   summarise_each(funs(mean(is.na(.)) <= 0.5)) %>% 
   unlist %>% 
   extract(d,. )
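In recent dplyr releases, summarise_each() and funs() are deprecated. Assuming dplyr 1.0.0 or later, the same pipeline could be sketched with across(); the rest of the flow is unchanged.

```r
library(dplyr)
library(magrittr)

d <- data.frame(x = c(1, 2, NA), y = c(NA, NA, 4))

result <- d %>%
  # One logical per column: TRUE if at most 50% of the values are NA
  summarise(across(everything(), ~ mean(is.na(.x)) <= 0.5)) %>%
  unlist() %>%
  # Use the named logical vector to index the original data frame
  magrittr::extract(d, .)

names(result)  # "x"
```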

Answer 2

This is one way to do it, using select_ (because we will be supplying the names to select as a character vector):

library(dplyr)
d <- data.frame(x = c(1,2,NA), y = c(NA,NA,4), z = c(1,2,3), a = c(NA,NA,2), b = c(1,NA,2))
select_(d, .dots = names(d)[which(sapply(d, function(x) mean(is.na(x)) < 0.5))])
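select_() has since been deprecated in dplyr. Assuming dplyr 1.0.0 or later, a minimal sketch of the same selection uses where() with a predicate, or all_of() when the names are already held in a character vector. The names by_predicate, keep and by_name are only illustrative.

```r
library(dplyr)

d <- data.frame(x = c(1, 2, NA), y = c(NA, NA, 4), z = c(1, 2, 3),
                a = c(NA, NA, 2), b = c(1, NA, 2))

# where() keeps the columns for which the predicate returns TRUE
by_predicate <- d %>% select(where(function(x) mean(is.na(x)) < 0.5))

# Equivalent selection from a character vector of column names
keep <- names(d)[sapply(d, function(x) mean(is.na(x)) < 0.5)]
by_name <- d %>% select(all_of(keep))

names(by_predicate)  # "x" "z" "b"
```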
