Delete/remove column with dplyr/magrittr in R

How can I remove a column with dplyr/magrittr in R?

Here I want to delete columns in which more than 50% of the values are NA (this does not work, of course):

delNAcols <- function(x){ ifelse( mean(is.na(x))>0.5, NULL, x ) }
d <- data.frame(x=c(1,2,NA),y=c(NA,NA,4))
d %>% mutate_each(funs(delNAcols))
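
One reading of why this attempt fails, as a minimal base R sketch: ifelse() returns a value shaped like its test, which here is a single TRUE or FALSE, so it cannot hand back a whole column, and it cannot return NULL at all. The names first_only and err are illustrative.

```r
x <- c(1, 2, NA)

# The test has length 1, so only the first element of `no` comes back.
first_only <- ifelse(FALSE, NULL, x)

# With a TRUE test, ifelse() tries to fill the result with NULL and fails.
err <- tryCatch(ifelse(TRUE, NULL, x), error = function(e) e)

print(first_only)
print(inherits(err, "error"))
```

More generally, mutate-style verbs transform columns in place; dropping columns is a selection step.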

Solution

Both answers (from user3949008 and akrun) are good.

If the processing is at the beginning of the flow, one could use a combination of both answers, which gives the best balance of brevity and magrittr style, in other words the best readability:

library(magrittr)  # provides %>% and extract()
d %>%
  sapply(function(x) mean(is.na(x)) < 0.5) %>%
  extract(d, .)
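
To see what each step of that pipe produces, here is the same chain unrolled on the example data; keep and res are illustrative names. extract() is magrittr's alias for `[`, so indexing a data frame with a single logical vector selects columns.

```r
library(magrittr)

d <- data.frame(x = c(1, 2, NA), y = c(NA, NA, 4))

# Step 1: one TRUE/FALSE per column, TRUE when fewer than half the values are NA.
keep <- sapply(d, function(x) mean(is.na(x)) < 0.5)

# Step 2: extract(d, keep) is the same as d[keep].
res <- extract(d, keep)

print(keep)
print(names(res))
```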

Because d is referenced a second time, this does not work if the step comes later in the flow. In that case user3949008's answer can be used after a small change (and a small readability improvement):

d %>% select_(.dots = names(.)[which(sapply(., function(x) mean(is.na(x)) < 0.5))])
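
Another option for a step in the middle of a pipe, sketched here with magrittr only: when the dot appears nested inside another call, magrittr still inserts the piped value as the first argument, so extract() receives the intermediate result twice without naming it. The transform() step is a hypothetical upstream step, added only so that the piped value differs from d.

```r
library(magrittr)

d <- data.frame(x = c(1, 2, NA), y = c(NA, NA, 4))

res <- d %>%
  transform(w = NA_real_) %>%  # hypothetical upstream step: adds an all-NA column
  extract(sapply(., function(x) mean(is.na(x)) < 0.5))

print(names(res))
```

Reusing d in the last step would fail here, because the logical vector has three entries while d has only two columns.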

To make the whole thing even more concise, one can write either of the following:

# Variant 1: index the data frame with the logical vector
select_each <- function(df, fun) { df %>% sapply(fun) %>% extract(df, .) }
# Variant 2: select by column name
select_each <- function(df, fun) { df %>% select_(.dots = names(.)[which(sapply(., fun))]) }

d %>%
  select_each( function(x) mean(is.na(x)) < 0.5 )

Both select_each functions are equivalent in functionality. However, I benchmarked them, and the first one is three times as fast.

We can use base R:

Filter(function(x) mean(is.na(x)) <= 0.5, d)
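
Note the `<=` here, where the other snippets use `<`. The two differ only for a column that is exactly 50% NA; since the question asks to drop columns with more than 50% NAs, `<=` matches that wording. A small sketch on a hypothetical data frame e:

```r
e <- data.frame(p = c(1, NA), q = c(1, 2))  # p is exactly 50% NA

keep_le <- Filter(function(x) mean(is.na(x)) <= 0.5, e)  # keeps p and q
keep_lt <- Filter(function(x) mean(is.na(x)) < 0.5, e)   # keeps only q

print(names(keep_le))
print(names(keep_lt))
```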

Or, if we need to use the flow:

library(dplyr)
library(magrittr)
d %>%
  summarise_each(funs(mean(is.na(.)) <= 0.5)) %>%
  unlist %>%
  extract(d, .)

This is one way to do it, using select_ (because we will be supplying the names to select as a character vector):

library(dplyr)
d <- data.frame(x = c(1,2,NA), y = c(NA,NA,4), z = c(1,2,3), a = c(NA,NA,2), b = c(1,NA,2))
select_(d, .dots = names(d)[which(sapply(d, function(x) mean(is.na(x)) < 0.5))])
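
select_() was deprecated in dplyr 0.7.0. As a sketch of the same selection in current dplyr, assuming dplyr 1.0.0 or later (where where() is available as a selection helper), the predicate can be passed directly to select(); res is an illustrative name.

```r
library(dplyr)

d <- data.frame(x = c(1, 2, NA), y = c(NA, NA, 4), z = c(1, 2, 3),
                a = c(NA, NA, 2), b = c(1, NA, 2))

# where() applies the predicate to each column and keeps those returning TRUE.
res <- d %>% select(where(function(x) mean(is.na(x)) < 0.5))

print(names(res))
```

This form also works in the middle of a pipe, since it never refers to d by name.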
