Replace NA in all columns of a dplyr chain

The question "replace NA in a dplyr chain" led to the solution

dt %>% group_by(a) %>% mutate(b = ifelse(is.na(b), mean(b, na.rm = TRUE), b))

with dplyr. I want to impute all columns within a dplyr chain. There is no single column to group by; rather, I want every numeric column to have its NAs replaced by a summary value such as the column mean.

What is the most elegant way to replace all NAs with column means using tidyverse/dplyr?

We can use mutate_all with ifelse:

dt %>%
   group_by(a) %>% 
   mutate_all(~ ifelse(is.na(.), mean(., na.rm = TRUE), .))

If we want a more compact option, use na.aggregate from zoo, which by default replaces NA values with the mean:

dt %>% 
   group_by(a) %>% 
   mutate_all(zoo::na.aggregate)

If there is no grouping variable, remove the group_by and use mutate_if (as a precaution in case some columns are non-numeric):

dt %>%
   mutate_if(is.numeric, zoo::na.aggregate)
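For newer dplyr versions, the same ungrouped imputation can probably be written with across() instead of the scoped mutate_if. This is only a sketch: it assumes dplyr 1.0.0 or later, and the data frame df with its columns g, u and v is made up for illustration rather than taken from the question.

```r
library(dplyr)

# Hypothetical example data: one character column, two numeric columns with NAs
df <- data.frame(
  g = c("x", "x", "y"),
  u = c(1, NA, 3),
  v = c(NA, 4, 6)
)

# Replace NA in every numeric column with that column's mean;
# the non-numeric column g is left untouched
imputed <- df %>%
  mutate(across(where(is.numeric),
                ~ ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x)))

print(imputed)
```

Adding group_by() before the mutate() call should give per-group means instead of overall column means.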

If all the columns are numeric, this is enough:

zoo::na.aggregate(dt)
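If pulling in zoo is not desirable, the all-numeric case can also be handled in base R. The following is a minimal sketch, not part of the original answer; the data frame num and the helper name fill_mean are hypothetical.

```r
# Hypothetical all-numeric data frame
num <- data.frame(u = c(1, NA, 3), v = c(NA, 4, 6))

# Helper (illustrative name): replace NA in a vector with the vector's mean
fill_mean <- function(col) replace(col, is.na(col), mean(col, na.rm = TRUE))

# num[] keeps the data.frame structure while every column is overwritten
num[] <- lapply(num, fill_mean)

print(num)
```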

Data

set.seed(42)
dt <- data.frame(a = rep(letters[1:3], each = 3),
                 b = sample(c(NA, 1:5), 9, replace = TRUE),
                 c = sample(c(NA, 1:3), 9, replace = TRUE))
