Using lapply to populate NA values in data frame

I want to populate the NA values in a data frame with the mean of the non-NA values in the column in which they lie. For example, in the data frame ab below I want to replace every NA in column b with (5+6+7)/3 = 6, because that is the average of the non-NA values in column b. I want to do the same for all the other columns.

ab<-data.frame(a=c(1,2,3,4),b=c(NA,5,6,7),c=c(4,NA,5,6),d=c(3,NA,NA,5))

  a  b  c  d
1 1 NA  4  3
2 2  5 NA NA
3 3  6  5 NA
4 4  7  6  5

I wrote the following to do this.

lapply(ab,function(b){lapply(b,function(c){c=ifelse (is.na(c)==TRUE,mean(b,na.rm=TRUE),c)})})

The result is

$a
$a[[1]]
[1] 1

$a[[2]]
[1] 2

$a[[3]]
[1] 3

$a[[4]]
[1] 4


$b
$b[[1]]
[1] 6

$b[[2]]
[1] 5

$b[[3]]
[1] 6

$b[[4]]
[1] 7


$c
$c[[1]]
[1] 4

$c[[2]]
[1] 5

$c[[3]]
[1] 5

$c[[4]]
[1] 6


$d
$d[[1]]
[1] 3

$d[[2]]
[1] 4

$d[[3]]
[1] 4

$d[[4]]
[1] 5

instead of

  a  b  c  d
1 1  6  4  3
2 2  5  5  4
3 3  6  5  4
4 4  7  6  5

If I do

as.data.frame(lapply(ab,function(b){lapply(b,function(c){c=ifelse (is.na(c)==TRUE,mean(b,na.rm=TRUE),c)})})) 

hoping to cast the result of lapply into a data frame, I get

  a.1 a.2 a.3 a.4 b.6 b.5 b.6.1 b.7 c.4 c.5 c.5.1 c.6 d.3 d.4 d.4.1 d.5
1   1   2   3   4   6   5     6   7   4   5     5   6   3   4     4   5

What does this mean? How do I get the desired result? I can see that the R output is another way of representing the desired result, but I want the output to have the conventional appearance of a data frame.

Use na.aggregate from zoo:

library(zoo)
library(dplyr)
ab %>% 
     mutate(across(everything(), na.aggregate))

Output:

  a b c d
1 1 6 4 3
2 2 5 5 4
3 3 6 5 4
4 4 7 6 5

Also, by default na.aggregate replaces the NA values in each column with the mean of that column, so the call can be written more compactly as

na.aggregate(ab)
  a b c d
1 1 6 4 3
2 2 5 5 4
3 3 6 5 4
4 4 7 6 5

You can also use the map_if function from the purrr package:

library(dplyr)
library(purrr)

ab %>%
  map_if(~ any(is.na(.x)), ~ replace(.x, is.na(.x), mean(.x, na.rm = TRUE))) %>%
  bind_cols()

# A tibble: 4 x 4
      a     b     c     d
  <dbl> <dbl> <dbl> <dbl>
1     1     6     4     3
2     2     5     5     4
3     3     6     5     4
4     4     7     6     5

We can also replace replace with coalesce:

ab %>%
  map_if(~ any(is.na(.x)), ~ coalesce(.x, mean(.x, na.rm = TRUE))) %>%
  bind_cols()

k <- sapply(ab, function(b) {lapply(b, function(c) {ifelse(is.na(c), mean(b, na.rm = TRUE), c)})})
ans <- as.data.frame(k)

gives

  a b c d
1 1 6 4 3
2 2 5 5 4
3 3 6 5 4
4 4 7 6 5
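
One caveat with the sapply version above: because the inner lapply returns a list, the columns of the resulting data frame are list columns rather than numeric vectors, even though they print the same way. The following is a minimal sketch of that difference, together with a vectorized alternative that relies on ifelse working on a whole column at once. The names nested, m and filled are illustrative only.

```r
ab <- data.frame(a = c(1, 2, 3, 4), b = c(NA, 5, 6, 7),
                 c = c(4, NA, 5, 6), d = c(3, NA, NA, 5))

# Nested version: the inner lapply() yields a list per column, so sapply()
# builds a matrix of mode "list" and each data frame column is a list.
k <- sapply(ab, function(b) lapply(b, function(c) ifelse(is.na(c), mean(b, na.rm = TRUE), c)))
nested <- as.data.frame(k)
is.list(nested$b)      # TRUE

# Vectorized version: ifelse() handles the whole column, no inner loop needed.
# sapply() now returns a plain numeric matrix.
m <- sapply(ab, function(col) ifelse(is.na(col), mean(col, na.rm = TRUE), col))
filled <- as.data.frame(m)
is.numeric(filled$b)   # TRUE
```

With numeric columns, follow-up calls such as mean(filled$b) or colMeans(filled) behave as expected, which is not the case for list columns.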

You can use lapply:

ab[] <- lapply(ab, function(x) replace(x, is.na(x), mean(x, na.rm = TRUE)))

#  a b c d
#1 1 6 4 3
#2 2 5 5 4
#3 3 6 5 4
#4 4 7 6 5
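
This also answers the "what does this mean" part of the question. lapply always returns a list: called on a data frame it gives one list element per column, and called on a vector (the inner lapply in the question) it gives one list element per value, which as.data.frame then spreads into 16 separate columns. Assigning into ab[] keeps the existing data frame and only replaces its columns. A small sketch of the difference, where fill_mean, as_list and filled are hypothetical names introduced here:

```r
ab <- data.frame(a = c(1, 2, 3, 4), b = c(NA, 5, 6, 7),
                 c = c(4, NA, 5, 6), d = c(3, NA, NA, 5))

# Helper: replace the NA entries of one column with that column's mean
fill_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# Plain lapply(): a list with one numeric vector per column
as_list <- lapply(ab, fill_mean)
class(as_list)   # "list"

# Assigning into filled[] keeps the data.frame class and the column names
filled <- ab
filled[] <- lapply(filled, fill_mean)
class(filled)    # "data.frame"
```

Wrapping the plain list in as.data.frame(as_list) would also work here, since each element is a full-length vector rather than a list of single values.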

Or with dplyr:

library(dplyr)

ab %>% mutate(across(everything(), ~ replace(.x, is.na(.x), mean(.x, na.rm = TRUE))))
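
If the data frame also has non-numeric columns, taking the mean of them does not make sense. A possible variation, assuming dplyr 1.0.0 or later, restricts the replacement to numeric columns with where(is.numeric). The data frame df and its id column are made up for illustration and are not part of the question.

```r
library(dplyr)

# Hypothetical data frame with one character column alongside numeric ones
df <- data.frame(id = c("w", "x", "y", "z"),
                 b = c(NA, 5, 6, 7),
                 d = c(3, NA, NA, 5))

# Only numeric columns are touched; id is passed through unchanged
filled <- df %>%
  mutate(across(where(is.numeric), ~ replace(.x, is.na(.x), mean(.x, na.rm = TRUE))))
```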
