Using lapply to populate NA values in data frame
I want to populate the NA values in a data frame with the mean of the non-NA values in the column in which they lie. For example, in the data frame ab below I want to replace every NA in column b with (5+6+7)/3 = 6, because that is the average of all the non-NA values in column b. I want to do the same for all the other columns.
ab<-data.frame(a=c(1,2,3,4),b=c(NA,5,6,7),c=c(4,NA,5,6),d=c(3,NA,NA,5))
  a  b  c  d
1 1 NA  4  3
2 2  5 NA NA
3 3  6  5 NA
4 4  7  6  5
I wrote the following to do this:
lapply(ab,function(b){lapply(b,function(c){c=ifelse (is.na(c)==TRUE,mean(b,na.rm=TRUE),c)})})
The result is
$a
$a[[1]]
[1] 1
$a[[2]]
[1] 2
$a[[3]]
[1] 3
$a[[4]]
[1] 4
$b
$b[[1]]
[1] 6
$b[[2]]
[1] 5
$b[[3]]
[1] 6
$b[[4]]
[1] 7
$c
$c[[1]]
[1] 4
$c[[2]]
[1] 5
$c[[3]]
[1] 5
$c[[4]]
[1] 6
$d
$d[[1]]
[1] 3
$d[[2]]
[1] 4
$d[[3]]
[1] 4
$d[[4]]
[1] 5
instead of
  a b c d
1 1 6 4 3
2 2 5 5 4
3 3 6 5 4
4 4 7 6 5
If I do
as.data.frame(lapply(ab,function(b){lapply(b,function(c){c=ifelse (is.na(c)==TRUE,mean(b,na.rm=TRUE),c)})}))
hoping to cast the result of lapply into a data frame, I get
a.1 a.2 a.3 a.4 b.6 b.5 b.6.1 b.7 c.4 c.5 c.5.1 c.6 d.3 d.4 d.4.1 d.5
1 1 2 3 4 6 5 6 7 4 5 5 6 3 4 4 5
What does this mean? How do I get the desired result? I do see that the R output is another way of representing the desired result, but I want the output to have the conventional appearance of a data frame.
Use na.aggregate from zoo:
library(zoo)
library(dplyr)
ab %>%
  mutate(across(everything(), na.aggregate))
Output:
  a b c d
1 1 6 4 3
2 2 5 5 4
3 3 6 5 4
4 4 7 6 5
Also, by default na.aggregate replaces the NA values column-wise with the mean of the corresponding column, so it can be written more compactly as
na.aggregate(ab)
  a b c d
1 1 6 4 3
2 2 5 5 4
3 3 6 5 4
4 4 7 6 5
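As a quick check on the values involved, the per-column means that end up in the NA cells can be computed directly in base R. This is only a verification sketch; it does not use zoo, and the names col_means and na_counts are made up for illustration.

```r
ab <- data.frame(a = c(1, 2, 3, 4), b = c(NA, 5, 6, 7),
                 c = c(4, NA, 5, 6), d = c(3, NA, NA, 5))

# Mean of the non-NA values in each column
col_means <- colMeans(ab, na.rm = TRUE)
col_means   # a = 2.5, b = 6, c = 5, d = 4

# Number of NA cells per column that will receive that mean
na_counts <- colSums(is.na(ab))
na_counts   # a = 0, b = 1, c = 1, d = 2
```

Column a has no NA values, so its mean of 2.5 is never used.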
You can also use the map_if function from the purrr package:
library(dplyr)
library(purrr)
ab %>%
  map_if(~ any(is.na(.x)), ~ replace(.x, is.na(.x), mean(.x, na.rm = TRUE))) %>%
  bind_cols()
# A tibble: 4 x 4
      a     b     c     d
  <dbl> <dbl> <dbl> <dbl>
1     1     6     4     3
2     2     5     5     4
3     3     6     5     4
4     4     7     6     5
We can also swap replace for coalesce:
ab %>%
  map_if(~ any(is.na(.x)), ~ coalesce(.x, mean(.x, na.rm = TRUE))) %>%
  bind_cols()
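The bind_cols() step is needed because map_if returns a plain list rather than a data frame. As a possible alternative, purrr's modify_if keeps the type of its input, so a sketch along the following lines should return a data frame directly. The name filled is just for illustration.

```r
library(purrr)

ab <- data.frame(a = c(1, 2, 3, 4), b = c(NA, 5, 6, 7),
                 c = c(4, NA, 5, 6), d = c(3, NA, NA, 5))

# modify_if() applies the function only to columns matching the predicate
# and keeps the container type, so no bind_cols() is required afterwards.
filled <- modify_if(
  ab,
  ~ any(is.na(.x)),
  ~ replace(.x, is.na(.x), mean(.x, na.rm = TRUE))
)

is.data.frame(filled)
filled
```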
k<-sapply(ab,function(b){lapply(b,function(c){c=ifelse(is.na(c)==TRUE,mean(b,na.rm=TRUE),c)})})
ans<-as.data.frame(k,nrow=4,ncol=4)
gives
  a b c d
1 1 6 4 3
2 2 5 5 4
3 3 6 5 4
4 4 7 6 5
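One caveat with the snippet above: the inner lapply still returns a list for each column, so the columns of ans are likely to be list columns even though they print like numbers, and as far as I know as.data.frame does not use the nrow and ncol arguments. A sketch that drops the inner lapply, so that sapply simplifies to an ordinary numeric matrix, might look like this (m is an illustrative name):

```r
ab <- data.frame(a = c(1, 2, 3, 4), b = c(NA, 5, 6, 7),
                 c = c(4, NA, 5, 6), d = c(3, NA, NA, 5))

# Each call returns a numeric vector of length 4, so sapply() can
# simplify the four results into a 4 x 4 numeric matrix.
m <- sapply(ab, function(col) replace(col, is.na(col), mean(col, na.rm = TRUE)))

# Converting the numeric matrix gives ordinary numeric columns.
ans <- as.data.frame(m)
str(ans)
```

This only works cleanly when every column is numeric, because a matrix can hold a single type.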
You can use lapply:
ab[] <- lapply(ab, function(x) replace(x, is.na(x), mean(x, na.rm = TRUE)))
# a b c d
#1 1 6 4 3
#2 2 5 5 4
#3 3 6 5 4
#4 4 7 6 5
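To connect this back to the question: the nested lapply produces one list element per cell, which is why as.data.frame spreads the result over 16 columns. The sketch below contrasts the two approaches; the names nested and filled are made up for illustration.

```r
ab <- data.frame(a = c(1, 2, 3, 4), b = c(NA, 5, 6, 7),
                 c = c(4, NA, 5, 6), d = c(3, NA, NA, 5))

# Nested lapply: the inner call returns one list element per cell,
# so each column becomes a list of four length-1 vectors.
nested <- lapply(ab, function(col) {
  lapply(col, function(v) ifelse(is.na(v), mean(col, na.rm = TRUE), v))
})
str(nested$b)                # a list of 4, not a numeric vector

# as.data.frame() then treats each of the 16 cells as its own column.
dim(as.data.frame(nested))   # 1 row, 16 columns

# A single vectorised lapply keeps each column as a numeric vector,
# and `filled[] <-` assigns the columns into the existing data frame.
filled <- ab
filled[] <- lapply(ab, function(col) replace(col, is.na(col), mean(col, na.rm = TRUE)))
is.data.frame(filled)
```

Assigning with ab[] rather than ab is what keeps the data frame class; a plain ab <- lapply(...) would leave a list of four vectors.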
Or with dplyr:
library(dplyr)
ab %>% mutate(across(everything(), ~ replace(.x, is.na(.x), mean(.x, na.rm = TRUE))))
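If the data frame also holds non-numeric columns, taking the mean of every column would not make sense. A sketch that restricts the replacement to numeric columns with where(is.numeric) is shown below; the data frame scores and its contents are hypothetical and not part of the question.

```r
library(dplyr)

# Hypothetical mixed-type data: one character column, one numeric column
scores <- data.frame(id = c("x", "y", "z"), v = c(1, NA, 3))

# Only numeric columns are touched; the character column passes through as-is.
scores_filled <- scores %>%
  mutate(across(where(is.numeric),
                ~ replace(.x, is.na(.x), mean(.x, na.rm = TRUE))))

scores_filled$v   # 1 2 3
```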