How to fill NAs in columns with the group mean in R?
I have a dataset with several NAs. I want to take the mean of each column within each group and use it to fill that group's NAs. My dataset looks like this:
PID Category column1 column2 column3
123 1 54 2.4 NA
324 1 52 NA 21.1
356 1 NA 3.6 25.6
378 2 56 3.2 NA
395 2 NA 3.5 29.9
362 2 45 NA 24.3
789 3 65 12.6 23.8
759 3 66 NA 26.8
762 3 NA NA 27.2
741 3 69 8.5 23.3
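To make the example reproducible, the sample data above can be rebuilt as a data frame along these lines. The name `df` is an assumption chosen to match the first answer (other answers refer to the same data as `df1`).

```r
# Sample data from the question; the name `df` is assumed
df <- data.frame(
  PID      = c(123L, 324L, 356L, 378L, 395L, 362L, 789L, 759L, 762L, 741L),
  Category = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  column1  = c(54, 52, NA, 56, NA, 45, 65, 66, NA, 69),
  column2  = c(2.4, NA, 3.6, 3.2, 3.5, NA, 12.6, NA, NA, 8.5),
  column3  = c(NA, 21.1, 25.6, NA, 29.9, 24.3, 23.8, 26.8, 27.2, 23.3)
)

# Count the missing values in each column
colSums(is.na(df))
```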
This is the desired output:
PID Category column1 column2 column3
123 1 54 2.4 23.3
324 1 52 3.0 21.1
356 1 53 3.6 25.6
378 2 56 3.2 27.1
395 2 50.5 3.5 29.9
362 2 61.3 3.3 24.3
789 3 65 12.6 23.8
759 3 66 10.5 26.8
762 3 66.6 10.5 27.2
741 3 69 8.5 23.3
Thanks
You can use:
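library(dplyr)

df %>%
  group_by(Category) %>%
  # within each Category, replace the NAs of every "column*" variable
  # with the mean of that variable's non-missing values
  mutate(across(starts_with('column'),
                ~ replace(., is.na(.), mean(., na.rm = TRUE)))) %>%
  ungroup()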
# PID Category column1 column2 column3
# <int> <int> <dbl> <dbl> <dbl>
# 1 123 1 54 2.4 23.4
# 2 324 1 52 3 21.1
# 3 356 1 53 3.6 25.6
# 4 378 2 56 3.2 27.1
# 5 395 2 50.5 3.5 29.9
# 6 362 2 45 3.35 24.3
# 7 789 3 65 12.6 23.8
# 8 759 3 66 10.6 26.8
# 9 762 3 66.7 10.6 27.2
#10 741 3 69 8.5 23.3
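To see what the `replace()` idiom does inside a single group, here is a minimal sketch on the `column3` values of Category 1 from the question. The variable names `x` and `filled` are only for illustration.

```r
# column3 values for Category 1, taken from the question
x <- c(NA, 21.1, 25.6)

# mean() skips the NA, and replace() writes that mean into the NA slots only
filled <- replace(x, is.na(x), mean(x, na.rm = TRUE))
filled
```

The first element becomes the mean of 21.1 and 25.6, that is 23.35, which the tibble printout above displays as 23.4.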
We can use na.aggregate from zoo. By default, it replaces each NA with the mean of the column concerned.
library(dplyr)
library(zoo)

df1 %>%
  group_by(Category) %>%
  mutate(across(starts_with('column'), na.aggregate)) %>%
  ungroup()
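As a quick illustration of that default behaviour, here is a small sketch on a toy vector (the values are made up for the example and are not from the question).

```r
library(zoo)

# Toy vector, not taken from the question
x <- c(1, NA, 3)

# With default arguments, the NA is replaced by the mean of the non-missing values
filled <- na.aggregate(x)
filled
```

Inside `group_by()` plus `mutate(across(...))`, the same call runs once per group and per column, which is why each NA ends up with its own group's mean.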
Or use group_modify with na.aggregate, as @G. Grothendieck suggested in the comments:
df1 %>%
  group_by(Category) %>%
  group_modify(na.aggregate) %>%
  ungroup()
Or using data.table:
library(data.table)
nm1 <- grep("^column\\d+$", names(df1), value = TRUE)
setDT(df1)[, (nm1) := na.aggregate(.SD), by = Category, .SDcols = nm1]
Or with base R split/unsplit (na.aggregate itself still comes from zoo):
unsplit(lapply(split(df1, df1$Category), na.aggregate), df1$Category)
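If avoiding the zoo dependency matters, a package-free variant is possible with `ave()`. This is only a sketch and not part of the original answer; `df`, `cols` and `fill_mean` are names assumed here for illustration.

```r
# Sample data from the question; the name `df` is assumed
df <- data.frame(
  PID      = c(123L, 324L, 356L, 378L, 395L, 362L, 789L, 759L, 762L, 741L),
  Category = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  column1  = c(54, 52, NA, 56, NA, 45, 65, 66, NA, 69),
  column2  = c(2.4, NA, 3.6, 3.2, 3.5, NA, 12.6, NA, NA, 8.5),
  column3  = c(NA, 21.1, 25.6, NA, 29.9, 24.3, 23.8, 26.8, 27.2, 23.3)
)

# Columns to fill: everything whose name starts with "column"
cols <- grep("^column", names(df), value = TRUE)

# Hypothetical helper: replace NAs in a vector with the mean of its non-missing values
fill_mean <- function(v) replace(v, is.na(v), mean(v, na.rm = TRUE))

# ave() applies fill_mean within each Category and keeps the original row order
df[cols] <- lapply(df[cols], function(x) ave(x, df$Category, FUN = fill_mean))
df
```

One caveat with this sketch: if a group has no non-missing values in a column, `mean(..., na.rm = TRUE)` returns NaN, so those cells become NaN rather than staying NA.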
Another data.table option:
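library(data.table)

# note: cbind() lines PID up with the grouped result only if the rows
# of each Category are contiguous, as they are in the sample data
cbind(
  setDT(df)[, "PID"],
  df[,
    lapply(
      .SD,
      function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
    ),
    Category,
    .SDcols = patterns("^column")
  ]
)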
gives
PID Category column1 column2 column3
1: 123 1 54.00000 2.40 23.35
2: 324 1 52.00000 3.00 21.10
3: 356 1 53.00000 3.60 25.60
4: 378 2 56.00000 3.20 27.10
5: 395 2 50.50000 3.50 29.90
6: 362 2 45.00000 3.35 24.30
7: 789 3 65.00000 12.60 23.80
8: 759 3 66.00000 10.55 26.80
9: 762 3 66.66667 10.55 27.20
10: 741 3 69.00000 8.50 23.30