
How to fill NAs with the column mean by group in R?

I have a dataset with several NAs. For each column, I want to take the mean within each Category group and use it to fill that group's NAs. My dataset looks like this:

PID Category    column1 column2 column3
123    1             54    2.4  NA
324    1             52    NA   21.1
356    1             NA    3.6  25.6
378    2             56    3.2  NA
395    2             NA    3.5  29.9
362    2             45    NA   24.3
789    3             65   12.6  23.8
759    3             66    NA   26.8
762    3             NA    NA   27.2
741    3             69   8.5   23.3
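The answers below call the data df or df1 without showing how it was created. A minimal sketch that rebuilds the sample above with read.table(text = ...) follows; the name df1 is an assumption (the first answer uses df). Note that column1 is read as integer because all its values are whole numbers.

```r
# Rebuild the question's sample data; the name df1 is assumed
df1 <- read.table(text = "
PID Category column1 column2 column3
123 1 54 2.4 NA
324 1 52 NA 21.1
356 1 NA 3.6 25.6
378 2 56 3.2 NA
395 2 NA 3.5 29.9
362 2 45 NA 24.3
789 3 65 12.6 23.8
759 3 66 NA 26.8
762 3 NA NA 27.2
741 3 69 8.5 23.3
", header = TRUE)

# Inspect column types: column1 is integer, column2 and column3 are double
str(df1)
```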

Desired output:

PID Category    column1 column2 column3
123    1             54   2.4   23.3
324    1             52   3.0   21.1
356    1             53   3.6   25.6
378    2             56   3.2   27.1
395    2             50.5 3.5   29.9
362    2             45   3.3   24.3
789    3             65   12.6  23.8
759    3             66   10.5  26.8
762    3             66.6 10.5  27.2
741    3             69   8.5   23.3

Thanks

With dplyr, you can use:

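library(dplyr)

# Within each Category, replace the NAs in every column* variable
# with that group's mean of the non-missing values
df %>%
  group_by(Category) %>%
  mutate(across(starts_with('column'),
                ~replace(., is.na(.), mean(., na.rm = TRUE)))) %>%
  ungroup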

#     PID Category column1 column2 column3
#   <int>    <int>   <dbl>   <dbl>   <dbl>
# 1   123        1    54      2.4     23.4
# 2   324        1    52      3       21.1
# 3   356        1    53      3.6     25.6
# 4   378        2    56      3.2     27.1
# 5   395        2    50.5    3.5     29.9
# 6   362        2    45      3.35    24.3
# 7   789        3    65     12.6     23.8
# 8   759        3    66     10.6     26.8
# 9   762        3    66.7   10.6     27.2
#10   741        3    69      8.5     23.3
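For comparison, the same group-wise replacement can be sketched in base R without dplyr, swapping in ave(): it applies a function within each level of a grouping vector and returns a result aligned with the original rows, so row order is preserved. The data frame name df1 and the helper variable cols are assumptions for this illustration.

```r
# Sample data from the question (name df1 assumed)
df1 <- data.frame(
  PID      = c(123, 324, 356, 378, 395, 362, 789, 759, 762, 741),
  Category = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  column1  = c(54, 52, NA, 56, NA, 45, 65, 66, NA, 69),
  column2  = c(2.4, NA, 3.6, 3.2, 3.5, NA, 12.6, NA, NA, 8.5),
  column3  = c(NA, 21.1, 25.6, NA, 29.9, 24.3, 23.8, 26.8, 27.2, 23.3)
)

cols <- grep("^column", names(df1), value = TRUE)

# ave() runs FUN separately for each Category and puts the results back in row order
df1[cols] <- lapply(df1[cols], function(x)
  ave(x, df1$Category,
      FUN = function(v) replace(v, is.na(v), mean(v, na.rm = TRUE))))

df1
```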

We can use na.aggregate from zoo; by default (FUN = mean) it replaces each NA with the mean of the column concerned:

library(dplyr)
library(zoo)
df1 %>%
   group_by(Category) %>%
   mutate(across(starts_with('column'), na.aggregate)) %>%
   ungroup
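Within one group, the default behaviour described above amounts to replacing every NA in a vector with the mean of its non-missing values. A minimal base-R sketch of that per-group step, using a hypothetical helper fill_mean() that is not part of zoo, is:

```r
# Hypothetical helper (not part of zoo): replace NAs with the mean of the non-missing values
fill_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# Category 1, column2 from the question
fill_mean(c(2.4, NA, 3.6))
# [1] 2.4 3.0 3.6

# Category 3, column1 from the question
fill_mean(c(65, 66, NA, 69))
```

Grouping (group_by() or by = Category) is what restricts the mean to the rows of a single Category.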

Or use group_modify with na.aggregate, as @G. Grothendieck suggested in the comments:

df1 %>% 
  group_by(Category) %>% 
  group_modify(na.aggregate) %>%
  ungroup

Or using data.table:

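library(data.table)
library(zoo)  # for na.aggregate
nm1 <- grep("^column\\d+$", names(df1), value = TRUE)
# Make the columns double first: := by group won't widen an integer column to hold fractional means
setDT(df1)[, (nm1) := lapply(.SD, as.numeric), .SDcols = nm1]
df1[, (nm1) := na.aggregate(.SD), by = Category, .SDcols = nm1]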

Or with base R's split()/unsplit() (still using na.aggregate from zoo):

unsplit(lapply(split(df1, df1$Category), na.aggregate), df1$Category)
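Because unsplit() reassembles the pieces according to the original grouping vector, this approach also works when the rows of each Category are not contiguous. The sketch below illustrates that with base R only; it swaps zoo's na.aggregate for a hypothetical fill_mean() helper and uses a small made-up data frame df2 with interleaved groups.

```r
# Made-up data with interleaved Categories
df2 <- data.frame(
  PID      = c(1, 2, 3, 4, 5, 6),
  Category = c(2, 1, 2, 1, 2, 1),
  value    = c(10, NA, NA, 4, 20, 8)
)

# Hypothetical helper: replace NAs in a vector with its mean
fill_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# split() gives one data frame per Category; fill every column; unsplit() restores row order
filled <- unsplit(
  lapply(split(df2, df2$Category), function(d) { d[] <- lapply(d, fill_mean); d }),
  df2$Category
)

filled
```

Category 1 (PID 2, 4, 6) has mean 6 and Category 2 (PID 1, 3, 5) has mean 15, and each filled value lands back on its original row.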

Another data.table option:

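library(data.table)
setDT(df)

# by = Category returns groups in order of first appearance, so PID lines up
# with the result only because the rows are already sorted by Category
cbind(
  df[, "PID"],
  df[,
    lapply(
      .SD,
      function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
    ), by = Category,
    .SDcols = patterns("^column")
  ]
)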

gives

   PID Category  column1 column2 column3
 1: 123        1 54.00000    2.40   23.35
 2: 324        1 52.00000    3.00   21.10
 3: 356        1 53.00000    3.60   25.60
 4: 378        2 56.00000    3.20   27.10
 5: 395        2 50.50000    3.50   29.90
 6: 362        2 45.00000    3.35   24.30
 7: 789        3 65.00000   12.60   23.80
 8: 759        3 66.00000   10.55   26.80
 9: 762        3 66.66667   10.55   27.20
10: 741        3 69.00000    8.50   23.30
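Every filled value in the output above is simply its Category's column mean over the non-missing values. A small base-R sketch to cross-check those means uses the data-frame method of aggregate(), which passes na.rm = TRUE on to mean() (unlike the formula method, it does not drop rows containing NAs). The names df1, cols and group_means are assumptions for this illustration.

```r
# Sample data from the question (name df1 assumed)
df1 <- data.frame(
  PID      = c(123, 324, 356, 378, 395, 362, 789, 759, 762, 741),
  Category = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  column1  = c(54, 52, NA, 56, NA, 45, 65, 66, NA, 69),
  column2  = c(2.4, NA, 3.6, 3.2, 3.5, NA, 12.6, NA, NA, 8.5),
  column3  = c(NA, 21.1, 25.6, NA, 29.9, 24.3, 23.8, 26.8, 27.2, 23.3)
)

cols <- grep("^column", names(df1), value = TRUE)

# One row per Category, one mean per column, ignoring NAs
group_means <- aggregate(df1[cols], by = list(Category = df1$Category),
                         FUN = mean, na.rm = TRUE)
group_means
```

For example, Category 1 gives 53, 3.0 and 23.35, which are exactly the values filled into PIDs 356, 324 and 123.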

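Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.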

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM