
How to fill NAs with the column mean by group in R?

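I have a dataset with several NAs. For each column, I want to compute the mean within each group and use it to fill that group's NAs. My dataset looks like this: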

PID Category    column1 column2 column3
123    1             54    2.4  NA
324    1             52    NA   21.1
356    1             NA    3.6  25.6
378    2             56    3.2  NA
395    2             NA    3.5  29.9
362    2             45    NA   24.3
789    3             65   12.6  23.8
759    3             66    NA   26.8
762    3             NA    NA   27.2
741    3             69   8.5   23.3

The desired output is:

PID Category    column1 column2 column3
123    1             54   2.4   23.3
324    1             52   3.0   21.1
356    1             53   3.6   25.6
378    2             56   3.2   27.1
395    2             50.5 3.5   29.9
362    2             45   3.3   24.3
789    3             65   12.6  23.8
759    3             66   10.5  26.8
762    3             66.6 10.5  27.2
741    3             69   8.5   23.3

Thanks.

You can use:

library(dplyr)

df %>%
  group_by(Category) %>%
  mutate(across(starts_with('column'), 
                ~replace(., is.na(.), mean(., na.rm = TRUE)))) %>%
  ungroup

#     PID Category column1 column2 column3
#   <int>    <int>   <dbl>   <dbl>   <dbl>
# 1   123        1    54      2.4     23.4
# 2   324        1    52      3       21.1
# 3   356        1    53      3.6     25.6
# 4   378        2    56      3.2     27.1
# 5   395        2    50.5    3.5     29.9
# 6   362        2    45      3.35    24.3
# 7   789        3    65     12.6     23.8
# 8   759        3    66     10.6     26.8
# 9   762        3    66.7   10.6     27.2
#10   741        3    69      8.5     23.3
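
To check the filled values by hand, it can help to rebuild the sample data and look at the per-group means that end up replacing the NAs. The following is a minimal base R sketch. The data frame is typed in from the question's table, and the name `group_means` is only illustrative.

```r
# Sample data typed in from the question (the name df matches the answer above).
df <- data.frame(
  PID      = c(123L, 324L, 356L, 378L, 395L, 362L, 789L, 759L, 762L, 741L),
  Category = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  column1  = c(54, 52, NA, 56, NA, 45, 65, 66, NA, 69),
  column2  = c(2.4, NA, 3.6, 3.2, 3.5, NA, 12.6, NA, NA, 8.5),
  column3  = c(NA, 21.1, 25.6, NA, 29.9, 24.3, 23.8, 26.8, 27.2, 23.3)
)

cols <- c("column1", "column2", "column3")

# Mean of each column within each Category, ignoring NAs.
# These are the values that replace the NAs in the corresponding group.
group_means <- aggregate(
  df[cols],
  by    = list(Category = df$Category),
  FUN   = mean,
  na.rm = TRUE
)

group_means
```

For example, the NA in column1 of Category 2 is filled with 50.5, the mean of 56 and 45.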

We can use na.aggregate from zoo, which by default replaces each NA with the mean of its column:

library(dplyr)
library(zoo)
df1 %>%
   group_by(Category) %>%
   mutate(across(starts_with('column'), na.aggregate)) %>%
   ungroup
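
To see what na.aggregate does on its own, here is a small sketch on a plain numeric vector. The vector `x` and the result names are made up for illustration. The `FUN` argument is part of na.aggregate and swaps the mean for another summary.

```r
library(zoo)

# Toy vector with one missing value (illustrative data, not from the question).
x <- c(1, 2, 9, NA)

# Default behaviour: the NA becomes the mean of the non-missing values (4).
filled_mean <- na.aggregate(x)

# FUN replaces the mean with another summary function, here the median (2).
filled_median <- na.aggregate(x, FUN = median)

filled_mean
filled_median
```

Inside group_by() and mutate(), the function receives one group's values of a column at a time, which is why the fill is computed per Category.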

Or use group_modify with na.aggregate, as @G. Grothendieck suggested in the comments:

df1 %>% 
  group_by(Category) %>% 
  group_modify(na.aggregate) %>%
  ungroup

Or use data.table:

library(data.table)
nm1 <- grep("^column\\d+$", names(df1), value = TRUE)
setDT(df1)[, (nm1) := na.aggregate(.SD), by = Category, .SDcols = nm1]

Or with base R:

unsplit(lapply(split(df1, df1$Category), na.aggregate), df1$Category)
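
The line above still relies on na.aggregate from zoo. If no extra packages are wanted, a similar result can be sketched with ave() from base R. The helper name `fill_group_mean` and the result name `filled` are hypothetical, and the data frame is typed in from the question.

```r
# Sample data typed in from the question.
df1 <- data.frame(
  PID      = c(123L, 324L, 356L, 378L, 395L, 362L, 789L, 759L, 762L, 741L),
  Category = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
  column1  = c(54, 52, NA, 56, NA, 45, 65, 66, NA, 69),
  column2  = c(2.4, NA, 3.6, 3.2, 3.5, NA, 12.6, NA, NA, 8.5),
  column3  = c(NA, 21.1, 25.6, NA, 29.9, 24.3, 23.8, 26.8, 27.2, 23.3)
)

# Hypothetical helper: within each group g, replace NAs in x by the group mean.
# ave() applies FUN to each group and returns the values in the original row order.
fill_group_mean <- function(x, g) {
  ave(x, g, FUN = function(v) replace(v, is.na(v), mean(v, na.rm = TRUE)))
}

# Apply the helper to every column whose name starts with "column".
cols <- grep("^column", names(df1), value = TRUE)
filled <- df1
filled[cols] <- lapply(df1[cols], fill_group_mean, g = df1$Category)

filled
```

Because ave() keeps the row order, this does not depend on the data being sorted by Category. If a group has only NAs in a column, the mean is NaN and that is what gets filled in.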

Another data.table option:

cbind(
  setDT(df)[, "PID"],
  df[,
    lapply(
      .SD,
      function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))
    ), Category,
    .SDcols = patterns("^column")
  ]
)

   PID Category  column1 column2 column3
 1: 123        1 54.00000    2.40   23.35
 2: 324        1 52.00000    3.00   21.10
 3: 356        1 53.00000    3.60   25.60
 4: 378        2 56.00000    3.20   27.10
 5: 395        2 50.50000    3.50   29.90
 6: 362        2 45.00000    3.35   24.30
 7: 789        3 65.00000   12.60   23.80
 8: 759        3 66.00000   10.55   26.80
 9: 762        3 66.66667   10.55   27.20
10: 741        3 69.00000    8.50   23.30

