
Using R dplyr to replace NA with the group mean, omitting some values from the group before the mean is calculated

This seems like it should be a simple one but I can't see it.

Say I have a dataframe like:

df <- data.frame(type=c(rep("A", 5), rep("B",5)),
                 stage=rep(c("1","2", "3", "4", "5"),2),
                 val=c(rnorm(n=5, mean=1000, sd=300),rnorm(n=4, mean=1000, sd=100), NA)
                 )

I want to replace the NA in group type == "B" with the mean of group B, but omit some values of "val" from B before computing it (e.g. where "stage" equals 1 or 2, or any other condition). Using dplyr and zoo, it's easy to fill in with the group mean:

df %>% dplyr::group_by(type) %>% dplyr::mutate_at("val", zoo::na.aggregate) 

but I can't work out how to exclude values from the group based on a condition on "stage". Ideally I'd like a dplyr solution, but one that includes zoo would also be good.

This is how you could do it with the condition stage != 2:

library(tidyverse)

set.seed(12345)
df <- data.frame(type=c(rep("A", 5), rep("B",5)),
                 stage=rep(c("1","2", "3", "4", "5"),2),
                 val=c(rnorm(n=5, mean=1000, sd=300),rnorm(n=4, mean=1000, sd=100), NA)
)



df %>% 
  group_by(type) %>% 
  mutate(val = replace_na(val, mean(val[stage != 2], na.rm = TRUE)))
#> # A tibble: 10 x 3
#> # Groups:   type [2]
#>    type  stage   val
#>    <fct> <fct> <dbl>
#>  1 A     1     1176.
#>  2 A     2     1213.
#>  3 A     3      967.
#>  4 A     4      864.
#>  5 A     5     1182.
#>  6 B     1      818.
#>  7 B     2     1063.
#>  8 B     3      972.
#>  9 B     4      972.
#> 10 B     5      921.

Created on 2020-05-08 by the reprex package (v0.3.0)

I set the seed to a fixed number so that everybody gets the same numbers.
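The question also mentions leaving out more than one stage (stage 1 or stage 2). As a minimal sketch of that variant, the same pattern should work with %in% in place of !=. The names excluded_stages and filled are made up here for illustration and are not part of the original answer.

```r
library(dplyr)
library(tidyr)

set.seed(12345)
df <- data.frame(type  = c(rep("A", 5), rep("B", 5)),
                 stage = rep(c("1", "2", "3", "4", "5"), 2),
                 val   = c(rnorm(n = 5, mean = 1000, sd = 300),
                           rnorm(n = 4, mean = 1000, sd = 100), NA))

# Stages to leave out of the mean (illustrative choice taken from the question)
excluded_stages <- c("1", "2")

filled <- df %>%
  group_by(type) %>%
  # Per group: average only the rows whose stage is not excluded,
  # then use that average to fill the NA values in that group
  mutate(val = replace_na(val,
                          mean(val[!(stage %in% excluded_stages)], na.rm = TRUE))) %>%
  ungroup()

filled
```

With this seed the NA in group B is filled with the mean of the stage 3 and stage 4 values only, which should come out at roughly 972 judging by the output printed above. Note that if every remaining value in a group is NA, mean() returns NaN, so the gap would be filled with NaN rather than a number.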
