Using R dplyr to replace NA with the group mean, omitting some values from the group before the mean is calculated
This seems like it should be a simple one, but I can't see it.
Say I have a data frame like:
df <- data.frame(type  = c(rep("A", 5), rep("B", 5)),
                 stage = rep(c("1", "2", "3", "4", "5"), 2),
                 val   = c(rnorm(n = 5, mean = 1000, sd = 300),
                           rnorm(n = 4, mean = 1000, sd = 100), NA)
)
I want to replace the NA in group type == "B" with the mean of group B, but omit some of the values in val from B before the mean is calculated (e.g. where stage equals 1 or stage equals 2, or any other condition). Using dplyr and zoo it's easy to fill in with the group mean:
df %>% dplyr::group_by(type) %>% dplyr::mutate_at("val", zoo::na.aggregate)
but I can't work out how to exclude values from the group based on a condition on stage. Ideally I'd like a dplyr solution, but one that includes zoo would also be good.
This is how you could do it with the condition stage != 2:
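library(tidyverse)
set.seed(12345)  # fixed seed so the random values are reproducible
df <- data.frame(type  = c(rep("A", 5), rep("B", 5)),
                 stage = rep(c("1", "2", "3", "4", "5"), 2),
                 val   = c(rnorm(n = 5, mean = 1000, sd = 300),
                           rnorm(n = 4, mean = 1000, sd = 100), NA)
)
# Within each type, fill NA with the mean of the rows whose stage is not 2
df %>%
  group_by(type) %>%
  mutate(val = replace_na(val, mean(val[stage != 2], na.rm = TRUE)))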
#> # A tibble: 10 x 3
#> # Groups:   type [2]
#>    type  stage   val
#>    <fct> <fct> <dbl>
#>  1 A     1     1176.
#>  2 A     2     1213.
#>  3 A     3      967.
#>  4 A     4      864.
#>  5 A     5     1182.
#>  6 B     1      818.
#>  7 B     2     1063.
#>  8 B     3      972.
#>  9 B     4      972.
#> 10 B     5      921.
Created on 2020-05-08 by the reprex package (v0.3.0)
I set the seed to a fixed number so that everybody gets the same values.
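The question also asks about leaving out more than one stage (stage 1 or stage 2). A minimal sketch of how the same idea might extend to that case, assuming the stages to skip are collected in a vector (the name excluded and the result name filled are made up for this example) and using dplyr::coalesce in place of replace_na so that only dplyr is needed:

```r
library(dplyr)

set.seed(12345)
df <- data.frame(type  = c(rep("A", 5), rep("B", 5)),
                 stage = rep(c("1", "2", "3", "4", "5"), 2),
                 val   = c(rnorm(n = 5, mean = 1000, sd = 300),
                           rnorm(n = 4, mean = 1000, sd = 100), NA)
)

# Hypothetical choice: stages to leave out of the mean calculation
excluded <- c("1", "2")

filled <- df %>%
  group_by(type) %>%
  # coalesce() keeps val where it is present and otherwise falls back to
  # the group mean computed only over rows whose stage is not excluded
  mutate(val = coalesce(val, mean(val[!stage %in% excluded], na.rm = TRUE))) %>%
  ungroup()

filled
```

Any other logical condition on stage (or on another column) could be used inside the square brackets in the same way. Note that if every non-missing value in a group is excluded, mean() returns NaN and the gap is not filled with a usable number.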