Replace NAs in a column of a data.table with means of the same column grouped by a factor
I have the following sample data.table:
library(data.table)
steps.dt = data.table(steps = rep(0:2, each = 3),
                      date = as.factor(rep(c("10/2/2012", "10/3/2012", "10/4/2012"), each = 3)),
                      interval = as.factor(rep(c(0, 5, 10), each = 3)))
Then I insert a few NAs:
steps.dt[c(2, 5, 8), steps := NA]
The table now looks like this:
steps date interval
1: 0 10/2/2012 0
2: NA 10/2/2012 0
3: 0 10/2/2012 0
4: 1 10/3/2012 5
5: NA 10/3/2012 5
6: 1 10/3/2012 5
7: 2 10/4/2012 10
8: NA 10/4/2012 10
9: 2 10/4/2012 10
Now I am trying to replace the NAs in the column "steps" with the mean of "steps" within each level of the factor "interval".
I have looked at some posts on SO like this one, but needing the replacement to be grouped by a factor complicates things. Is there a way to do this without using a loop?
Thank you!
We can use na.aggregate from zoo to replace the NAs with the mean of "steps" after grouping by "interval":
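library(zoo)
steps.dt[, steps := as.numeric(steps)]  # integer -> double, so non-integer means are not truncated
steps.dt[, steps := na.aggregate(steps), by = interval]  # fill each NA with its interval's mean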
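If you would rather not depend on zoo, the same per-group fill can be sketched with data.table alone. This is a minimal sketch, assuming the sample table from the question (rebuilt here so the block runs on its own); it swaps na.aggregate() for base R's replace() inside a grouped `:=`. A group whose "steps" are all NA would get NaN, because mean() of an empty vector is NaN.

```r
library(data.table)

# Rebuild the sample table from the question, including the inserted NAs
steps.dt <- data.table(
  steps    = rep(0:2, each = 3),
  date     = as.factor(rep(c("10/2/2012", "10/3/2012", "10/4/2012"), each = 3)),
  interval = as.factor(rep(c(0, 5, 10), each = 3))
)
steps.dt[c(2, 5, 8), steps := NA]

# Store steps as double so group means can be written back without coercion
steps.dt[, steps := as.numeric(steps)]

# Within each interval, overwrite NAs with the mean of the non-NA values
steps.dt[, steps := replace(steps, is.na(steps), mean(steps, na.rm = TRUE)),
         by = interval]

print(steps.dt)
```

Because `:=` modifies the table by reference, no reassignment of steps.dt is needed.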
Solution using dplyr:
library(dplyr)
# Note: the result is a tibble, no longer a data.table
steps.dt = steps.dt %>%
  group_by(interval) %>%
  mutate(steps = ifelse(is.na(steps), mean(steps, na.rm = TRUE), steps)) %>%
  ungroup()
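Without data.table, zoo or dplyr, base R's ave() can do the same grouping: it applies a function within each level of a factor and returns a vector aligned with the original rows, so no explicit loop is needed. A sketch on a plain data.frame copy of the question's data (steps.df is a hypothetical name used only here):

```r
# Same values as the question's table, as a plain data.frame
steps.df <- data.frame(
  steps    = rep(0:2, each = 3),
  date     = as.factor(rep(c("10/2/2012", "10/3/2012", "10/4/2012"), each = 3)),
  interval = as.factor(rep(c(0, 5, 10), each = 3))
)
steps.df$steps[c(2, 5, 8)] <- NA

# ave() runs FUN on each interval group; replace() fills that group's NAs
# with the mean of its non-NA values
steps.df$steps <- ave(steps.df$steps, steps.df$interval,
                      FUN = function(x) replace(x, is.na(x), mean(x, na.rm = TRUE)))

print(steps.df)
```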