Replace NAs in a column of a data.table with means of the same column grouped by a factor

I have the following sample data table:

library(data.table)
steps.dt = data.table(steps = rep(0:2, each = 3),
                      date = as.factor(rep(c("10/2/2012", "10/3/2012", "10/4/2012"), each = 3)),
                      interval = as.factor(rep(c(0, 5, 10), each = 3)))

Inserting a few NAs:

steps.dt[c(2,5,8),"steps"]=NA

The table now looks like this:

   steps      date interval
1:     0 10/2/2012        0
2:    NA 10/2/2012        0
3:     0 10/2/2012        0
4:     1 10/3/2012        5
5:    NA 10/3/2012        5
6:     1 10/3/2012        5
7:     2 10/4/2012       10
8:    NA 10/4/2012       10
9:     2 10/4/2012       10

Now I am trying to replace the NAs in the column "steps" with the means of steps grouped by the factor "interval".

I have looked at some of the posts on SO like this, but the need to group the replacement by a factor complicates things. Is there a way to do this without using a loop? Thank you!

We can use na.aggregate from zoo to replace each NA with the mean of 'steps' after grouping by 'interval'. By default, na.aggregate fills missing values with the mean of the non-missing ones.

library(zoo)
steps.dt[, steps := na.aggregate(steps), by = interval]
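
If adding zoo as a dependency is not desirable, the same grouped replacement can probably be done with data.table and base R alone. The following is a minimal sketch, not the answer's own method: it rebuilds the sample table from the question and swaps na.aggregate for base R's replace(). The up-front conversion of steps to double is an added precaution, on the assumption that a fractional group mean assigned into an integer column would be truncated (the means in this sample happen to be whole numbers).

```r
library(data.table)

# Rebuild the sample table from the question
steps.dt <- data.table(
  steps    = rep(0:2, each = 3),
  date     = as.factor(rep(c("10/2/2012", "10/3/2012", "10/4/2012"), each = 3)),
  interval = as.factor(rep(c(0, 5, 10), each = 3))
)
steps.dt[c(2, 5, 8), steps := NA]

# Convert to double first so a fractional group mean is not truncated
# when it is assigned back into what was an integer column
steps.dt[, steps := as.numeric(steps)]

# Within each interval, replace NA with the mean of that interval's observed values
steps.dt[, steps := replace(steps, is.na(steps), mean(steps, na.rm = TRUE)), by = interval]
```

Note that a group consisting only of NAs has no observed values to average, so its entries would become NaN rather than a number.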

Solution using dplyr:

library(dplyr)
steps.dt = steps.dt %>%
  group_by(interval) %>%
  mutate(steps = ifelse(is.na(steps), mean(steps, na.rm = TRUE), steps))
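
One thing to keep in mind with the dplyr route is that group_by() returns a grouped tibble, so the result is no longer a plain data.table. The sketch below shows one way to get a data.table back, assuming that is wanted; the name `filled` is hypothetical and used only for illustration.

```r
library(data.table)
library(dplyr)

# Rebuild the sample table from the question
steps.dt <- data.table(
  steps    = rep(0:2, each = 3),
  date     = as.factor(rep(c("10/2/2012", "10/3/2012", "10/4/2012"), each = 3)),
  interval = as.factor(rep(c(0, 5, 10), each = 3))
)
steps.dt[c(2, 5, 8), steps := NA]

# `filled` is a hypothetical name for the result
filled <- steps.dt %>%
  group_by(interval) %>%
  mutate(steps = ifelse(is.na(steps), mean(steps, na.rm = TRUE), steps)) %>%
  ungroup() %>%        # drop the grouping so later operations are not per-group
  as.data.table()      # convert the tibble back to a data.table
```

Writing the result to a new object leaves the original steps.dt, including its NAs, untouched, which can be handy for comparing before and after.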


 