Replace value in a data.frame within a function (for use with apply)

I have a data.frame that looks like this:

GROUP  |  YEAR  | VAL
A      |  2007  | 10
A      |  2007  | 11
A      |  2007  | NA
A      |  2008  | 13
B      |  2006  | NA
B      |  2006  | 5
B      |  2006  | 6
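
To make the example reproducible, the table above could be rebuilt roughly as follows. This is only a sketch: the name df1 is borrowed from the answers further down, and the column types (character GROUP, numeric YEAR and VAL) are an assumption, since the question does not state them. The aggregate() call at the end just shows the per-group, per-year means that the NAs are supposed to receive.

```r
# Rebuild the example table (column types are assumed, not given in the question)
df1 <- data.frame(
  GROUP = c("A", "A", "A", "A", "B", "B", "B"),
  YEAR  = c(2007, 2007, 2007, 2008, 2006, 2006, 2006),
  VAL   = c(10, 11, NA, 13, NA, 5, 6),
  stringsAsFactors = FALSE
)

# Mean of VAL for every GROUP/YEAR combination.
# The formula interface drops rows with NA in VAL before averaging.
group_means <- aggregate(VAL ~ GROUP + YEAR, data = df1, FUN = mean)
print(group_means)
```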

Each group may have different years. I want to replace each NA with the mean of its group in its year. For example, the NA in row 3 should be replaced by the mean of group A in year 2007.

I can do this with a for loop, but my professor strongly dislikes for loops, so I'm trying to find another way. I tried writing a function, imputeMean(group, year), that takes a group and a year, calculates the mean, and then modifies the data.frame. I then apply this function over a data.frame of the group/year pairs that need replacing.

Unfortunately, R does not pass arguments by reference, which means I can't modify the original data.frame directly inside imputeMean(). Is there any way to filter a data.frame, calculate the group mean with respect to year, and replace the NA values with this mean, without using a loop?
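
On the pass-by-reference point: a function cannot change the caller's data.frame, but it can return a modified copy that the caller assigns back. A minimal base R sketch of that pattern, assuming a hypothetical helper named impute_group_mean (not from the question) and the example data under the name df1, might look like this. It uses ave(), which applies a function within each GROUP/YEAR combination and returns a vector aligned with the original rows, so no explicit loop is needed.

```r
# Example data (same values as the table in the question; types are assumed)
df1 <- data.frame(
  GROUP = c("A", "A", "A", "A", "B", "B", "B"),
  YEAR  = c(2007, 2007, 2007, 2008, 2006, 2006, 2006),
  VAL   = c(10, 11, NA, 13, NA, 5, 6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns a copy of df with NAs in VAL replaced by the
# mean of the non-missing values in the same GROUP and YEAR.
impute_group_mean <- function(df) {
  df$VAL <- ave(df$VAL, df$GROUP, df$YEAR, FUN = function(x) {
    # If every value in a group is NA, the mean is NaN and the NAs become NaN
    x[is.na(x)] <- mean(x, na.rm = TRUE)
    x
  })
  df  # the caller's data.frame is untouched; return the modified copy
}

# Assign the result back to replace the original
df1 <- impute_group_mean(df1)
print(df1)
```

The only real design choice here is that the whole data.frame goes in and comes out, rather than calling a function once per group/year pair and trying to mutate shared state.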

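We can use na.aggregate from the zoo package after grouping by GROUP and YEAR. By default it replaces each NA with the mean of the non-missing values, so inside a grouped mutate it fills in the mean for that group and year: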

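library(dplyr)
library(zoo)   # provides na.aggregate()

# Assign the result (df1 <- df1 %>% ...) to keep the imputed values
df1 %>%
    group_by(GROUP, YEAR) %>%
    mutate(VAL = na.aggregate(VAL)) %>%   # fill NAs with the group/year mean
    ungroup()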

Another dplyr solution:

library(dplyr)

df1 %>%  
  group_by(GROUP, YEAR) %>%
  # mutate_at() is superseded in current dplyr; a plain mutate() does the same here
  mutate(VAL = ifelse(is.na(VAL), mean(VAL, na.rm = TRUE), VAL))

#   GROUP  YEAR   VAL
# 1 A      2007  10  
# 2 A      2007  11  
# 3 A      2007  10.5
# 4 A      2008  13  
# 5 B      2006   5.5
# 6 B      2006   5  
# 7 B      2006   6  
