I'm trying to create an R function to impute mean values to specific columns in a data frame.
impute_means <- function(df, group_by, column){
vals_to_impute <- df %>%
group_by_at(group_by) %>%
summarise(x = mean(get(column), na.rm = TRUE))
df %>%
filter(is.na(get(column))) %>%
select(group_by, column) %>%
left_join(vals_to_impute, by=group_by)
}
impute_means(df = weather_data, group_by = c("year","month","code","type"), column = "temperature")
The function currently returns this:
However, now I want to check for NA values in the "temperature" column and replace them with values from the x column.
I tried to do that by adding a mutate statement at the end, but it doesn't seem to work:
impute_means <- function(df, group_by, column){
vals_to_impute <- df %>%
group_by_at(group_by) %>%
summarise(x = mean(get(column), na.rm = TRUE))
df %>%
filter(is.na(get(column))) %>%
select(group_by, column) %>%
left_join(vals_to_impute, by=group_by) %>%
mutate(column = case_when(is.na(get(column))~x,
TRUE~get(column)))
}
Minimal data to reproduce:
weather_data <-
structure(list(year = structure(c(8L, 8L, 1L, 1L, 2L, 2L, 3L,
3L, 5L, 6L), .Label = c("2000", "2001", "2002", "2003", "2004",
"2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012",
"2013", "2014", "2015", "2016", "2017", "2018", "2019"), class = "factor"),
month = structure(c(12L, 12L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("1", "2", "3", "4", "5", "6", "7", "8", "9",
"10", "11", "12"), class = "factor"), code = structure(c(1L,
2L, 6L, 1L, 6L, 2L, 2L, 2L, 6L, 2L), .Label = c("1", "2",
"3", "4", "5", "6"), class = "factor"), type = structure(c(2L,
2L, 6L, 2L, 6L, 2L, 2L, 3L, 6L, 3L), .Label = c("1", "2",
"3", "4", "5", "6"), class = "factor"), temperature = c(NA,
NA, 20.8, 19.5, 1.4, 3.1, 27.3, 25.4, 20.2, 26.6)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
You can do this:
library(dplyr)
impute_means <- function(df, group_by, column){
df %>%
mutate(val = .data[[column]]) %>%
group_by(across(all_of(group_by))) %>%
mutate(!!column := mean(.data[[column]], na.rm = TRUE)) %>%
filter(is.na(val)) %>%
select(-val) %>%
ungroup()
}
impute_means(df = weather_data,
group_by = c("year","month","code","type"),
column = "temperature")
Instead of summarising the data and performing a join, I use mutate to maintain the number of rows in the data.
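To make the row-count point concrete, here is a small sketch on a made-up data frame (`toy`, with columns `g` and `v`, is hypothetical and not the question's data). A grouped `summarise()` returns one row per group, while a grouped `mutate()` returns one row per input row:

```r
library(dplyr)

# Hypothetical toy data: two groups, one missing value.
toy <- tibble(g = c("a", "a", "b"), v = c(1, NA, 3))

# summarise() collapses each group to a single row.
collapsed <- toy %>%
  group_by(g) %>%
  summarise(x = mean(v, na.rm = TRUE))

# mutate() keeps every original row and repeats the group mean on each.
kept <- toy %>%
  group_by(g) %>%
  mutate(x = mean(v, na.rm = TRUE)) %>%
  ungroup()

nrow(collapsed)  # 2, one row per group
nrow(kept)       # 3, same as nrow(toy)
```

That is why the mutate-based version needs no join afterwards: the group mean is already sitting next to every row.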
You can replace .data[[column]] with get(column) if you find that easier to understand. Both of them should work the same way.
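As a quick check of that equivalence, the sketch below computes a group mean both ways on a made-up data frame (`toy`, `g` and `v` are hypothetical names). One caveat: `get(column)` resolves the name through ordinary R scoping, so if the column is missing it can silently pick up a same-named object from outside the data frame, whereas `.data[[column]]` only looks inside the data and raises an error otherwise.

```r
library(dplyr)

# Hypothetical toy data and a column name held in a string.
toy <- tibble(g = c("a", "a", "b"), v = c(1, NA, 3))
column <- "v"

# Group mean via the .data pronoun.
with_pronoun <- toy %>%
  group_by(g) %>%
  mutate(m = mean(.data[[column]], na.rm = TRUE)) %>%
  ungroup()

# Same group mean via get().
with_get <- toy %>%
  group_by(g) %>%
  mutate(m = mean(get(column), na.rm = TRUE)) %>%
  ungroup()

identical(with_pronoun$m, with_get$m)  # TRUE
```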
You could try this function:
impute_means <- function(df, group_by, column){
df %>%
group_by_at(group_by) %>%
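# na.rm = TRUE is needed, otherwise any group containing an NA gets an NA mean
mutate(across(all_of(column), ~ mean(.x, na.rm = TRUE)))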
}
or if you need a new column:
impute_means <- function(df, group_by, column){
df %>%
group_by_at(group_by) %>%
mutate(x = mean(.data[[column]], na.rm = TRUE))
}
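If the goal is to keep every row and fill only the missing entries (which is how I read the question), a minimal sketch along the same lines could look like this. `impute_means_inplace` is a hypothetical name and the toy data is made up. `coalesce()` keeps the original value where one exists and falls back to the group mean otherwise. A group with no observed values has no mean to fall back on, so its entries stay missing.

```r
library(dplyr)

# Hypothetical helper: fill NAs in `column` with the mean of their group,
# leaving observed values untouched.
impute_means_inplace <- function(df, group_by, column) {
  df %>%
    group_by(across(all_of(group_by))) %>%
    mutate(across(all_of(column), ~ coalesce(.x, mean(.x, na.rm = TRUE)))) %>%
    ungroup()
}

# Made-up data: group "a" has one NA, group "b" has none.
toy <- tibble(g = c("a", "a", "b", "b"), v = c(1, NA, 3, 5))
filled <- impute_means_inplace(toy, group_by = "g", column = "v")
filled$v  # 1 1 3 5
```

With the question's data the call would presumably be impute_means_inplace(weather_data, c("year","month","code","type"), "temperature").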