Using column name as function argument in R

I'm trying to create an R function to impute mean values to specific columns in a data frame.

impute_means <- function(df, group_by, column){
  
  vals_to_impute <- df %>%
    group_by_at(group_by) %>%
    summarise(x = mean(get(column), na.rm = TRUE))
  
  df %>%
    filter(is.na(get(column))) %>%
    select(group_by, column) %>%
    left_join(vals_to_impute, by=group_by)
}

impute_means(df = weather_data, group_by = c("year","month","code","type"), column = "temperature")

The function currently returns this: [image: enter image description here]

However, now I want to check for NA values in the "temperature" column and replace them with values from the x column.

I tried to do that by adding a mutate statement at the end, but it doesn't seem to work:

impute_means <- function(df, group_by, column){
  
  vals_to_impute <- df %>%
    group_by_at(group_by) %>%
    summarise(x = mean(get(column), na.rm = TRUE))
  
  df %>%
    filter(is.na(get(column))) %>%
    select(group_by, column) %>%
    left_join(vals_to_impute, by=group_by) %>%
    mutate(column = case_when(is.na(get(column))~x,
                                   TRUE~get(column)))
}

Minimal data to reproduce:

weather_data <- structure(list(year = structure(c(8L, 8L, 1L, 1L, 2L, 2L, 3L, 
3L, 5L, 6L), .Label = c("2000", "2001", "2002", "2003", "2004", 
"2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", 
"2013", "2014", "2015", "2016", "2017", "2018", "2019"), class = "factor"), 
    month = structure(c(12L, 12L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L), .Label = c("1", "2", "3", "4", "5", "6", "7", "8", "9", 
    "10", "11", "12"), class = "factor"), code = structure(c(1L, 
    2L, 6L, 1L, 6L, 2L, 2L, 2L, 6L, 2L), .Label = c("1", "2", 
    "3", "4", "5", "6"), class = "factor"), type = structure(c(2L, 
    2L, 6L, 2L, 6L, 2L, 2L, 3L, 6L, 3L), .Label = c("1", "2", 
    "3", "4", "5", "6"), class = "factor"), temperature = c(NA, 
    NA, 20.8, 19.5, 1.4, 3.1, 27.3, 25.4, 20.2, 26.6)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

You can do this:

library(dplyr)

impute_means <- function(df, group_by, column){
  
  df %>%
    mutate(val = .data[[column]]) %>%
    group_by(across(all_of(group_by))) %>%
    mutate(!!column := mean(.data[[column]], na.rm = TRUE)) %>%
    filter(is.na(val)) %>%
    select(-val) %>% 
    ungroup
}

impute_means(df = weather_data, 
             group_by = c("year","month","code","type"), 
             column = "temperature")

Instead of summarising the data and performing a join, I use mutate, which keeps the number of rows in the data unchanged.
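If the goal is to keep every row and fill in only the missing entries, as the question describes, the same grouped mutate idea can be sketched as follows. The function name impute_means_in_place, the group_cols argument name, and the toy data are hypothetical and chosen only for illustration; this is one possible approach, not the answer's own code.

```r
library(dplyr)

# Hypothetical helper: fill NA entries of `column` with the mean of their group,
# leaving non-missing values and the number of rows untouched.
impute_means_in_place <- function(df, group_cols, column) {
  df %>%
    group_by(across(all_of(group_cols))) %>%
    mutate(!!column := coalesce(.data[[column]],
                                mean(.data[[column]], na.rm = TRUE))) %>%
    ungroup()
}

# Toy data, made up for illustration
toy <- data.frame(
  station = c("a", "a", "a", "b", "b"),
  temperature = c(1, NA, 3, NA, 10)
)

filled <- impute_means_in_place(toy, group_cols = "station", column = "temperature")
filled$temperature
# expected: 1 2 3 10 10
```

Note that a group with no observed values has no mean to borrow, so its entries stay missing. That is the case for the two NA rows in the sample weather_data, each of which is alone in its group.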

You can replace .data[[column]] with get(column) if you find that easier to understand. For a column that exists in the data, both give the same result.
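The two forms agree when the column exists, but they can differ when it does not. The following is a small sketch of that difference; the data frame df_small and the stray variable temperature are hypothetical and exist only to show the behaviour.

```r
library(dplyr)

# Toy data, made up for illustration
df_small <- data.frame(a = c(1, 2, NA))

# When the column exists, both forms give the same mean
column <- "a"
via_data <- df_small %>% summarise(m = mean(.data[[column]], na.rm = TRUE))
via_get  <- df_small %>% summarise(m = mean(get(column), na.rm = TRUE))

# A variable outside the data frame that shares its name with a missing column
temperature <- 99
column <- "temperature"

# get() falls back to the surrounding environment and silently uses 99
masked <- df_small %>% summarise(m = mean(get(column)))

# .data[[column]] only looks inside the data frame, so it raises an error
strict <- tryCatch(
  df_small %>% summarise(m = mean(.data[[column]])),
  error = function(e) "column not found"
)
```

For that reason .data[[column]] may be the safer choice inside a function, since a misspelled column name fails loudly instead of picking up an unrelated variable.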

You could try this function:

impute_means <- function(df, group_by, column){
  
  df %>% 
    group_by_at(group_by) %>% 
    mutate(across(all_of(column), ~ mean(.x, na.rm = TRUE)))  # na.rm so NAs do not make the group mean NA
}

or if you need a new column:

impute_means <- function(df, group_by, column){
  
  df %>% 
    group_by_at(group_by) %>% 
    mutate(across(all_of(column), ~ mean(.x, na.rm = TRUE), .names = "x"))  # store the group mean in a new column x
}
  


 