Paste element of a vector into dplyr function

I have the following dataset:

df_x <- data.frame(year = c(2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002),
             a = c(7, 3, 5),
             b = c(5, 8, 1),
             c = c(8, 4, 3))

and this vector:

v <- c("a", "b", "c")

Now I want to create a new dataset and summarise a, b, and c by creating new variables ( y_a , y_b , and y_c ) that calculate the mean of each variable grouped by year.

The code for doing this is the following:

y <- df_x %>% group_by(year) %>%  dplyr::summarise(y_a = mean(a, na.rm = TRUE),
                y_b = mean(b, na.rm = TRUE),
                y_c = mean(c, na.rm = TRUE))

However, I want to read each variable name from the vector v and paste it into the summarise call:

y <- df_x %>% group_by(year) %>%  dplyr::summarise(as.name(paste0("y_", v[1])) = mean(as.name(v[1]), na.rm = TRUE),
                                                   as.name(paste0("y_", v[2])) = mean(as.name(v[2]), na.rm = TRUE),
                                                   as.name(paste0("y_", v[3])) = mean(as.name(v[3]), na.rm = TRUE))

Doing so, I receive the following error message:

Error: unexpected '=' in "y <- df_x %>% group_by(year) %>%  dplyr::summarise(as.name(paste0("y_", v[1])) ="

How can I paste the value of a vector in this summarise function so that it works?

To define a new variable whose name is computed, use := instead of = on the left-hand side. Because the name is built with paste0, you also need !! to inject the result, so that it is evaluated before being used as the column name. To access an existing column through a string stored in a variable, the .data pronoun (as in .data[[v[1]]]) is the easiest way.

library(dplyr)

df_x <- data.frame(year = c(2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002),
                   a = c(7, 3, 5),
                   b = c(5, 8, 1),
                   c = c(8, 4, 3))

v <- c("a", "b", "c")

df_x %>% group_by(year) %>% 
  # each output column reads its own source column: v[1], v[2], v[3]
  dplyr::summarise(!!paste0("y_", v[1]) := mean(.data[[v[1]]], na.rm = TRUE),
                   !!paste0("y_", v[2]) := mean(.data[[v[2]]], na.rm = TRUE),
                   !!paste0("y_", v[3]) := mean(.data[[v[3]]], na.rm = TRUE))
#> # A tibble: 3 × 4
#>    year   y_a   y_b   y_c
#>   <dbl> <dbl> <dbl> <dbl>
#> 1  2000     5  4.67     5
#> 2  2001     5  4.67     5
#> 3  2002     5  4.67     5

Created on 2022-12-21 by the reprex package (v1.0.0)
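
If v grows, writing one summarise argument per element gets repetitive. As a rough sketch (not part of the original answer), the same idea can be applied to the whole vector by building a named list of unevaluated calls and splicing it into summarise with !!!. The name summary_exprs is made up for illustration, and the sketch assumes the rlang package is available, which it is whenever dplyr is installed.

```r
library(dplyr)

# Same data and vector as in the question
df_x <- data.frame(year = c(2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002),
                   a = c(7, 3, 5),
                   b = c(5, 8, 1),
                   c = c(8, 4, 3))
v <- c("a", "b", "c")

# Build one unevaluated call per column name,
# e.g. mean(.data[["a"]], na.rm = TRUE); !!nm injects the string into the call
summary_exprs <- lapply(v, function(nm) rlang::expr(mean(.data[[!!nm]], na.rm = TRUE)))

# The list names become the names of the new columns
names(summary_exprs) <- paste0("y_", v)

# !!! splices the named list in as separate summarise arguments
y <- df_x %>%
  group_by(year) %>%
  summarise(!!!summary_exprs)

y
```

This should give the same y_a, y_b and y_c columns as the hand-written version, while the number of summaries follows the length of v.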

Here is a one-liner in base R:

aggregate(. ~ year, cbind.data.frame(year = df_x$year, df_x[v]), FUN = \(i)mean(i, na.rm = TRUE))

  year a        b c
1 2000 5 4.666667 5
2 2001 5 4.666667 5
3 2002 5 4.666667 5
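
Note that the aggregate result keeps the original column names a, b and c rather than the y_ prefix the question asks for. A minimal sketch of one way to add the prefix, assuming the data.frame method of aggregate and a plain rename afterwards (the name res is only illustrative):

```r
# Same data and vector as in the question
df_x <- data.frame(year = c(2000, 2000, 2000, 2001, 2001, 2001, 2002, 2002, 2002),
                   a = c(7, 3, 5),
                   b = c(5, 8, 1),
                   c = c(8, 4, 3))
v <- c("a", "b", "c")

# Aggregate only the columns named in v, grouped by year;
# na.rm = TRUE is passed through to mean()
res <- aggregate(df_x[v], by = list(year = df_x$year), FUN = mean, na.rm = TRUE)

# The first column is the grouping variable; prefix the remaining ones
names(res)[-1] <- paste0("y_", v)

res
```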

It would be easier with across(), setting the new column names through its .names argument:

library(dplyr)
df_x %>% 
 group_by(year) %>% 
 summarise(across(all_of(v), ~ mean(.x, na.rm = TRUE), .names = "y_{.col}"))

Output:

# A tibble: 3 × 4
   year   y_a   y_b   y_c
  <dbl> <dbl> <dbl> <dbl>
1  2000     5  4.67     5
2  2001     5  4.67     5
3  2002     5  4.67     5
