
Pass column names into a function using apply or map

I want to apply multiple functions to the same dataframe. However, I have not been able to successfully pass column names as a parameter in purrr::imap. I keep getting the following error:

Error in UseMethod("select"): no applicable method for 'select' applied to an object of class "character"

I have tried many combinations for evaluation (e.g., using !!!, [[, enquo, sys.lang, and so on). When I apply a function (e.g., check_1) directly to a dataframe, select works fine. However, it does not work when I try to pass column names as a parameter using imap and exec. The format of the column names (e.g., 1.1.) is part of the issue, but I have tried double quotes, single quotes, etc.

This is a follow-up to a previous post, but that post and its solution focused on applying multiple functions to individual columns. Now I need to apply multiple functions that each use more than one column in the dataframe; hence the need to specify column names in a function.

Minimal Example

Data

df <- structure(
  list(
    `1.1.` = c("Andrew", "Max", "Sylvia", NA, "1",
               NA, NA, "Jason"),
    `1.2.` = c(1, 2, 2, NA, 4, 5, 3, NA),
    `1.2.1.` = c(
      "cool", "amazing", "wonderful", "okay",
      NA, NA, "chocolate", "fine"
    )
  ),
  class = "data.frame",
  row.names = c(NA, -8L)
)

What I Have Tried

library(purrr)
library(dplyr)

check_1 <- function(x, col1, col2) {
  x %>%
    dplyr::select(col1, col2) %>%
    dplyr::mutate(row.index = row_number()) %>%
    dplyr::filter(col1 == "Jason" & is.na(col2) == TRUE) %>%
    dplyr::select(row.index) %>%
    unlist() %>%
    as.vector()
}

check_2 <- function(x, col1, col2) {
  index <- x %>%
    dplyr::select(col1, col2) %>%
    dplyr::mutate(row.index = row_number()) %>%
    dplyr::filter(col1 >= 3 & col1 <= 5 & is.na(col2) == TRUE) %>%
    dplyr::select(row.index) %>%
    unlist() %>%
    as.vector()
  return(index)
}

checks <-
  list("df" = list(fn = check_1, pars = list(col1 = "1.1.", col2 = "1.2.")),
       "df" = list(fn = check_2, pars = list(col1 = "1.2.", col2 = "1.2.1.")))

results <-
  purrr::imap(checks, ~ exec(.x$fn, x = .y,!!!.x$pars))

Expected Output

> results
$df
[1] 8

$df
[1] 5 6

Besides the "class character" error, I run into a second problem when I test the check_2 function on its own: it returns none of the expected values.

[1] 1.2.      1.2.1.    row.index
<0 rows> (or 0-length row.names)

I have looked at many other similar SO posts (e.g., this one), but none have solved this issue for me.

The first issue is that you pass the name of the dataframe but not the dataframe itself. That's why you get the first error: you are trying to select from a character string. To solve this issue, add the dataframe to the list you are looping over.
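
To make the first issue concrete, here is a minimal sketch. The toy data frame, the count_na helper, and the datasets lookup list are hypothetical names made up for illustration; they are not part of the original question. It shows that .y inside imap is the element name (a character string), and one possible way to keep using names by looking the data frame up in a named list.

```r
library(purrr)

# Hypothetical toy data and check function, for illustration only
toy <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
count_na <- function(x, col) sum(is.na(x[[col]]))

checks <- list(
  toy = list(fn = count_na, pars = list(col = "a")),
  toy = list(fn = count_na, pars = list(col = "b"))
)

# In imap(), .y is the element name: a character string, not the data frame
name_classes <- imap(checks, ~ class(.y))

# One option: look the data frame up by name in a named list before calling fn
datasets <- list(toy = toy)
results <- imap(checks, ~ exec(.x$fn, x = datasets[[.y]], !!!.x$pars))
```

Storing the dataframe inside each list element, as suggested above, avoids the lookup entirely; the named-list variant is only worth considering when the same large dataframe would otherwise be repeated in many entries.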

The second issue is that when you pass the column names as character strings, you have to tell dplyr that these strings refer to columns in your data. This can be achieved by, e.g., making use of the .data pronoun.
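
A small sketch of the difference, using a hypothetical toy data frame (not the data from the question): without .data, filter compares the string held in col itself, which is likely also why check_2 returned zero rows when tested on its own.

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- data.frame(name = c("Jason", "Max", "Jason"), score = c(NA, 2, NA))
col <- "name"

# Compares the string "name" with "Jason": always FALSE, so no rows survive
n_wrong <- toy %>% filter(col == "Jason") %>% nrow()

# .data[[col]] looks up the column whose name is stored in col
n_right <- toy %>% filter(.data[[col]] == "Jason") %>% nrow()
```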

Finally, instead of select + unlist + as.vector, you could simply use dplyr::pull:

library(purrr)
library(dplyr)

check_1 <- function(x, col1, col2) {
  x %>%
    dplyr::select(all_of(c(col1, col2))) %>%
    dplyr::mutate(row.index = row_number()) %>%
    dplyr::filter(.data[[col1]] == "Jason" & is.na(.data[[col2]]) == TRUE) %>%
    dplyr::pull(row.index)
}

check_2 <- function(x, col1, col2) {
  x %>%
    dplyr::select(all_of(c(col1, col2))) %>% 
    dplyr::mutate(row.index = row_number()) %>%
    dplyr::filter(.data[[col1]] >= 3 & .data[[col1]] <= 5 & is.na(.data[[col2]]) == TRUE) %>%
    dplyr::pull(row.index)
}

checks <-
  list(df = list(df = df, fn = check_1, pars = list(col1 = "1.1.", col2 = "1.2.")),
       df = list(df = df, fn = check_2, pars = list(col1 = "1.2.", col2 = "1.2.1.")))

purrr::map(checks, ~ exec(.x$fn, x = .x$df, !!!.x$pars))
#> $df
#> [1] 8
#> 
#> $df
#> [1] 5 6
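
As a quick check of the pull simplification, the sketch below compares both approaches on a hypothetical one-column data frame; it assumes nothing beyond dplyr itself.

```r
library(dplyr)

# Hypothetical toy data, for illustration only
toy <- data.frame(row.index = c(5L, 6L))

# Original approach: select the column, flatten it, drop the names
via_select <- toy %>% select(row.index) %>% unlist() %>% as.vector()

# Shorter approach: pull the column out as a plain vector
via_pull <- toy %>% pull(row.index)

same <- identical(via_select, via_pull)
```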

Use select({{col1}}, {{col2}}); this will most probably help you.
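
A minimal sketch of the embrace approach, assuming the caller passes bare (unquoted, backtick-wrapped) column names rather than strings. The function check_embrace and the toy data are hypothetical. Note that when the names arrive as character strings, as in the pars lists above, an embraced argument inside filter would be compared as a string, so the .data pronoun is the safer fit there.

```r
library(dplyr)

# Hypothetical variant of check_1 that takes bare column names
check_embrace <- function(x, col1, col2) {
  x %>%
    mutate(row.index = row_number()) %>%
    filter({{ col1 }} == "Jason" & is.na({{ col2 }})) %>%
    pull(row.index)
}

# Hypothetical toy data; check.names = FALSE keeps the non-syntactic names
toy <- data.frame(`1.1.` = c("Max", "Jason"), `1.2.` = c(1, NA),
                  check.names = FALSE)

# Non-syntactic names need backticks when passed bare
hit <- check_embrace(toy, `1.1.`, `1.2.`)
```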
