
How to use purrr modify_if with several functions with different arguments?


Problem solved!

Question:

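In R, I have been trying to find an elegant way to apply several functions, each with different arguments, to a list containing many tibbles/data.frames, but I am struggling to pass the arguments correctly. I am trying to clean and preprocess text data about medicines, and I have been experimenting with modify_if, invoke, map, etc. Any help is greatly appreciated.

Note: I have only just started learning to program, so please excuse any naivety :)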

library(purrr)
library(stringr)
library(tibble)

# Set up example data
Test_DataFrame <- tibble("Integer_Variable" = 1:4
             ,"Character_Variable" = c("tester to upper"
                          ,"test   squishing"
                          ,"canitcomprehend?.,-0(`kljndsfiuhaweraeriou140987645=Error?"
                          ,"         test white space triming      " ))

# modify_if works with a single function plus its arguments:
# modify character columns by trimming whitespace from the left side of each string
modify_if(.x = Test_DataFrame
      ,.p = is.character
      ,.f = str_trim
      , side = "left") # Works well
# Result:
# A tibble: 4 x 2
#   Integer_Variable Character_Variable                                          
#              <int> <chr>                                                       
# 1                1 "tester to upper"                                           
# 2                2 "test   squishing"                                          
# 3                3 "canitcomprehend?.,-0(`kljndsfiuhaweraeriou140987645=Error?"
# 4                4 "test white space triming      "   
# Note the trailing whitespace on the right: this proves the side = "left" argument is being applied!
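The single-function call works because modify_if() forwards any extra arguments in its ... to the one function given as .f. The sketch below is a simplified, hypothetical re-implementation (modify_if_sketch() and the small df are made up for illustration; this is not purrr's source) that loops over the columns and calls .f with those extra arguments wherever the predicate is TRUE:

```r
library(purrr)
library(stringr)
library(tibble)

# Hypothetical, simplified stand-in for modify_if() on a data frame.
# Only accepts a real function as .p (no formulas), unlike purrr.
modify_if_sketch <- function(.x, .p, .f, ...) {
  for (i in seq_along(.x)) {
    if (.p(.x[[i]])) {
      # Every extra argument in ... goes to the single function .f
      .x[[i]] <- .f(.x[[i]], ...)
    }
  }
  .x
}

df <- tibble(id = 1:2, txt = c("  a b  ", " c "))

sketch_result <- modify_if_sketch(df, is.character, str_trim, side = "left")
purrr_result  <- modify_if(df, is.character, str_trim, side = "left")
```

Because only one function is called per column, there is no slot for a second function's own arguments, which is why the answers below wrap all the steps in one function.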

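However, when I try to do this with several functions that take arguments, I hit a wall (the function arguments are ignored). I have tried many combinations of modify_if (some are below) and other functions, such as invoke (but it is retired) and exec with map (which did not make sense to me). No success so far. Any help is appreciated.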

# Does not work
modify_if(.x = Test_DataFrame
      ,.p = is.character                # the condition that selects which columns the functions are applied to
      ,.f = c(                      # a named list ("name" = function) to apply to each column where the condition is TRUE
        UpperCase = str_to_upper        # convert strings to upper case
        ,TrimLeadTailWhiteSpace = str_trim  # trim leading and trailing whitespace
        ,ExcessWhiteSpaceRemover = str_squish)  # collapse any run of two or more spaces (e.g. "  " or "   ") into a single " "
      , side = "left"              # these arguments are ignored
    )

# Does not work
modify_if(.x = Test_DataFrame
      ,.p = is.character
      ,.f = c(UpperCase = list(str_to_upper)    # the variant wrapped in list() does not work either
        ,TrimLeadTailWhiteSpace = list(str_trim, side = "left")
        ,ExcessWhiteSpaceRemover = list(str_squish))
    ) # returns the integer variable instead of the character variable, so this is badly wrong

# Set up a function/argument table
Function_ArgumentList <- tibble("upper" = list(str_to_upper)
                   ,"trim" = list(str_trim, side = "left")
                   ,"squish" = list(str_squish))

# Does not work
modify_if(.x = Test_DataFrame
      ,.p = is.character
      ,.f = Function_ArgumentList)
# Error: Can't convert a `tbl_df/tbl/data.frame` object to function
# Run `rlang::last_error()` to see where the error occurred.

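I realize the functions used in the examples above could be called without arguments, but this is a simplified example of the problem I am running into.

Solution:

Thanks to @stefan and @BenNorris for their help below! To make @stefan's solution clearer, I modified the answer slightly: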

library(dplyr)
library(purrr)
library(stringr)
Test_DataFrame <- tibble("Integer_Variable" = 1:4
                        ,"Character_Variable" = c("tester to upper"
                                                ,"test   squishing"
                                                ,"canitcomprehend?.,-0(`kljndsfiuhaweraeriou140987645=Error?"
                                                ,"         test white space triming      " )
                        )
f_help <- function(x, side = "left") {
  str_to_upper(x) %>%
    str_trim(side = side) # %>%
    # str_squish()        # commented out on purpose: squishing would also strip the right-hand whitespace
}

modify_if(.x = Test_DataFrame
        ,.p = is.character
        ,.f = f_help
        ,side = "left") 
# A tibble: 4 x 2
#   Integer_Variable Character_Variable
#              <int> <chr>
# 1                1 "TESTER TO UPPER"
# 2                2 "TEST   SQUISHING"
# 3                3 "CANITCOMPREHEND?.,-0(`KLJNDSFIUHAWERAERIOU140987645=ERROR?"
# 4                4 "TEST WHITE SPACE TRIMING      "
# Note that the whitespace on the right is still present, so side = "left" was applied. It worked!
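The question mentions a list of many tibbles rather than a single one. Assuming such a list (df_list below is hypothetical data), the same modify_if() call can be mapped over every tibble. An anonymous function is used because map() has its own .f argument, so a named .f = f_help passed straight to map() would be captured by map() instead of reaching modify_if():

```r
library(purrr)
library(stringr)
library(tibble)

# Same idea as f_help above: upper-case, then trim the left side only
f_help <- function(x, side = "left") {
  str_trim(str_to_upper(x), side = side)
}

# Hypothetical list of tibbles standing in for the real data
df_list <- list(
  a = tibble(id = 1:2, txt = c("  x ", "y")),
  b = tibble(id = 3L, txt = " z")
)

# Each tibble goes through modify_if(), which forwards side to f_help()
cleaned <- map(df_list, function(d) modify_if(d, is.character, f_help, side = "left"))
```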

Answer:

As far as I know, there are two ways to solve this:

  1. Use a helper function
  2. Use purrr::compose
library(dplyr)
library(purrr)
library(stringr)

Test_DataFrame <- tibble("Integer_Variable" = c(rep(x = 1:4))
                         ,"Character_Variable" = c("tester to upper"
                                                   ,"test   squishing"
                                                   ,"canitcomprehend?.,-0(`kljndsfiuhaweraeriou140987645=Error?"
                                                   ,"         test white space triming      " ))

f_help <- function(x, side = "left") {
  str_to_upper(x) %>% 
    str_trim(side = side) %>% 
    str_squish()
}

modify_if(.x = Test_DataFrame,
          .p = is.character,
          .f = f_help, side = "left"
)
#> # A tibble: 4 x 2
#>   Integer_Variable Character_Variable                                        
#>              <int> <chr>                                                     
#> 1                1 TESTER TO UPPER                                           
#> 2                2 TEST SQUISHING                                            
#> 3                3 CANITCOMPREHEND?.,-0(`KLJNDSFIUHAWERAERIOU140987645=ERROR?
#> 4                4 TEST WHITE SPACE TRIMING

modify_if(.x = Test_DataFrame,
          .p = is.character,
          # compose() applies its functions right to left by default, so str_squish() runs first here
          .f = purrr::compose(str_to_upper, ~ str_trim(.x, side = "left"), str_squish)
)
#> # A tibble: 4 x 2
#>   Integer_Variable Character_Variable                                        
#>              <int> <chr>                                                     
#> 1                1 TESTER TO UPPER                                           
#> 2                2 TEST SQUISHING                                            
#> 3                3 CANITCOMPREHEND?.,-0(`KLJNDSFIUHAWERAERIOU140987645=ERROR?
#> 4                4 TEST WHITE SPACE TRIMING
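One detail worth knowing about purrr::compose(): by default it applies its functions from right to left, so in the call above str_squish() runs first and str_to_upper() last. Here the order does not change the result, but when it does, the .dir = "forward" argument (available in purrr 0.3.0 and later) applies the functions in the order they are written. A small numeric check with made-up lambdas:

```r
library(purrr)

# Default .dir = "backward": the last function runs first, i.e. (3 * 2) + 1
f_back <- compose(~ .x + 1, ~ .x * 2)
# .dir = "forward": functions run in the listed order, i.e. (3 + 1) * 2
f_fwd <- compose(~ .x + 1, ~ .x * 2, .dir = "forward")

f_back(3)  # 7
f_fwd(3)   # 8
```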

Answer:

The .f argument of modify_if() expects (according to its help file):

A function, formula, or vector (not necessarily atomic).
If a function, it is used as is.

If a formula, e.g. ~ .x + 2, it is converted to a function. 
There are three ways to refer to the arguments:

    For a single argument function, use .
    For a two argument function, use .x and .y
    For more arguments, use ..1, ..2, ..3 etc

This syntax allows you to create very compact anonymous functions.

If character vector, numeric vector, or list, it is converted to an extractor function. 
Character vectors index by name and numeric vectors index by position; use a list to index 
by position and name at different levels. If a component is not present, the value of 
.default will be returned.
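This conversion can be observed directly with purrr's exported as_mapper() helper, which turns a function, formula, or vector into a function in the way the help text describes. A minimal sketch with made-up inputs:

```r
library(purrr)

# A character value becomes an extractor that indexes by name
get_b <- as_mapper("b")
get_b(list(a = 1, b = 2))   # 2

# A numeric value becomes an extractor that indexes by position
second <- as_mapper(2)
second(c(10, 20, 30))       # 20

# A missing component returns the default, which is NULL here
get_b(list(a = 1))          # NULL
```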

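So if you supply a vector or a list, modify_if tries to coerce your values into an index (and fails). You have two options. First, you can write your own custom function that does what you want: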

custom_function <- function(x) {
  str_squish(str_trim(str_to_upper(x), side = "left"))
}
modify_if(.x = Test_DataFrame, 
          .p = is.character,              
          .f = custom_function
          )
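The formula syntax described in the help text offers a more compact way to pass the same function. A minimal sketch on a small made-up tibble (df is hypothetical), where .x stands for each selected column:

```r
library(purrr)
library(stringr)
library(tibble)

df <- tibble(id = 1:2, txt = c("  tester  to upper ", "b"))

# ~ ... is a purrr formula lambda; modify_if() converts it to a function
out <- modify_if(df, is.character,
                 ~ str_squish(str_trim(str_to_upper(.x), side = "left")))
```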

Or you can write the function as an anonymous function:

modify_if(.x = Test_DataFrame, 
          .p = is.character,              
          .f = function(x) {
                           str_squish(str_trim(str_to_upper(x), side = "left"))
                           }
          )
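If you want to keep the functions and their arguments as data, closer to the Function_ArgumentList table attempted in the question, one option is to store each step as a function plus a list of its own arguments, then apply the steps in sequence with purrr::reduce() and base do.call(). This is a sketch under assumptions: steps, apply_steps() and df are hypothetical names, not part of either answer:

```r
library(purrr)
library(stringr)
library(tibble)

# Hypothetical table of steps: each function is paired with its own arguments
steps <- list(
  upper  = list(fun = str_to_upper, args = list()),
  trim   = list(fun = str_trim,     args = list(side = "left")),
  squish = list(fun = str_squish,   args = list())
)

# Apply the steps in order; each step's output is the next step's input
apply_steps <- function(x, steps) {
  reduce(steps,
         function(acc, step) do.call(step$fun, c(list(acc), step$args)),
         .init = x)
}

df <- tibble(id = 1:2, txt = c("  test   squishing ", "tester to upper"))

# modify_if() forwards steps = steps to apply_steps()
out <- modify_if(df, is.character, apply_steps, steps = steps)
```

Adding, removing, or reordering cleaning steps then only means editing the steps list; the modify_if() call stays the same.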


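Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.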

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM