How to use purrr modify_if with several functions with different arguments?

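Solved!

Question:

In R, I have been trying to find an elegant way to apply several functions, each with different arguments, to a list containing many tibbles/data.frames, but I am struggling to pass the arguments through correctly. I am trying to clean and preprocess text data in the pharmaceutical domain, and I have been experimenting with modify_if, invoke, map and more. Any help is greatly appreciated.

Note: I have only just started learning to program, so please forgive any naivety :)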

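# Load the packages used throughout (tibble() is re-exported by dplyr)
library(dplyr)
library(purrr)
library(stringr)

# Set up Example Data
Test_DataFrame <- tibble("Integer_Variable" = c(rep(x = 1:4))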
             ,"Character_Variable" = c("tester to upper"
                          ,"test   squishing"
                          ,"canitcomprehend?.,-0(`kljndsfiuhaweraeriou140987645=Error?"
                          ,"         test white space triming      " ))

# modify_if with a single function and its arguments works:
# Modify character vectors by trimming the left side of the string
modify_if(.x = Test_DataFrame
      ,.p = is.character
      ,.f = str_trim
      , side = "left") # Works well
# Expected results
# A tibble: 4 x 2
#   Integer_Variable Character_Variable                                          
#              <int> <chr>                                                       
# 1                1 "tester to upper"                                           
# 2                2 "test   squishing"                                          
# 3                3 "canitcomprehend?.,-0(`kljndsfiuhaweraeriou140987645=Error?"
# 4                4 "test white space triming      "   
####### Note the trailing whitespace on the right, which shows the argument is being applied!

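However, when I try to do this with several functions, with any arguments, I hit a wall (the function arguments are ignored). I have tried many combinations of modify_if (some are shown below) and other functions, such as invoke (which is retired) and exec with map (which does not make sense to me). No success so far. Any help is appreciated.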

# does not work
modify_if(.x = Test_DataFrame
      ,.p = is.character                # = the condition to specify which column to apply the functions to  
      ,.f = c(                      # a pairwise list of "name" = "function to apply" to apply to each column where the condition = TRUE
        UpperCase = str_to_upper        # Convert strings to upper case
        ,TrimLeadTailWhiteSpace = str_trim  # trim leading and ending whitespace
        ,ExcessWhiteSpaceRemover = str_squish)  # if you find any double or more whitespaces (eg "  " or "   ") then cut it down to " " 
      , side = "left"              # these arguments are being ignored
    )

# Does not work
modify_if(.x = Test_DataFrame
      ,.p = is.character
      ,.f = c(UpperCase = list(str_to_upper)    # wrapping each function in list() does not work either
        ,TrimLeadTailWhiteSpace = list(str_trim, side = "left")
        ,ExcessWhiteSpaceRemover = list(str_squish))
    ) # returns the integer variable instead of the character so drastically wrong

# Set up Function - Argument Table
Function_ArgumentList <- tibble("upper" = list(str_to_upper)
                   ,"trim" = list(str_trim, side = "left")
                   ,"squish" = list(str_squish))

# Does not work
modify_if(.x = Test_DataFrame
      ,.p = is.character
      ,.f = Function_ArgumentList)
# Error: Can't convert a `tbl_df/tbl/data.frame` object to function
# Run `rlang::last_error()` to see where the error occurred.

I realise the functions used in the example above could get by without arguments, but this is a simplified example of the problem I am facing.

Solution:

Thanks to @stefan and @BenNorris for their help below! To make @stefan's solution clearer, I modified the answer slightly:

library(dplyr)
library(purrr)
library(stringr)
Test_DataFrame <- tibble("Integer_Variable" = c(rep(x = 1:4))
                        ,"Character_Variable" = c("tester to upper"
                                                ,"test   squishing"
                                                ,"canitcomprehend?.,-0(`kljndsfiuhaweraeriou140987645=Error?"
                                                ,"         test white space triming      " )
                        )
f_help <- function(x, side = "left") {
                str_to_upper(x) %>% 
                str_trim(side = side) # %>% 
                # str_squish()                # note that this is commented out
                }

modify_if(.x = Test_DataFrame
        ,.p = is.character
        ,.f = f_help
        ,side = "left") 
# A tibble: 4 x 2
#   Integer_Variable Character_Variable                                          
#              <int> <chr>                                                       
# 1                1 "TESTER TO UPPER"                                           
# 2                2 "TEST   SQUISHING"                                          
# 3                3 "CANITCOMPREHEND?.,-0(`KLJNDSFIUHAWERAERIOU140987645=ERROR?"
# 4                4 "TEST WHITE SPACE TRIMING      "
# Note the whitespace on the right is still present, so side = "left" was applied. It worked!

As far as I can tell, there are two ways to solve this:

  1. Use a helper function
  2. Use purrr::compose

library(dplyr)
library(purrr)
library(stringr)

Test_DataFrame <- tibble("Integer_Variable" = c(rep(x = 1:4))
                         ,"Character_Variable" = c("tester to upper"
                                                   ,"test   squishing"
                                                   ,"canitcomprehend?.,-0(`kljndsfiuhaweraeriou140987645=Error?"
                                                   ,"         test white space triming      " ))

f_help <- function(x, side = "left") {
  str_to_upper(x) %>% 
    str_trim(side = side) %>% 
    str_squish()
}

modify_if(.x = Test_DataFrame,
          .p = is.character,
          .f = f_help, side = "left"
)
#> # A tibble: 4 x 2
#>   Integer_Variable Character_Variable                                        
#>              <int> <chr>                                                     
#> 1                1 TESTER TO UPPER                                           
#> 2                2 TEST SQUISHING                                            
#> 3                3 CANITCOMPREHEND?.,-0(`KLJNDSFIUHAWERAERIOU140987645=ERROR?
#> 4                4 TEST WHITE SPACE TRIMING

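modify_if(.x = Test_DataFrame,
          .p = is.character,
          # compose() applies right-to-left by default; .dir = "forward" matches the order used in f_help
          .f = purrr::compose(str_to_upper, ~ str_trim(.x, side = "left"), str_squish, .dir = "forward")
)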
#> # A tibble: 4 x 2
#>   Integer_Variable Character_Variable                                        
#>              <int> <chr>                                                     
#> 1                1 TESTER TO UPPER                                           
#> 2                2 TEST SQUISHING                                            
#> 3                3 CANITCOMPREHEND?.,-0(`KLJNDSFIUHAWERAERIOU140987645=ERROR?
#> 4                4 TEST WHITE SPACE TRIMING
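
One detail worth checking with compose is the order in which the functions run. The sketch below uses two toy functions (add_one and times_two, both made up for illustration) to show the default right-to-left order and the .dir = "forward" alternative. It also shows purrr::partial as another way to pre-fill side = "left". It assumes a purrr version recent enough to have the .dir argument.

```r
library(purrr)
library(stringr)

# Two toy functions (illustrative only) where the order of application matters.
add_one   <- function(x) x + 1
times_two <- function(x) x * 2

# Default direction is "backward": the last function listed runs first.
backward <- compose(add_one, times_two)
backward(3)  # add_one(times_two(3)) = 7

# .dir = "forward" runs the functions in the order they are written.
forward <- compose(add_one, times_two, .dir = "forward")
forward(3)   # times_two(add_one(3)) = 8

# partial() pre-fills an argument, as an alternative to the ~ formula.
trim_left <- partial(str_trim, side = "left")
clean <- compose(str_to_upper, trim_left, str_squish, .dir = "forward")
clean("   test   squishing  ")  # "TEST SQUISHING"
```

For the three string functions in this question the final result happens to be the same in either direction, because str_squish also trims both ends.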

The .f argument of modify_if() expects (according to its help file):

A function, formula, or vector (not necessarily atomic).
If a function, it is used as is.

If a formula, e.g. ~ .x + 2, it is converted to a function. 
There are three ways to refer to the arguments:

    For a single argument function, use .
    For a two argument function, use .x and .y
    For more arguments, use ..1, ..2, ..3 etc

This syntax allows you to create very compact anonymous functions.

If character vector, numeric vector, or list, it is converted to an extractor function. 
Character vectors index by name and numeric vectors index by position; use a list to index 
by position and name at different levels. If a component is not present, the value of 
.default will be returned.
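
To make the last paragraph of the help text concrete, here is a small sketch of the extractor behaviour, using map_dbl and map_chr on a made-up list called records. The point is that a character or numeric .f is treated as something to look up in each element, not as something to call.

```r
library(purrr)

# Made-up data: a list of small records.
records <- list(
  list(name = "a", value = 1),
  list(name = "b", value = 3)
)

# A character .f becomes an extractor: it pulls out the element called "value".
by_name <- map_dbl(records, "value")   # 1 3

# A numeric .f extracts by position instead.
by_position <- map_chr(records, 1)     # "a" "b"
```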

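So if you supply a vector or list, modify_if tries to coerce your values into an index (an extractor function), and fails. You have two options. First, you can create your own custom function that does what you want: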

custom_function <- function(x) {
  str_squish(str_trim(str_to_upper(x), side = "left"))
}
modify_if(.x = Test_DataFrame, 
          .p = is.character,              
          .f = custom_function
          )

Or you can write the function as an anonymous function.

modify_if(.x = Test_DataFrame, 
          .p = is.character,              
          .f = function(x) {
                           str_squish(str_trim(str_to_upper(x), side = "left"))
                           }
          )
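
If the goal is to keep the functions and their arguments in a table-like structure, as the question attempted, one possible sketch is to store each step as a function plus a list of extra arguments and fold over the steps with reduce. The names steps and apply_steps and the small example tibble are hypothetical, introduced only for illustration; this is not taken from either answer.

```r
library(purrr)
library(stringr)
library(tibble)

# Hypothetical structure: each step pairs a function with its extra arguments.
steps <- list(
  list(f = str_to_upper, args = list()),
  list(f = str_trim,     args = list(side = "left")),
  list(f = str_squish,   args = list())
)

# Hypothetical helper: feed x through every step in order.
apply_steps <- function(x, steps) {
  reduce(
    steps,
    function(acc, step) do.call(step$f, c(list(acc), step$args)),
    .init = x
  )
}

# Small made-up tibble to try it on.
example <- tibble(id = 1:2, text = c("  tester to upper", "test   squishing  "))
cleaned <- modify_if(example, is.character, apply_steps, steps = steps)
cleaned$text  # "TESTER TO UPPER" "TEST SQUISHING"
```

Because apply_steps is a single function, modify_if treats it like any other .f, and the steps list travels through the ... argument.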
