tidyr separate_rows with a user-defined function? (r / tidyverse)

Question

separate_rows splits a column's values into multiple rows, repeating the values of the other columns.

> t <- tibble(x = c("a,b", "c,d"), v = c(1,2))
> t %>% separate_rows(x, sep = ",")
# A tibble: 4 × 2
  x         v
  <chr> <dbl>
1 a         1
2 b         1
3 c         2
4 d         2

However, what if I want to apply a function to the result? For example, after separating, change the value of x to TRUE if it is in ("a", "b") and FALSE otherwise.

I understand that all I need to do is follow separate_rows with a mutate. My question is: if there is already a function that splits and processes a comma-delimited value, how do I use it in a similar way to separate_rows? (The reason is that I want to keep complex split logic in a function rather than in mutate.)

For example, the function below implements the logic above and returns a vector of values. Is it possible to use it to perform an operation similar to separate_rows (i.e. split on the column and repeat the other row values)?

proc <- function(text){
  text %>% 
    str_split(pattern = ",") %>%
    unlist() %>%
    sapply(function(x){
            if(x %in% c("a", "b")) 
              return(T) 
            else 
              return(F)
          })
}

Answer 1

Kind of.

If you keep the output of your function (here proc) in list form instead of unlisting it, you can apply that function to x with mutate and then unnest x. Keeping it in list form preserves the information about which element of proc(t$x) corresponds to which row of t; that information is lost when you unlist.
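As a quick illustration of why the list form matters, here is a minimal sketch using the same two input strings as the question. The variable names pieces and flat are only for this illustration.

```r
library(stringr)

x <- c("a,b", "c,d")

# str_split() returns a list with one element per input string,
# so each element still lines up with one row of the tibble
pieces <- str_split(x, pattern = ",")
length(pieces)
#> [1] 2

# unlist() flattens everything into a single character vector;
# nothing records which row each value came from any more
flat <- unlist(pieces)
length(flat)
#> [1] 4
```

Because pieces has the same length as x, it can be stored as a list-column by mutate, which is what lets unnest repeat the other columns correctly.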

library(tidyr)
library(stringr)
library(dplyr, warn.conflicts = FALSE)

proc <- function(text) {
  text %>%
    str_split(pattern = ",") %>%
    lapply(function(x) {
      x %in% c("a", "b")
    })
}

t <- tibble(x = c("a,b", "c,d"), v = c(1,2))

t %>% 
  mutate(x = proc(x)) %>% 
  unnest(x)
#> # A tibble: 4 × 2
#>   x         v
#>   <lgl> <dbl>
#> 1 TRUE      1
#> 2 TRUE      1
#> 3 FALSE     2
#> 4 FALSE     2

Created on 2022-02-20 by the reprex package (v2.0.1)

But if you're going to use two functions anyway (mutate and unnest), you may as well just use separate_rows and then mutate.
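For comparison, a minimal sketch of that simpler route on the same example data. The name res is only for this illustration, and the membership test stands in for whatever per-value logic is needed.

```r
library(tidyr)
library(dplyr, warn.conflicts = FALSE)

t <- tibble(x = c("a,b", "c,d"), v = c(1, 2))

res <- t %>%
  separate_rows(x, sep = ",") %>%   # one row per comma-separated value
  mutate(x = x %in% c("a", "b"))    # then process each value

res$x
#> [1]  TRUE  TRUE FALSE FALSE
res$v
#> [1] 1 1 2 2
```

This only works when the per-value logic can run after the split. If the splitting itself is the complicated part, the list-returning function with unnest is the closer fit.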

Or you could pack everything into the proc function.

library(tidyr)
library(stringr)
library(dplyr, warn.conflicts = FALSE)

proc <- function(df, col) {
  fun <- function(text) {
    text %>%
      str_split(pattern = ",") %>%
      lapply(function(x) {
        x %in% c("a", "b")
      })
  }
  df %>% 
    mutate(across({{ col }}, fun)) %>% 
    unnest({{ col }})
}

t <- tibble(x = c("a,b", "c,d"), v = c(1,2))

t %>% 
  proc(x)
#> # A tibble: 4 × 2
#>   x         v
#>   <lgl> <dbl>
#> 1 TRUE      1
#> 2 TRUE      1
#> 3 FALSE     2
#> 4 FALSE     2

Created on 2022-02-20 by the reprex package (v2.0.1)

Answer 1: accepted, 2022-02-20 06:04:10
