
tidyr separate_rows with a user-defined function? (r / tidyverse)

separate_rows splits the delimited values of a column into multiple rows, repeating the values of the other columns.

> t <- tibble(x = c("a,b", "c,d"), v = c(1,2))
> t %>% separate_rows(x, sep = ",")
# A tibble: 4 × 2
  x         v
  <chr> <dbl>
1 a         1
2 b         1
3 c         2
4 d         2

However, what if I want to apply a function to the separated values? For example, after separating, change the value of x to TRUE if it is in ("a", "b") and FALSE otherwise.

I understand that all I need to do is a mutate following separate_rows. My question is: if I already have a function that both splits and processes a comma-delimited value, how do I use that function in a similar way to separate_rows? (The reason is that I want to put complex split logic into a function rather than into mutate.)

For example, the function below implements the logic above and returns a vector of values. Is it possible to perform an operation similar to separate_rows with it (i.e., split on the column and repeat the other row values)?

proc <- function(text){
  text %>% 
    str_split(pattern = ",") %>%
    unlist() %>%   # flattens the pieces, so the link to the original row is lost
    sapply(function(x){
            if(x %in% c("a", "b")) 
              return(TRUE) 
            else 
              return(FALSE)
          })
}

Kind of.

If you keep the output of your function (here proc) in list form instead of unlisting it, you can apply that function to x with mutate and then unnest x. Keeping it in list form preserves the information about which element of proc(t$x) corresponds to which row of t; that information is lost when you unlist.
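To see why the list form matters, here is a small sketch using only the splitting step from the question. The variable names pieces and flat are introduced here for illustration and are not part of the original code.

```r
library(stringr)

x <- c("a,b", "c,d")

# str_split() returns a list with one element per input string,
# so it has the same length as x and can be stored as a column by mutate().
pieces <- str_split(x, pattern = ",")

# unlist() flattens everything into a single character vector;
# the boundary between the first and second input string is gone.
flat <- unlist(pieces)

length(pieces)   # number of original rows
lengths(pieces)  # number of pieces contributed by each row
length(flat)     # total number of pieces, with no row information
```

Because pieces has one element per row, unnest can later expand each element while repeating the other columns of that row.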

library(tidyr)
library(stringr)
library(dplyr, warn.conflicts = FALSE)

proc <- function(text) {
  text %>%
    str_split(pattern = ",") %>%
    lapply(function(x) {
      x %in% c("a", "b")
    })
}

t <- tibble(x = c("a,b", "c,d"), v = c(1,2))

t %>% 
  mutate(x = proc(x)) %>% 
  unnest(x)
#> # A tibble: 4 × 2
#>   x         v
#>   <lgl> <dbl>
#> 1 TRUE      1
#> 2 TRUE      1
#> 3 FALSE     2
#> 4 FALSE     2

Created on 2022-02-20 by the reprex package (v2.0.1)

But if you're going to use two functions anyway (mutate and unnest), you may as well just use separate_rows and then mutate.
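A minimal sketch of that simpler route, using the same example tibble. The helper name is_ab is hypothetical and only serves to keep the per-value logic in its own function, as the question asks.

```r
library(tidyr)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper: classifies already-separated values.
is_ab <- function(x) x %in% c("a", "b")

t <- tibble(x = c("a,b", "c,d"), v = c(1, 2))

res <- t %>%
  separate_rows(x, sep = ",") %>%  # one row per comma-separated piece
  mutate(x = is_ab(x))             # then apply the logic to each piece

res
```

Here the splitting stays in separate_rows and only the classification lives in the helper, so this fits best when the split itself is simple.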

Or you could pack everything into the proc function.

library(tidyr)
library(stringr)
library(dplyr, warn.conflicts = FALSE)

proc <- function(df, col) {
  fun <- function(text) {
    text %>%
      str_split(pattern = ",") %>%
      lapply(function(x) {
        x %in% c("a", "b")
      })
  }
  df %>% 
    mutate(across({{ col }}, fun)) %>% 
    unnest({{ col }})
}

t <- tibble(x = c("a,b", "c,d"), v = c(1,2))

t %>% 
  proc(x)
#> # A tibble: 4 × 2
#>   x         v
#>   <lgl> <dbl>
#> 1 TRUE      1
#> 2 TRUE      1
#> 3 FALSE     2
#> 4 FALSE     2

Created on 2022-02-20 by the reprex package (v2.0.1)
