tidyr separate_rows with user-defined function? (r / tidyverse)
separate_rows splits a column's delimited values into multiple rows, repeating the values of the other columns.
> t <- tibble(x = c("a,b", "c,d"), v = c(1,2))
> t %>% separate_rows(x, sep = ",")
# A tibble: 4 × 2
  x         v
  <chr> <dbl>
1 a         1
2 b         1
3 c         2
4 d         2
However, what if I want to apply a function to the values after separating? For example, change the value of x to TRUE if it is in ("a", "b") and FALSE otherwise.
I understand that all I need to do is follow separate_rows with a mutate. My question is whether there is already a function that both separates and processes a comma-delimited value. How do I use my own function in a similar way to separate_rows? (The reason is that I want to keep complex split logic in a function rather than in mutate.)
For example, the function below implements the logic above and returns a vector of values. Is it possible to perform a similar operation to separate_rows with it (i.e. split on the column and repeat the other row values)?
proc <- function(text){
  text %>%
    str_split(pattern = ",") %>%
    unlist() %>%
    sapply(function(x){
      if(x %in% c("a", "b"))
        return(TRUE)
      else
        return(FALSE)
    })
}
Kind of.
If you keep the output of your function (here proc) in list form instead of unlisting it, you can apply that function to x with mutate and then unnest x. Keeping it in list form preserves the information about which element of proc(t$x) corresponds to which row of t; that information is lost when you unlist.
library(tidyr)
library(stringr)
library(dplyr, warn.conflicts = FALSE)
proc <- function(text) {
  text %>%
    str_split(pattern = ",") %>%
    lapply(function(x) {
      x %in% c("a", "b")
    })
}
t <- tibble(x = c("a,b", "c,d"), v = c(1,2))
t %>%
  mutate(x = proc(x)) %>%
  unnest(x)
#> # A tibble: 4 × 2
#>   x         v
#>   <lgl> <dbl>
#> 1 TRUE      1
#> 2 TRUE      1
#> 3 FALSE     2
#> 4 FALSE     2
Created on 2022-02-20 by the reprex package (v2.0.1)
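To see why the list form matters, here is a small sketch that compares the output of str_split before and after unlist. The variable names (pieces, flat) are only for illustration and do not appear in the original answer.

```r
library(stringr)

x <- c("a,b", "c,d")

# str_split() returns a list with one element per input string,
# so each piece is still tied to the row it came from.
pieces <- str_split(x, pattern = ",")
length(pieces)    # same length as x, so it can be stored as a list-column
lengths(pieces)   # number of pieces contributed by each row

# unlist() flattens everything into a single vector; the row boundaries are gone.
flat <- unlist(pieces)
length(flat)      # no longer matches the number of rows in x
```

Because pieces has one element per row, mutate can store it as a list-column and unnest can later expand it, repeating the other columns as needed.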
But if you're going to use two functions anyway (mutate and unnest), you may as well just use separate_rows and then mutate.
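For comparison, that separate_rows-then-mutate route could look roughly like the sketch below. The helper name is_ab is hypothetical and stands in for whatever per-value logic you want to keep outside of mutate.

```r
library(tidyr)
library(dplyr, warn.conflicts = FALSE)

t <- tibble(x = c("a,b", "c,d"), v = c(1, 2))

# Hypothetical helper: the per-value logic lives in an ordinary vectorised function.
is_ab <- function(x) x %in% c("a", "b")

result <- t %>%
  separate_rows(x, sep = ",") %>%  # one row per comma-separated piece
  mutate(x = is_ab(x))             # then transform the separated values
result
```

Here the helper only ever sees already-separated values, so it does not need to return a list.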
Or, you could pack everything into the proc function.
library(tidyr)
library(stringr)
library(dplyr, warn.conflicts = FALSE)
proc <- function(df, col) {
  fun <- function(text) {
    text %>%
      str_split(pattern = ",") %>%
      lapply(function(x) {
        x %in% c("a", "b")
      })
  }
  df %>%
    mutate(across({{ col }}, fun)) %>%
    unnest({{ col }})
}
t <- tibble(x = c("a,b", "c,d"), v = c(1,2))
t %>%
  proc(x)
#> # A tibble: 4 × 2
#>   x         v
#>   <lgl> <dbl>
#> 1 TRUE      1
#> 2 TRUE      1
#> 3 FALSE     2
#> 4 FALSE     2
Created on 2022-02-20 by the reprex package (v2.0.1)
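If you want to reuse the same pattern with different processing functions, one possible generalisation is to pass the function in as an argument. This is only a sketch: separate_rows_with, f and sep are hypothetical names, not part of tidyr, and sep is handed to str_split, which treats it as a regular expression.

```r
library(tidyr)
library(stringr)
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper (not part of tidyr): split `col` on `sep`, apply `f` to
# the pieces of each row, then expand to one row per piece.
separate_rows_with <- function(df, col, f, sep = ",") {
  split_and_apply <- function(text) {
    # Keep the result as a list (one element per row) so unnest() can expand it.
    lapply(str_split(text, pattern = sep), f)
  }
  df %>%
    mutate(across({{ col }}, split_and_apply)) %>%
    unnest({{ col }})
}

t <- tibble(x = c("a,b", "c,d"), v = c(1, 2))

# Same logic as proc above, supplied as an argument.
out_lgl <- separate_rows_with(t, x, function(piece) piece %in% c("a", "b"))

# Any function that maps the pieces of one row to a vector works, e.g. toupper.
out_chr <- separate_rows_with(t, x, toupper)
```

The function f receives the character vector of pieces for a single row, so it should return a vector rather than a data frame for unnest to expand it into rows this way.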