How to assign vectors to multiple variables in dplyr mutate()
I have the following data frame:
library(tidyverse)
dat <- structure(list(
  motif_name_binned = c(
    "Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin1",
    "Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin2",
    "Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin3"
  ),
  motif_score = c(6.816695, 6.816695, 6.816695)
), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))
dat
Which gives this:
> dat
# A tibble: 3 x 2
motif_name_binned motif_score
<chr> <dbl>
1 Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin1 6.816695
2 Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin2 6.816695
3 Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin3 6.816695
I can get what I want by extracting parts of motif_name_binned with this code:
dat %>%
mutate(motif = str_match(motif_name_binned,"^(.*?)\\/.*?")[,2],
inst = str_match(motif_name_binned,"^.*?\\/.*?\\/.*?\\.instid_(.*?)\\.bin\\d+")[,2],
binno = as.integer(str_match(motif_name_binned,"^.*?\\/.*?\\/.*?\\.bin(\\d+)")[,2]))
Which gives:
# A tibble: 3 x 5
motif_name_binned motif_score motif inst binno
<chr> <dbl> <chr> <chr> <int>
1 Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin1 6.816695 Ddit3::Cebpa chr1:183286845-183287245 1
2 Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin2 6.816695 Ddit3::Cebpa chr1:183286845-183287245 2
3 Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin3 6.816695 Ddit3::Cebpa chr1:183286845-183287245 3
But notice that I have to run a regex three times and assign each result to a column one at a time, when in fact a single regex with three capture groups does the job:
str_match(dat$motif_name_binned, "^(.*?)\\/.*?\\/.*?\\.instid_(.*?)\\.bin(\\d+)")[, c(2, 3, 4)]
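For reference, str_match() returns a character matrix with one row per input string: column 1 holds the full match and the following columns hold the capture groups, which is why [, c(2, 3, 4)] picks out the three fields. A minimal check, assuming stringr is installed and using a single copy of the example string:

```r
library(stringr)

x <- "Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin1"

# One regex, three capture groups: motif, instance coordinates, bin number
m <- str_match(x, "^(.*?)\\/.*?\\/.*?\\.instid_(.*?)\\.bin(\\d+)")

dim(m)    # 1 row, 4 columns (full match + 3 capture groups)
m[, 2:4]  # just the capture groups
```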
How can I incorporate this all-in-one regex into dplyr mutate()?
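For comparison, dplyr 1.0.0 and later can also do this inside mutate() itself: when an unnamed argument evaluates to a data frame, its columns are added to the result. The sketch below assumes dplyr >= 1.0.0; split_motif() is a hypothetical helper name, and the example data is rebuilt with tibble() for brevity.

```r
library(dplyr)
library(stringr)
library(tibble)

dat <- tibble(
  motif_name_binned = paste0(
    "Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin", 1:3),
  motif_score = 6.816695
)

# Hypothetical helper: run the all-in-one regex once and return the three
# capture groups as a tibble (column 1 of str_match() is the full match)
split_motif <- function(x) {
  m <- str_match(x, "^(.*?)\\/.*?\\/.*?\\.instid_(.*?)\\.bin(\\d+)")
  tibble(motif = m[, 2], inst = m[, 3], binno = as.integer(m[, 4]))
}

# The unnamed data-frame result is unpacked into separate columns
res <- dat %>% mutate(split_motif(motif_name_binned))
res
```

The regex is evaluated once per call, and binno is converted to integer inside the helper, matching the type produced by the three-regex version.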
You can use tidyr::extract() to convert the capturing groups in the regular expression into new columns:
library(tidyr)
dat %>%
extract(motif_name_binned, c('motif', 'inst', 'binno'), regex = "^(.*?)\\/.*?\\/.*?\\.instid_(.*?)\\.bin(\\d+)", remove = FALSE)
# A tibble: 3 x 5
# motif_name_binned motif inst binno motif_score
#* <chr> <chr> <chr> <chr> <dbl>
#1 Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin1 Ddit3::Cebpa chr1:183286845-183287245 1 6.816695
#2 Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin2 Ddit3::Cebpa chr1:183286845-183287245 2 6.816695
#3 Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin3 Ddit3::Cebpa chr1:183286845-183287245 3 6.816695
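Note that in the output above binno is <chr>, whereas the three-regex mutate() version produced an integer. extract() has a convert argument that runs type.convert() on the new columns, which should turn "1", "2", "3" into integers. A sketch, assuming tidyr is installed (newer tidyr releases mark extract() as superseded by separate_wider_regex(), but extract() still works):

```r
library(dplyr)
library(tidyr)

dat <- tibble(
  motif_name_binned = paste0(
    "Ddit3::Cebpa/MA0019.1/Jaspar.instid_chr1:183286845-183287245.bin", 1:3),
  motif_score = 6.816695
)

res <- dat %>%
  tidyr::extract(motif_name_binned,
                 into = c("motif", "inst", "binno"),
                 regex = "^(.*?)\\/.*?\\/.*?\\.instid_(.*?)\\.bin(\\d+)",
                 remove = FALSE,   # keep the original column
                 convert = TRUE)   # type.convert() the new columns

class(res$binno)
```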