How to split text into multiple columns?
I have a column that has text in the format:
ID-XXXXX Process for Description [1/5]
I would like this to be broken into three columns where:
A = ID-XXXXX
B = Process for Description
C = 1/5
Any ideas on how to split this properly?
Here is an attempt to help. Note that the first part is a bit tricky: I used a regex that assumes XXXXX will always be 5 characters long.
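d <- "ID-XXXXX Process for Description [1/5]"
# A: drop everything from the first space onward
a <- sub("[ ].+", "", d)
# C: drop everything up to and including " [", then the closing "]"
c_part <- sub("[]]", "", sub(".+[ ][[]", "", d))
# B: drop the trailing " [...]", then the leading "ID-XXXXX " (assumes 5 characters after "ID-")
b <- gsub("ID-.{5}[ ]", "", sub("[ ][[].*[]]", "", d))
f <- c(a, b, c_part); f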
# [1] "ID-XXXXX" "Process for Description" "1/5"
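If the ID is not always 5 characters long, the same split can be done with a single regular expression and three capture groups. The following is only a base R sketch, not part of the original answer: the helper name `split_parts` and the second sample string are made up for illustration, and it assumes every input string matches the pattern.

```r
# Hypothetical helper (illustration only): split each string into three parts
# using one regular expression with three capture groups.
split_parts <- function(x) {
  # Group 1: leading run of non-space characters (the ID)
  # Group 2: the description, ending in a non-space character
  # Group 3: whatever sits between "[" and "]"
  pattern <- "^(\\S+)\\s+(.*\\S)\\s*\\[([^]]*)\\]$"
  m <- regmatches(x, regexec(pattern, x))
  # Each element is c(full match, group 1, group 2, group 3); drop the full match
  out <- do.call(rbind, lapply(m, function(parts) parts[-1]))
  colnames(out) <- c("A", "B", "C")
  as.data.frame(out, stringsAsFactors = FALSE)
}

res <- split_parts(c("ID-XXXXX Process for Description [1/5]",
                     "ID-XXXXXYZ Another process [12/905]"))
print(res)
```

Strings that do not match would need separate handling (for example, filtering them out before the `rbind` step), which this sketch leaves out.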
Using stringr, there are several options:
library(dplyr)
library(stringr)

dat <- data.frame(my_string = "ID-XXXXX Process for Description [1/5]")
dat %>%
  mutate(A = str_extract(string = my_string, pattern = "ID-.{5}"),
         B = str_replace(string = my_string, pattern = "ID-.{5}\\s(.+)\\s\\[.*\\]", replacement = "\\1"),
         # str_match returns a matrix; take its 2nd column for every row
         C = str_match(string = my_string, pattern = "\\[(.*)\\]")[, 2])
A: extracts the pattern ID- followed by exactly 5 characters.
B: captures the group between ID-XXXXX and [XX], and replaces the entire match with the captured group.
C: matches the captured pattern (.*) between the square brackets (the 2nd column of the str_match result holds the captured group).
Result:
my_string A B C
1 ID-XXXXX Process for Description [1/5] ID-XXXXX Process for Description 1/5
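Because str_match returns a character matrix (one row per input string, the full match in column 1 and each capture group in the following columns), how the result is indexed matters once there is more than one row. A small sketch with two made-up input strings, assuming stringr is installed:

```r
library(stringr)

x <- c("ID-00001 Process_A [1/5]", "ID-00002 Process_B [2/5]")
m <- str_match(x, "\\[(.*)\\]")

dim(m)   # 2 rows (one per string), 2 columns (full match, capture group)
m[, 2]   # the capture group for every row: "1/5" "2/5"
m[2]     # a single element in column-major order: "[2/5]", the full match of row 2
```

With a single input string, `m[2]` happens to be the capture group, which is why one-row examples look fine either way; `m[, 2]` is the form that keeps working on a whole column.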
EDIT:
I just remembered that the extract() function from tidyr does exactly that.
Using capturing groups in parentheses in the regex argument, you get these into new columns directly.
library(tidyr)

dat <- data.frame(my_string = paste0("ID-0000", 1:5, " Process_", LETTERS[1:5], " [", 1:5, "/5]"))
extract(data = dat,
        col = my_string,
        into = c("A", "B", "C"),
        regex = "(ID-.{5})\\s(.+)\\s\\[(.*)\\]",
        remove = FALSE)
my_string A B C
1 ID-00001 Process_A [1/5] ID-00001 Process_A 1/5
2 ID-00002 Process_B [2/5] ID-00002 Process_B 2/5
3 ID-00003 Process_C [3/5] ID-00003 Process_C 3/5
4 ID-00004 Process_D [4/5] ID-00004 Process_D 4/5
5 ID-00005 Process_E [5/5] ID-00005 Process_E 5/5
If you don't want to keep the original string, use remove = TRUE (the default).
You may also use tidyr::extract to do this systematically. The example is elaborated for the purpose of demonstration: everything up to the first whitespace goes into the first capture group, everything after that up to [ goes into the second capture group, and everything between [ and ] goes into the third capture group. This way there is no limit on the number of characters per capture group.
vec <- c("ID-XXXXX Process for Description [1/5]", "ID-XXXXXYZ Process for Description something [1/5]", "ID-XXXXXFFF Process for Description something else [1/905]", "ID-XXXXXYYYYP Process for Description [900001/5]")
df <- data.frame(col = vec)
df
#> col
#> 1 ID-XXXXX Process for Description [1/5]
#> 2 ID-XXXXXYZ Process for Description something [1/5]
#> 3 ID-XXXXXFFF Process for Description something else [1/905]
#> 4 ID-XXXXXYYYYP Process for Description [900001/5]
library(tidyverse)
df %>%
  extract(col, into = c('A', 'B', 'C'), regex = '^([^\\s]*)\\s([^\\[]*)\\[([^\\]]*)\\]$')
#> A B C
#> 1 ID-XXXXX Process for Description 1/5
#> 2 ID-XXXXXYZ Process for Description something 1/5
#> 3 ID-XXXXXFFF Process for Description something else 1/905
#> 4 ID-XXXXXYYYYP Process for Description 900001/5
Created on 2021-05-30 by the reprex package (v2.0.0)
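One detail that the printed output hides: the second group, `[^\\[]*`, matches everything up to the `[`, so it also picks up the space in front of the bracket. The sketch below checks this in base R, using `regexec(..., perl = TRUE)` as a stand-in for the matching that tidyr does internally (that equivalence is an assumption here), and trims the result with `trimws`.

```r
x <- "ID-XXXXXYZ Process for Description something [1/5]"
pattern <- "^([^\\s]*)\\s([^\\[]*)\\[([^\\]]*)\\]$"

# Element 1 of the match is the full string; elements 2-4 are the capture groups
parts <- regmatches(x, regexec(pattern, x, perl = TRUE))[[1]][-1]

parts[2]           # "Process for Description something " (note the trailing space)
trimws(parts[2])   # "Process for Description something"
```

If the trailing space matters downstream (for joins or comparisons, say), trimming column B after the extract step is one way to handle it.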
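We could use str_extract with literal patterns:
library(dplyr)
library(stringr)
df <- tibble(col1 = "ID-XXXXX Process for Description [1/5]")
# The patterns are literal, so they only match this exact text; C keeps the brackets
df %>%
  mutate(A = str_extract(col1, "ID-XXXXX"),
         B = str_extract(col1, "Process for Description"),
         C = str_extract(col1, "\\[1\\/5\\]"))
Output:
# A tibble: 1 x 4
col1 A B C
<chr> <chr> <chr> <chr>
1 ID-XXXXX Process for Description [1/5] ID-XXXXX Process for Description [1/5]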
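The patterns in this answer are literal text, so they only find that one exact string. A possible generalisation, still with str_extract, is sketched below; the lookaround patterns and the second sample string are my own assumptions, not part of the answer, and the sketch assumes each string has the shape "ID description [x/y]".

```r
library(stringr)

col1 <- c("ID-XXXXX Process for Description [1/5]",
          "ID-ABCDEFG Another process [12/905]")

# A: leading run of non-space characters
A <- str_extract(col1, "^\\S+")
# B: text after the first space, up to the space that precedes "["
B <- str_extract(col1, "(?<=\\s).+(?=\\s\\[)")
# C: text between the square brackets, brackets excluded
C <- str_extract(col1, "(?<=\\[)[^\\]]+(?=\\])")

data.frame(A, B, C)
```

Unlike the literal version, C here comes back without the brackets, which matches the layout asked for in the question.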