
How to use str_split() in R?

I want to split this string into several substrings:

BAA33520.2|/gene="vpf402",/product="Vpf402"|GI:8272373|AB012574|join{7347:7965, 0:591}

The separator is | (ASCII 124).
Splitting works with every other separator I have tried, but not with this one.

?regex

Two regular expressions may be joined by the infix operator |; the resulting regular expression matches any string matching either subexpression. For example, abba|cde matches either the string abba or the string cde. Note that alternation does not work inside character classes, where | has its literal meaning.

The fundamental building blocks are the regular expressions that match a single character. Most characters, including all letters and digits, are regular expressions that match themselves. Any metacharacter with special meaning may be quoted by preceding it with a backslash. The metacharacters in extended regular expressions are . \ | ( ) [ { ^ $ * + ?, but note that whether these have a special meaning depends on the context.

Thus, escape the | with a backslash (doubled inside an R string):

stringr::str_split('BAA33520.2|/gene="vpf402",/product="Vpf402"|GI:8272373|AB012574|join{7347:7965, 0:591}', "\\|")
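
To see why the bare separator fails, here is a minimal base-R sketch. The short input x and the variable names are made up for illustration. It compares the unescaped pattern with the two ways of making | literal that the ?regex excerpt describes: a backslash escape and a character class.

```r
# Hypothetical short input, used only for illustration.
x <- "a|b|c"

# Unescaped "|" is an alternation of two empty patterns, so the split
# does not happen at the pipe characters as one might expect.
unescaped <- strsplit(x, "|")[[1]]

# Backslash-escaped metacharacter (the backslash is doubled in an R string).
escaped <- strsplit(x, "\\|")[[1]]

# Inside a character class, | has its literal meaning.
in_class <- strsplit(x, "[|]")[[1]]

print(unescaped)
print(escaped)
print(in_class)
```

Both literal forms should return the three fields, while the unescaped pattern returns something different.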

As @Frank noted, you can do this in base::strsplit() by adding fixed=TRUE:

strsplit('BAA33520.2|/gene="vpf402",/product="Vpf402"|GI:8272373|AB012574|join{7347:7965, 0:591}', "|", fixed=TRUE)

However, you can also do this with stringr::str_split() by wrapping the separator in regex() and marking it as literal:

stringr::str_split('BAA33520.2|/gene="vpf402",/product="Vpf402"|GI:8272373|AB012574|join{7347:7965, 0:591}', 
                   regex("|", literal=TRUE))
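
As a sketch of a shorter spelling of the same idea: stringr also exports a fixed() modifier, which compares the pattern literally instead of compiling it as a regular expression. The variable names x and parts below are illustrative only, and the example assumes the stringr package is installed.

```r
# The string from the question.
x <- 'BAA33520.2|/gene="vpf402",/product="Vpf402"|GI:8272373|AB012574|join{7347:7965, 0:591}'

# fixed() tells stringr to treat "|" as a plain character, not as alternation.
# str_split() returns a list with one element per input string, so take the first.
parts <- stringr::str_split(x, stringr::fixed("|"))[[1]]

print(parts)
```

This mirrors fixed=TRUE in base::strsplit() and avoids having to remember which characters need escaping.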

Incidentally, stringr is pretty much just a slightly friendlier wrapper around stringi functions at this point, and I highly recommend studying the stringi package, as it contains some wonderful gems beyond string splitting.
