
Regex for specific pattern with special characters

I have the following in a data.frame in R:

example <- "Inmuebles24_|.|_Casa_|.|_Renta_|.|_NuevoLeon"

I would like to use stringr's str_count and some basic gregexpr functions on the string, but I'm stuck on the regex.

The delimiter is clearly (and confusingly): _|.|_

How would this be expressed with regex?

I'm currently trying to escape everything, without success:

str_count(string = example, pattern = "[\\_\\|\\.\\|\\_]")

Your regex does not work because you placed it inside a character class (where, by the way, you do not need to escape _). See my answer from today to "Regex expression not working with once or none" for an explanation of the issue: mainly, characters inside a class are treated as separate symbols rather than as a sequence of symbols, and all special symbols in it are treated as literals, too.
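A small sketch of the difference described above, using the question's string. The class "[_|.]" is a simplified form of the question's class (it contains the same three characters); the last line uses stringr's fixed(), a swapped-in alternative that turns off regex syntax entirely.

```r
library(stringr)

example <- "Inmuebles24_|.|_Casa_|.|_Renta_|.|_NuevoLeon"

# Character class: any ONE of "_", "|" or "." is a match,
# so each 5-character delimiter yields 5 separate matches.
n_class <- str_count(example, "[_|.]")
n_class   ## => 15

# Sequence: "|" and "." are escaped so they match literally
# (unescaped, "|" means alternation and "." means any character).
n_seq <- str_count(example, "_\\|\\.\\|_")
n_seq     ## => 3

# Same count with fixed(), which treats the pattern as plain text.
n_fixed <- str_count(example, fixed("_|.|_"))
n_fixed   ## => 3
```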

You can achieve what you want in two steps:

  • Trim the delimiters from the start and end of the string with gsub
  • Use str_count + 1 to get the count (since the number of parts = the number of delimiters inside the string + 1)

R code:

library(stringr)
example <- "_|.|_Inmuebles24_|.|_Casa_|.|_Renta_|.|_NuevoLeon_|.|_"
str_count(string = gsub("^(_[|][.][|]_)+|(_[|][.][|]_)+$", "", example), pattern = "_\\|\\.\\|_") + 1
## => 4

Or, if you have multiple consecutive delimiters, you need an intermediate step that "contracts" them into one:

example <- "_|.|_Inmuebles24_|.|_Casa_|.|__|.|_Renta_|.|__|.|_NuevoLeon_|.|_"
example <- gsub("((_[|][.][|]_)+)", "_|.|_", example)
str_count(string = gsub("^(_[|][.][|]_)+|(_[|][.][|]_)+$", "", example), pattern = "_\\|\\.\\|_") + 1
## => 4
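To see why the contraction step matters, here is a sketch that runs the same trim-and-count on the string with doubled delimiters, once without and once with that step. The variable names trim_re and delim_re are only for illustration; the patterns are the ones from the answer.

```r
library(stringr)

example <- "_|.|_Inmuebles24_|.|_Casa_|.|__|.|_Renta_|.|__|.|_NuevoLeon_|.|_"
trim_re  <- "^(_[|][.][|]_)+|(_[|][.][|]_)+$"  # delimiters at start/end
delim_re <- "_\\|\\.\\|_"                      # one literal delimiter

# Without contracting, each doubled delimiter is counted twice.
without_collapse <- str_count(gsub(trim_re, "", example), delim_re) + 1
without_collapse  ## => 6

# Contract runs of delimiters into a single one first, then trim and count.
collapsed <- gsub("(_[|][.][|]_)+", "_|.|_", example)
with_collapse <- str_count(gsub(trim_re, "", collapsed), delim_re) + 1
with_collapse     ## => 4
```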

Notes on the regexps: _[|][.][|]_ matches _|.|_ literally, because symbols inside [...] character classes lose their special meaning. ((_[|][.][|]_)+), used in the second snippet, matches one or more (+) consecutive occurrences of this delimiter. The ^(_[|][.][|]_)+|(_[|][.][|]_)+$ pattern matches one or more delimiters at the start (^) and at the end ($) of the string.
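A different technique, not the one used in this answer: split on the literal delimiter with stringr's fixed() and count the non-empty pieces, which handles leading, trailing and consecutive delimiters in one pass. This is a sketch; count_parts is a hypothetical helper, and it assumes empty fields should not be counted.

```r
library(stringr)

# Hypothetical helper: count non-empty fields separated by a literal delimiter.
# fixed() matches "_|.|_" as plain text, so no escaping is needed.
count_parts <- function(x, delim = "_|.|_") {
  pieces <- str_split(x, fixed(delim))
  # Leading, trailing or doubled delimiters leave "" pieces; skip them.
  vapply(pieces, function(p) sum(p != ""), integer(1))
}

n <- count_parts(c(
  "Inmuebles24_|.|_Casa_|.|_Renta_|.|_NuevoLeon",
  "_|.|_Inmuebles24_|.|_Casa_|.|__|.|_Renta_|.|__|.|_NuevoLeon_|.|_"
))
n  ## => 4 4
```

Because it is vectorized over x, it can be applied directly to a data.frame column.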

This gives you what you need for this example: str_count(example, "\\w+")
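This works here because \w matches letters, digits and the underscore, so each "_Field_" run is one match while "|" and "." break the runs. A sketch of the limitation, using a hypothetical input where one field contains a space:

```r
library(stringr)

example <- "Inmuebles24_|.|_Casa_|.|_Renta_|.|_NuevoLeon"
n_ok <- str_count(example, "\\w+")
n_ok     ## => 4

# Hypothetical field with a space: "Nuevo Leon" becomes two word runs.
n_space <- str_count("Inmuebles24_|.|_Casa_|.|_Renta_|.|_Nuevo Leon", "\\w+")
n_space  ## => 5
```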
