Regex search for specific pattern, if found, replace with something else

I am currently trying to figure out how to use regex to clean up my textual data in R. Where could I find an easy tutorial for it? I have been looking online a bit, but when I try something out on regex101 I hardly ever find matches. And if I do, nothing changes within R. Consider this example:

Before <- "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)"
After <- "ACEMOGLU, D., ROBINSON, J., 2012, WHY NATIONS FAIL, (3)"


> Aftergsub <- gsub("\\([\\d][\\d][\\d][\\d]\\)", "new", "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)")
> print(Aftergsub)
[1] "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)"
> 

Of course, "new" should be an expression that makes Before look like After. But with my pattern I cannot even change Before into anything else.

In other words, how do I change a ")" to a "," only if it is preceded by 4 digits? Thanks!

Your pattern does not work because the TRE regex flavor does not support shorthand character classes inside bracket expressions. You should use either [[:digit:]] or [0-9], but not [\\d] (which actually matches a \ or the letter d).
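
One way to check this yourself is to test each bracket variant against a single character. This is only a minimal sketch: the variable names are made up for illustration, and it assumes R's default engine (TRE) unless perl = TRUE is passed.

```r
# Compare how each pattern treats a single digit.
digit <- "5"

# Default (TRE) engine: \d is not a shorthand inside [...]
tre_shorthand_in_brackets <- grepl("[\\d]", digit)
# POSIX character class and explicit range both work under TRE
tre_posix_class <- grepl("[[:digit:]]", digit)
tre_range <- grepl("[0-9]", digit)
# PCRE (perl = TRUE) does accept \d inside a bracket expression
pcre_shorthand_in_brackets <- grepl("[\\d]", digit, perl = TRUE)

# Under TRE the bracketed form still matches the letter d itself
tre_matches_letter_d <- grepl("[\\d]", "d")

print(c(
  tre_shorthand_in_brackets = tre_shorthand_in_brackets,
  tre_posix_class = tre_posix_class,
  tre_range = tre_range,
  pcre_shorthand_in_brackets = pcre_shorthand_in_brackets,
  tre_matches_letter_d = tre_matches_letter_d
))
```

Only the first check should come back FALSE, which is why the four [\\d] groups in the question never match "2012".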

You may use

Before <- "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)"
gsub("\\((\\d{4})\\)", "\\1,", Before)
## => [1] "ACEMOGLU, D., ROBINSON, J., 2012, WHY NATIONS FAIL, (3)"

See the R online demo.

NOTE that I am using \\d without square brackets (a bracket expression) around it. The TRE regex engine treats "\\d{4}" as a pattern matching four digits. It is equivalent to [0-9]{4} or [[:digit:]]{4}.
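
To see the equivalence in practice, the three spellings can be run against the same input. This is a small sketch rather than part of the original answer; patterns and results are illustrative names.

```r
Before <- "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)"

# Three ways of writing "four digits inside literal parentheses"
patterns <- c(
  "\\((\\d{4})\\)",
  "\\(([0-9]{4})\\)",
  "\\(([[:digit:]]{4})\\)"
)

# Apply each pattern with the same replacement (Group 1 followed by a comma)
results <- vapply(
  patterns,
  function(p) gsub(p, "\\1,", Before),
  character(1),
  USE.NAMES = FALSE
)

# If the patterns are equivalent, only one distinct result remains
print(unique(results))
```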

Details

  • \\( - a literal (
  • (\\d{4}) - Group 1: any four digits
  • \\) - a literal )
  • \\1 - the backreference to the Group 1 value
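
Since gsub is vectorized, the same pattern can be applied to a whole column of references at once. The sketch below wraps it in a hypothetical helper, strip_year_parens, which is not from the original answer. Apart from the first entry, the strings in refs are made-up test inputs, not real citations.

```r
# Hypothetical helper: turns "(YYYY)" into "YYYY," in every element of x.
strip_year_parens <- function(x) {
  gsub("\\((\\d{4})\\)", "\\1,", x)
}

refs <- c(
  "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)",
  "NORTH, D., (1990) INSTITUTIONS, (12)",  # made-up test string
  "NO YEAR HERE, (7)"                      # made-up test string, no 4-digit group
)

cleaned <- strip_year_parens(refs)
print(cleaned)
```

Parenthesized numbers with fewer than four digits, such as (3) or (12), are left alone because the pattern requires exactly four digits between the parentheses.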
