Regex search for specific pattern, if found, replace with something else
I am currently trying to figure out how to use regex to clean up my textual data in R. Where could I find an easy tutorial for it? I have been looking online, but when I try something out on regex101 I hardly ever find matches. And when I do, nothing changes within R. Consider this example:
Before <- "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)"
After <- "ACEMOGLU, D., ROBINSON, J., 2012, WHY NATIONS FAIL, (3)"
> Aftergsub <- gsub("\\([\\d][\\d][\\d][\\d]\\)", "new", "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)")
> print(Aftergsub)
[1] "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)"
>
Of course the "new" should be an expression that would make Before look like After. But with my pattern I can't even change Before into anything else.
In other words, how do I change a ")" to a "," only if it is preceded by 4 digits? Thanks!
Your pattern does not work because the TRE regex flavor, which R uses by default, does not support shorthand character classes inside bracket expressions. You should use either [[:digit:]] or [0-9], but not [\\d] (that actually matches a \ or a letter d).
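As a quick check of the point above, here is a minimal sketch of the original four-bracket pattern with the digit class swapped out. The variable names are only illustrative, and the perl = TRUE variant is an addition not mentioned in the answer: it switches gsub to the PCRE engine, where \d is accepted inside a bracket expression.

```r
Before <- "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)"

# Default (TRE) engine: use [0-9] or [[:digit:]] inside the brackets
fixed_range <- gsub("\\([0-9][0-9][0-9][0-9]\\)", "new", Before)
fixed_posix <- gsub("\\([[:digit:]][[:digit:]][[:digit:]][[:digit:]]\\)", "new", Before)

# PCRE engine (perl = TRUE): [\d] is treated as a digit class here
fixed_perl <- gsub("\\([\\d][\\d][\\d][\\d]\\)", "new", Before, perl = TRUE)

print(fixed_range)
## => [1] "ACEMOGLU, D., ROBINSON, J., new WHY NATIONS FAIL, (3)"
```

All three calls should give the same string; the trailing (3) is left alone because it contains only one digit.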
You may use
Before <- "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)"
gsub("\\((\\d{4})\\)", "\\1,", Before)
## => [1] "ACEMOGLU, D., ROBINSON, J., 2012, WHY NATIONS FAIL, (3)"
See the R online demo.
NOTE that I am using \\d without square brackets (i.e. outside a bracket expression). The TRE regex engine treats "\\d{4}" as a pattern matching four digits. It is equivalent to [0-9]{4} or [[:digit:]]{4}.
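To illustrate that equivalence, the following sketch runs the same replacement with each of the three digit notations and compares the results. The names patterns and results are hypothetical helpers introduced only for this example.

```r
Before <- "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)"

# Three ways to write "exactly four digits" for the default (TRE) engine
patterns <- c("\\((\\d{4})\\)",
              "\\(([0-9]{4})\\)",
              "\\(([[:digit:]]{4})\\)")

# Apply the same replacement with each pattern
results <- vapply(patterns,
                  function(p) gsub(p, "\\1,", Before),
                  character(1),
                  USE.NAMES = FALSE)

# All three patterns give one and the same result
print(unique(results))
## => [1] "ACEMOGLU, D., ROBINSON, J., 2012, WHY NATIONS FAIL, (3)"
```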
Details
\\( - a literal (
(\\d{4}) - Group 1: any four digits
\\) - a literal )
\\1 - the backreference to the Group 1 value
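The same capture-and-backreference idea also covers the literal wording of the question (change only the ")" when it follows four digits, keeping the opening parenthesis). This is a sketch, not part of the original answer; the lookbehind variant assumes perl = TRUE, since lookbehinds are a PCRE feature.

```r
Before <- "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)"

# Capture the four digits and put them back, replacing only the ")" after them
keep_paren <- gsub("(\\d{4})\\)", "\\1,", Before)
print(keep_paren)
## => [1] "ACEMOGLU, D., ROBINSON, J., (2012, WHY NATIONS FAIL, (3)"

# Alternative: a lookbehind matches only the ")" itself (needs perl = TRUE)
lookbehind <- gsub("(?<=\\d{4})\\)", ",", Before, perl = TRUE)
```

Both calls leave (3) untouched, because its closing parenthesis is not preceded by four digits.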