I am currently trying to figure out how to use regex in order to clean up my textual data in R. I wonder where I could find an easy tutorial for it? I have been looking a bit online, but when I try something out on regex101 I hardly ever find matches. And if I do, within R, nothing changes. Consider this example
Before <- "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)"
After <- "ACEMOGLU, D., ROBINSON, J., 2012, WHY NATIONS FAIL, (3)"
> Aftergsub <- gsub("\\([\\d][\\d][\\d][\\d]\\)", "new", "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)")
> print(Aftergsub)
[1] "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)"
>
Of course the "new" should be an expression that would make Before look like After. But I don't even get to change Before into anything else, based on my pattern.
In other words, how do I only change a ")" to a "," if it has been preceded by 4 digits? Thanks!
Your pattern does not work because the TRE regex flavor does not support shorthand character classes inside bracket expressions. You should use either [[:digit:]] or [0-9], but not [\\d] (that actually matches a \ or a letter d).
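As a quick check of the point above, the following sketch compares the bracketed shorthand against the portable alternatives on the string from the question. The variable names are only illustrative, and the perl = TRUE line is an extra comparison that assumes you are free to switch to the PCRE engine, which does accept \\d inside brackets.

```r
Before <- "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)"

# Default (TRE) engine: \\d inside brackets is not treated as a digit class,
# so the four-digit year is not matched.
tre_bracket_d <- grepl("\\([\\d]{4}\\)", Before)

# Portable ways to say "a digit" inside a bracket expression.
posix_class <- grepl("\\([[:digit:]]{4}\\)", Before)
digit_range <- grepl("\\([0-9]{4}\\)", Before)

# PCRE (perl = TRUE) does interpret \\d inside brackets.
pcre_bracket_d <- grepl("\\([\\d]{4}\\)", Before, perl = TRUE)

print(c(tre_bracket_d = tre_bracket_d, posix_class = posix_class,
        digit_range = digit_range, pcre_bracket_d = pcre_bracket_d))
```

Only the first entry should come back FALSE, which matches the unchanged output seen in the question.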
You may use
Before <- "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)"
gsub("\\((\\d{4})\\)", "\\1,", Before)
## => [1] "ACEMOGLU, D., ROBINSON, J., 2012, WHY NATIONS FAIL, (3)"
NOTE that I am using \\d without square brackets (= a bracket expression) around it. The TRE regex engine treats "\\d{4}" as a pattern matching four digits. It is equivalent to [0-9]{4} or [[:digit:]]{4}.
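To see that the three spellings behave the same on this input, here is a small sketch that runs each one through gsub() and compares the results. The vector name patterns and the use of vapply() are my own choices for illustration, not part of the original answer.

```r
Before <- "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)"
After  <- "ACEMOGLU, D., ROBINSON, J., 2012, WHY NATIONS FAIL, (3)"

# Three ways to write "exactly four digits inside literal parentheses".
patterns <- c(
  "\\((\\d{4})\\)",
  "\\(([0-9]{4})\\)",
  "\\(([[:digit:]]{4})\\)"
)

# Apply the same replacement with each pattern.
results <- vapply(patterns, function(p) gsub(p, "\\1,", Before),
                  character(1), USE.NAMES = FALSE)

# All three should give the same string, and it should equal After.
print(unique(results))
print(all(results == After))
```

Note that "(3)" is left alone in every case because it contains one digit, not four.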
Details
\\( - a literal (
(\\d{4}) - Group 1: any four digits
\\) - a literal )
\\1 - the backreference to the Group 1 value
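If it helps to see what Group 1 actually holds, the sketch below pulls the whole match and the captured group apart with regexec() and regmatches(), then rebuilds the replacement by hand. This is only an illustration of what the backreference does; the names parts, whole_match, group_one and manual are made up for the example.

```r
Before <- "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)"
pattern <- "\\((\\d{4})\\)"

# regexec() reports the whole match plus each capture group;
# regmatches() turns those positions into strings.
parts <- regmatches(Before, regexec(pattern, Before))[[1]]
whole_match <- parts[1]  # text matched by the full pattern, parentheses included
group_one   <- parts[2]  # text captured by (\\d{4}), which "\\1" refers to

# Rebuilding the replacement by hand mirrors what "\\1," does in gsub().
manual <- sub(whole_match, paste0(group_one, ","), Before, fixed = TRUE)
print(manual)
```

In practice the single gsub() call is the simpler route; the manual version is only there to make the role of the group visible.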