
Regex search for specific pattern, if found, replace with something else

I am currently trying to figure out how to use regex in order to clean up my textual data in R. I wonder where I could find an easy tutorial for it? I have been looking a bit online, but when I try something out on regex101 I hardly ever find matches. And if I do, within R, nothing changes. Consider this example

Before <- "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)"
After <- "ACEMOGLU, D., ROBINSON, J., 2012, WHY NATIONS FAIL, (3)"


> Aftergsub <- gsub("\\([\\d][\\d][\\d][\\d]\\)", "new", "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)")
> print(Aftergsub)
[1] "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)"
> 

Of course the "new" should be an expression that would make Before look like After. But I don't even get to change Before into anything else, based on my pattern.

In other words, how do I only change a ")" to a "," if it has been preceded by 4 digits? Thanks!

Your pattern does not work because the TRE regex flavor (the default engine behind gsub) does not support shorthand character classes inside bracket expressions. Use [[:digit:]] or [0-9] instead of [\\d], which actually matches a \ or the letter d.
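To see the difference in isolation, here is a minimal sketch that replaces every "digit" in a short test string. The variable names and the test string are illustrative assumptions, not part of the question; perl = TRUE is shown only for contrast, since PCRE does accept \d inside brackets.

```r
# Illustrative input (assumption): just the year from the question.
digits <- "2012"

# Default (TRE) engine: \d inside [...] is not treated as "digit",
# so nothing is replaced.
default_class <- gsub("[\\d]", "X", digits)

# Portable ways to write a digit class inside brackets.
range_class <- gsub("[0-9]", "X", digits)
posix_class <- gsub("[[:digit:]]", "X", digits)

# With perl = TRUE the PCRE engine is used, where [\d] does mean "digit".
pcre_class <- gsub("[\\d]", "X", digits, perl = TRUE)

print(c(default_class, range_class, posix_class, pcre_class))
## => [1] "2012" "XXXX" "XXXX" "XXXX"
```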

You may use

Before <- "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)"
gsub("\\((\\d{4})\\)", "\\1,", Before)
## => [1] "ACEMOGLU, D., ROBINSON, J., 2012, WHY NATIONS FAIL, (3)"


NOTE that I am using \\d without square brackets (= bracket expression) around it. The TRE regex engine treats "\\d{4}" as a pattern matching four digits. It is equivalent to [0-9]{4} or [[:digit:]]{4}.
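As a quick check of that equivalence, the sketch below runs all three spellings against the string from the question and compares the results. The variable names are assumptions chosen for illustration.

```r
Before <- "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)"

# Three spellings of "four digits in parentheses", all using the default engine.
with_shorthand <- gsub("\\((\\d{4})\\)", "\\1,", Before)
with_range     <- gsub("\\(([0-9]{4})\\)", "\\1,", Before)
with_posix     <- gsub("\\(([[:digit:]]{4})\\)", "\\1,", Before)

# All three produce the same string.
print(identical(with_shorthand, with_range) && identical(with_range, with_posix))
## => [1] TRUE
```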

Details

  • \\( - a literal (
  • (\\d{4}) - Group 1: any four digits
  • \\) - a literal )
  • \\1 - the backreference to Group 1 value
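Since gsub is vectorized, the same pattern can be applied to a whole column of references at once. The following is a small sketch; the second and third entries of refs are made-up inputs added only to show what happens when there is no match or more than one.

```r
# First entry is from the question; the other two are hypothetical test cases.
refs <- c(
  "ACEMOGLU, D., ROBINSON, J., (2012) WHY NATIONS FAIL, (3)",
  "NO YEAR HERE, (3)",
  "TWO YEARS (1999) AND (2005)"
)

# Strip the parentheses around every four-digit group and append a comma.
cleaned <- gsub("\\((\\d{4})\\)", "\\1,", refs)

print(cleaned)
## => [1] "ACEMOGLU, D., ROBINSON, J., 2012, WHY NATIONS FAIL, (3)"
## => [2] "NO YEAR HERE, (3)"
## => [3] "TWO YEARS 1999, AND 2005,"
```

Entries without a four-digit group in parentheses, such as "(3)", are left untouched. Use sub instead of gsub if only the first match per string should be replaced.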
