
How to return matched regex in R gsub

This is pretty basic, but I can't find out how to refer back to the already-matched expression in an R regex replacement.

For example, suppose I wanted to add a period after an initial, changing "Joe J Smith" to "Joe J. Smith".

My approach is to use gsub("(?<=\\s|^)[A-Z](?=\\s|$)","\\1.",string,perl=T). I'm no expert on regex, but I thought \\1 or $1 would return the matched expression, i.e. "J" for the string given.

To no avail, though: this returns "Joe . Smith".

I'm sure this is simple, but I can't find any examples of doing something similar in R, which has its own brand of base regex.
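If the aim is simply to return the matched text, rather than reuse it inside a replacement, base R pairs regmatches() with regexpr() (first match per string) or gregexpr() (all matches). A minimal sketch, reusing the word-boundary pattern for lone capital letters; the vector name people is only illustrative:

```r
# Illustrative input: strings containing single-letter initials
people <- c("Joe J Smith", "A B Jones")

# gregexpr() locates every match; regmatches() extracts the matched substrings
initials <- regmatches(people, gregexpr("\\b[A-Z]\\b", people))
print(initials)

# regexpr() stops at the first match in each string
first_initial <- regmatches(people, regexpr("\\b[A-Z]\\b", people))
print(first_initial)
```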

In this case you can capture the letter in a group and use "\\b" to match word boundaries:

> gsub("\\b([A-Z])\\b", "\\1.", "Joe J Smith")
[1] "Joe J. Smith"
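gsub() is vectorised over its input, so the same pattern can be wrapped in a small helper and applied to a whole vector of names at once. A sketch; add_initial_periods is a hypothetical name, not an existing function:

```r
# Hypothetical helper wrapping the word-boundary answer.
# \\b([A-Z])\\b captures a capital letter standing alone between word
# boundaries; "\\1." writes the captured letter back followed by a period.
add_initial_periods <- function(x) {
  gsub("\\b([A-Z])\\b", "\\1.", x)
}

out <- add_initial_periods(c("Joe J Smith", "A B Jones", "Mary Ann K"))
print(out)
```

Because \b only distinguishes word from non-word characters, a capital followed by an apostrophe or a period also qualifies: "Joe O'Neil" becomes "Joe O.'Neil", and an already punctuated "Joe J. Smith" becomes "Joe J.. Smith". The lookaround pattern, which requires whitespace or a string end on both sides, does not have this problem.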

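Regarding capitalizing the letter after a hyphen: with perl = TRUE, "\\U" in the replacement converts the rest of the replacement (here the captured "-s") to upper case: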

> gsub("(-.)", "\\U\\1", "Joe Jones-smith", perl = TRUE)
[1] "Joe Jones-Smith"
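The same mechanism also offers \L, which switches the conversion to lower case, so two groups are enough to title-case every word (including the part after a hyphen) regardless of the input's case. A sketch; title_case is a hypothetical helper name, and \U/\L only work with perl = TRUE:

```r
# Hypothetical helper: upper-case the first letter of each word and
# lower-case the rest. Group 1 is the first word character, group 2 the remainder.
title_case <- function(x) {
  gsub("(\\w)(\\w*)", "\\U\\1\\L\\2", x, perl = TRUE)
}

out <- title_case(c("joe jones-smith", "JOE JONES-SMITH"))
print(out)
```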

As akrun indicated, you need to put parentheses around the capital letter to form a group. This is what ?regex says:

     The backreference '\N', where 'N = 1 ... 9', matches the substring
     previously matched by the Nth parenthesized subexpression of the
     regular expression.  (This is an extension for extended regular
     expressions: POSIX defines them only for basic ones.)
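The passage above describes a backreference inside the pattern, where \1 must match the same text again; in a sub() or gsub() replacement, \1 to \9 insert whatever each group captured. A short sketch of both uses with the default (non-perl) engine; the example strings are made up:

```r
# In the pattern: \\1 must repeat whatever (\\w+) captured,
# so this collapses an accidentally doubled word
dedup <- sub("\\b(\\w+) \\1\\b", "\\1", "Joe Joe Smith")
print(dedup)

# In the replacement: captured groups can be reordered freely
swapped <- sub("^(\\w+) (\\w+)$", "\\2, \\1", "Joe Smith")
print(swapped)
```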

Adding the parentheses to the original pattern gives:

R>x
[1] "joe J smith"
R>gsub("(?<=\\s|^)([A-Z])(?=\\s|$)","\\1.",x,perl=TRUE)
[1] "joe J. smith"
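To see what \1 refers to here, regexec() with perl = TRUE returns the position of the whole match followed by each parenthesized group, and regmatches() turns those positions into text. A brief sketch on the same x:

```r
x <- "joe J smith"

# regexec() reports the whole match and each capture group;
# perl = TRUE is needed for the lookbehind and lookahead
m <- regexec("(?<=\\s|^)([A-Z])(?=\\s|$)", x, perl = TRUE)
parts <- regmatches(x, m)[[1]]
print(parts)  # element 1: whole match, element 2: group 1
```

Both elements are "J" because the lookarounds are zero-width: they constrain the match without becoming part of it, which is also why the replacement has to put the letter back via a group.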

