gsub replacing string with pattern matching code and not specific string variables

Question

I have a long list of files that I want to standardize. Different components of the string are separated by an underscore. However, a large number of files were created without the underscore between the digits (a unique id) and the single alpha character. The specific variables will be different per file but the pattern is the same. How do I add the _ in?

I have tried gsub. It matches the right strings (only those that need the change are altered), but the replacement text is the pattern-matching code itself rather than the matched characters.

x<- c("A12_SITE_1234_J_vvv.csv","A12_SITA_1234J_vvv.csv", "A12_SITE_1678_H_vvv.csv", "A12_SITE_145C_vvv.csv")

z<- gsub(".*[0-9][A-Z]", ".*[0-9]\\_[A-Z]", x)

expected results:

"A12_SITE_1234_J_vvv.csv","A12_SITA_1234_J_vvv.csv", "A12_SITE_1678_H_vvv.csv", "A12_SITE_145_C_vvv.csv"

Current results:

"A12_SITE_1234_J_vvv.csv" ".*[0-9]_[A-Z]_vvv.csv"   "A12_SITE_1678_H_vvv.csv" ".*[0-9]_[A-Z]_vvv.csv"

Answer 1

We can use a regex lookaround

sub("(?<=[0-9])(?=[A-Z])", "_", x, perl = TRUE)
#[1] "A12_SITE_1234_J_vvv.csv" "A12_SITA_1234_J_vvv.csv" 
#[3] "A12_SITE_1678_H_vvv.csv" "A12_SITE_145_C_vvv.csv"

Or use capture groups ( (...) ) to capture parts of the pattern, then refer to them in the replacement with backreferences ( \\1, \\2 )

sub("([0-9])([A-Z])", "\\1_\\2", x, perl = TRUE)
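As a quick sanity check, the two approaches above can be compared on the sample vector from the question. This is only a sketch; the names by_lookaround and by_groups are made up for illustration.

```r
x <- c("A12_SITE_1234_J_vvv.csv", "A12_SITA_1234J_vvv.csv",
       "A12_SITE_1678_H_vvv.csv", "A12_SITE_145C_vvv.csv")

# Zero-width match between a digit and an uppercase letter; needs perl = TRUE
by_lookaround <- sub("(?<=[0-9])(?=[A-Z])", "_", x, perl = TRUE)

# Capture the digit and the letter, then put them back around the underscore
by_groups <- sub("([0-9])([A-Z])", "\\1_\\2", x)

print(by_lookaround)
print(identical(by_lookaround, by_groups))
```

The lookaround version inserts text at a position without consuming any characters, while the capture-group version consumes the digit and letter and writes them back.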

In the OP's code, the pattern .* (any characters) followed by a digit ( [0-9] ) and an uppercase letter ( [A-Z] ) is not captured, so the matched text is lost in the replacement. Also, the replacement argument is not a regex: if we use [0-9] there, it is taken as a literal string.
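A small side-by-side sketch of that point, using the vector from the question (the names literal and fixed are only for illustration):

```r
x <- c("A12_SITE_1234_J_vvv.csv", "A12_SITA_1234J_vvv.csv",
       "A12_SITE_1678_H_vvv.csv", "A12_SITE_145C_vvv.csv")

# The replacement is plain text, so the regex syntax is pasted in verbatim
literal <- sub(".*[0-9][A-Z]", ".*[0-9]_[A-Z]", x)

# Grouping the two parts lets the replacement reuse what was actually matched
fixed <- sub("(.*[0-9])([A-Z])", "\\1_\\2", x)

print(literal)
print(fixed)
```

Strings that already contain the underscore do not match either pattern, so they pass through unchanged in both cases.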

Answer 2

Use capturing groups with backreferences in the replacement pattern (note that replacement patterns cannot be regex patterns; you only use regex to search for text):

> sub("(.*[0-9])([A-Z])", "\\1_\\2", x)
[1] "A12_SITE_1234_J_vvv.csv" "A12_SITA_1234_J_vvv.csv" "A12_SITE_1678_H_vvv.csv" "A12_SITE_145_C_vvv.csv"

See the R online demo and the regex demo.

Pattern details

(.*[0-9]) - Group 1 ( \1 ): any 0+ chars, as many as possible, up to and including a digit
([A-Z]) - Group 2 ( \2 ): an uppercase ASCII letter.
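Because .* is greedy, this pattern targets the last digit-letter pair in a string, whereas a pattern without the leading .* targets the first one. The sketch below illustrates the difference on a made-up file name (y is hypothetical and not from the question's data); with the question's files there is only one such pair, so the choice does not matter there.

```r
# Hypothetical name containing two digit-letter pairs
y <- "ID_1A_2B.csv"

# No leading .*: sub() changes the first digit-letter pair
first_pair <- sub("([0-9])([A-Z])", "\\1_\\2", y)

# Greedy .* pushes the match to the last digit-letter pair
last_pair <- sub("(.*[0-9])([A-Z])", "\\1_\\2", y)

# gsub() without the leading .* changes every pair
all_pairs <- gsub("([0-9])([A-Z])", "\\1_\\2", y)

print(c(first_pair, last_pair, all_pairs))
```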

2 answers

solution1
3 ACCEPTED 2019-04-30 13:11:07

solution2
3 2019-04-30 13:13:01