I have a long list of file names that I want to standardize. The components of each name are separated by underscores. However, a large number of files were created without the underscore between the digits (a unique id) and the single uppercase letter that follows. The specific values differ per file, but the pattern is the same. How do I add the _ in?
I have tried gsub. It matches the pattern correctly (only the strings that need the change are altered), but the text it inserts is the pattern-matching code itself.
x<- c("A12_SITE_1234_J_vvv.csv","A12_SITA_1234J_vvv.csv", "A12_SITE_1678_H_vvv.csv", "A12_SITE_145C_vvv.csv")
z<- gsub(".*[0-9][A-Z]", ".*[0-9]\\_[A-Z]", x)
expected results:
"A12_SITE_1234_J_vvv.csv","A12_SITA_1234_J_vvv.csv", "A12_SITE_1678_H_vvv.csv", "A12_SITE_145_C_vvv.csv"
Current results:
"A12_SITE_1234_J_vvv.csv" ".*[0-9]_[A-Z]_vvv.csv" "A12_SITE_1678_H_vvv.csv" ".*[0-9]_[A-Z]_vvv.csv"
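We can use regex lookarounds to match the zero-width position between a digit and an uppercase letter and insert the underscore there: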
sub("(?<=[0-9])(?=[A-Z])", "_", x, perl = TRUE)
#[1] "A12_SITE_1234_J_vvv.csv" "A12_SITA_1234_J_vvv.csv"
#[3] "A12_SITE_1678_H_vvv.csv" "A12_SITE_145_C_vvv.csv"
Or use capture groups ( (...) ) to capture the digit and the letter separately, and then refer to the captured groups in the replacement with backreferences ( \\1, \\2 ):
sub("([0-9])([A-Z])", "\\1_\\2", x, perl = TRUE)
In the OP's code, the pattern .* (any characters) followed by a digit ( [0-9] ) and an uppercase letter ( [A-Z] ) is not captured, so the matched text is lost in the replacement. Also, if we use [0-9] in the replacement, it is taken as a literal string.
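To illustrate the point about literal replacements, here is a minimal sketch using two of the sample names from the question. The variable names literal and captured are only for illustration.

```r
x <- c("A12_SITE_1234_J_vvv.csv", "A12_SITA_1234J_vvv.csv")

# Apart from backreferences, the replacement is plain text:
# "[0-9]" and "[A-Z]" are inserted as-is and the matched "4J" is lost.
literal <- sub("[0-9][A-Z]", "[0-9]_[A-Z]", x)

# Capturing the digit and the letter lets the replacement reuse them.
captured <- sub("([0-9])([A-Z])", "\\1_\\2", x)

print(literal)
print(captured)
```

The first element has no digit directly followed by an uppercase letter, so both calls leave it unchanged.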
Use capturing groups with backreferences in the replacement (note that the replacement cannot itself be a regex pattern; regex is only used to search for text):
> sub("(.*[0-9])([A-Z])", "\\1_\\2", x)
[1] "A12_SITE_1234_J_vvv.csv" "A12_SITA_1234_J_vvv.csv" "A12_SITE_1678_H_vvv.csv" "A12_SITE_145_C_vvv.csv"
Pattern details
(.*[0-9]) - Group 1 ( \1 ): any 0+ chars, as many as possible, up to and including a digit
([A-Z]) - Group 2 ( \2 ): an uppercase ASCII letter.
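Because .* is greedy, this pattern targets the last digit-letter boundary in a string, whereas a pattern without the leading .* targets the first one. The sample names contain only one such boundary, so the difference does not show there. A small sketch with a hypothetical file name (y is invented for illustration and is not from the question) makes it visible:

```r
# Hypothetical name with two digit-letter boundaries: "1B" and "4J"
y <- "A12_SITE_1B_1234J_vvv.csv"

# Without the leading .*, sub() changes the leftmost boundary only
first_only <- sub("([0-9])([A-Z])", "\\1_\\2", y)

# With the greedy .*, group 1 extends to the last digit followed by a letter
last_only <- sub("(.*[0-9])([A-Z])", "\\1_\\2", y)

# gsub() without the leading .* changes every boundary
every <- gsub("([0-9])([A-Z])", "\\1_\\2", y)

print(c(first_only, last_only, every))
```

Which variant is appropriate depends on whether a name can contain more than one digit-letter boundary and which of them should receive the underscore.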