
gsub back reference and replacement with empty string not identical

How can I back reference file_version_1a.csv in the following?

vec = c("dir/file_version_1a.csv")

In particular, I wonder why replacing the match with an empty string gives

gsub("(file.*csv$)", "", vec)
[1] "dir/"

which suggests the pattern is correct, yet the back reference returns the whole string:

gsub("(file.*csv$)", "\\1", vec)
[1] "dir/file_version_1a.csv"

You want to extract the substring starting with file and ending with csv at the end of string.

Since gsub replaces the match, and you want to use it as an extraction function, you need to match all the text in the string.
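
To see why the back reference appeared to do nothing, it can help to mark what the original pattern actually matches. This is a minimal sketch; the angle brackets are illustrative markers only, not part of the question:

```r
vec <- c("dir/file_version_1a.csv")

# Wrap the captured text in markers: only the part after "dir/" is matched.
marked <- gsub("(file.*csv$)", "<\\1>", vec)
print(marked)

# Replacing the match with its own capture puts the same text back,
# so the unmatched prefix "dir/" survives and the string is unchanged.
unchanged <- gsub("(file.*csv$)", "\\1", vec)
print(identical(unchanged, vec))
```

The unmatched prefix is never touched by gsub, which is why the pattern has to be widened to cover it.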

The text your regex leaves unmatched is at the start of the string, so prepend .* to the pattern. With the TRE engine that base R functions use by default, .* matches any zero or more chars, as many as possible. With the PCRE engine (base R functions with perl=TRUE) and the ICU engine (stringr / stringi functions), it matches any zero or more chars other than line break chars. The \\1 replacement then keeps only the captured part:

vec = c("dir/file_version_1a.csv")
gsub(".*(file.*csv)$", "\\1", vec)
[1] "file_version_1a.csv"
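
The engine difference only shows up when the input contains a line break. Here is a small sketch with a made-up two-line string (the input x is an assumption for illustration), using sub because the end-anchored pattern can match at most once:

```r
x <- "notes\ndir/file_version_1a.csv"  # hypothetical input with a line break

# TRE (default): . also matches the line break, so .* consumes everything before "file".
tre <- sub(".*(file.*csv)$", "\\1", x)

# PCRE (perl = TRUE): . stops at the line break, so the first line is left in place.
pcre <- sub(".*(file.*csv)$", "\\1", x, perl = TRUE)

# The inline (?s) flag lets . match line breaks in PCRE as well.
pcre_dotall <- sub("(?s).*(file.*csv)$", "\\1", x, perl = TRUE)

print(c(tre, pcre, pcre_dotall))
```

For single-line inputs such as the one in the question, the engines should behave the same.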

However, an extraction function seems a more natural choice here, either stringr::str_extract or base R regmatches with regexpr:

stringr::str_extract(vec, "file.*csv$")
regmatches(vec, regexpr("file.*csv$", vec))
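
The approaches differ once the vector contains elements that do not match. This is a base R sketch, with a second made-up element added for illustration: sub returns a non-matching element unchanged, regmatches drops it, and an NA-filled result can be built by hand.

```r
vec2 <- c("dir/file_version_1a.csv", "dir/readme.txt")  # second element is hypothetical

# sub leaves a non-matching element untouched.
via_sub <- sub(".*(file.*csv)$", "\\1", vec2)

# regmatches returns only the matches, so the result is shorter than the input.
m <- regexpr("file.*csv$", vec2)
via_regmatches <- regmatches(vec2, m)

# Keep positions aligned with the input by filling non-matches with NA.
aligned <- rep(NA_character_, length(vec2))
aligned[as.integer(m) != -1L] <- via_regmatches

print(via_sub)
print(via_regmatches)
print(aligned)
```

The last form mirrors what stringr::str_extract returns, which is NA for elements without a match.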

See the R demo online.


 