
Using gsub to replace last occurrence of string in R

I have the following character vector that I need to modify with gsub.

strings <- c("x", "pm2.5.median", "rmin.10000m", "rmin.2500m", "rmax.5000m")

Desired output of filtered strings:

"x", "pm2.5.median", "rmin", "rmin", "rmax"

My current attempt works for everything except the pm2.5.median string, which has dots that need to be preserved. I'm really just trying to remove the buffer size that is appended to the end of each variable, e.g. 1000m, 2500m, 5000m, 7500m, and 10000m.

gsub("\\..*m$", "", strings)
"x", "pm2", "rmin", "rmin", "rmax"

Match a dot, one or more digits, an m, and the end of the string, and replace that with the empty string. Note that sub is preferable to gsub here because only one replacement per string is needed.

sub("\\.\\d+m$", "", strings)
## [1] "x"            "pm2.5.median" "rmin"         "rmin"         "rmax"   
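
As a quick check of the sub versus gsub remark, the sketch below runs both functions on the question's vector. The names pattern, with_sub, and with_gsub are introduced here for illustration only.

```r
strings <- c("x", "pm2.5.median", "rmin.10000m", "rmin.2500m", "rmax.5000m")
pattern <- "\\.\\d+m$"  # dot, one or more digits, m, end of string

with_sub  <- sub(pattern, "", strings)   # replaces the first match only
with_gsub <- gsub(pattern, "", strings)  # replaces every match

# A non-empty match that must end at the end of the string can occur
# at most once, so both calls return the same vector.
print(identical(with_sub, with_gsub))
print(with_sub)
```

Because the results are identical, sub mainly documents the intent that a single suffix is being stripped.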

The .* pattern matches any 0 or more chars, as many as possible. The \\..*m$ pattern matches the first (leftmost) . in the string and then grabs all the text after it, provided the string ends with m.

You need

> sub("\\.[^.]*m$", "", strings)
[1] "x"            "pm2.5.median" "rmin"         "rmin"         "rmax" 

Here, \\.[^.]*m$ matches a literal dot, then 0 or more chars other than a dot, and then m at the end of the string.
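
To see the difference between the greedy .* and the negated class [^.]*, here is a minimal sketch. It uses "pm2.5.10000m", a hypothetical name that is not in the original data, because "pm2.5.median" ends in n and so cannot be matched by any pattern ending in m$.

```r
# "pm2.5.10000m" is a hypothetical variable name used only for illustration.
x <- c("rmin.10000m", "pm2.5.10000m")

greedy  <- sub("\\..*m$", "", x)     # .* may run across later dots
negated <- sub("\\.[^.]*m$", "", x)  # [^.]* cannot cross a dot

print(greedy)   # "rmin" "pm2"
print(negated)  # "rmin" "pm2.5"
```

With the greedy pattern the match starts at the first dot and swallows everything up to the final m. With the negated class only the last dot-delimited segment can match.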


Details

  • \\. - a dot (must be escaped since it is a special regex char otherwise)
  • [^.]* - a negated character class matching any char but . 0 or more times
  • m - an m char
  • $ - end of string.
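
One design consideration, sketched under an assumption: [^.]* accepts any non-dot characters, so it also strips a final segment that merely ends in m, while the digit-based pattern \\.\\d+m$ only strips segments shaped like a buffer size. The name "temp.maximum" below is hypothetical and not part of the original data.

```r
# "temp.maximum" is a hypothetical name; it ends in m but is not a buffer size.
x <- c("rmax.5000m", "temp.maximum")

by_class  <- sub("\\.[^.]*m$", "", x)  # any final segment ending in m
by_digits <- sub("\\.\\d+m$", "", x)   # only digits followed by m

print(by_class)   # "rmax" "temp"
print(by_digits)  # "rmax" "temp.maximum"
```

If every suffix to remove is digits followed by m, as in the question, the digit-based pattern is the stricter choice.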
