Regex look ahead assertion

I need a regex expert on this problem. It's linked to a SO question I've lost, where the data are the following:

x = c("IID:WE:G12D/V/A", "GH:SQ:p.R172W/G", "HH:WG:p.S122F/H")

I need to split each element of x to isolate the end part, which consists of letter - slash - letter - ... - slash - letter. What I want is to obtain these two vectors as output:

o1 = c("IID:WE:G12", "GH:SQ:p.R172", "HH:WG:p.S122")
o2 = c("D/V/A", "W/G", "F/H")

I have this solution for o1 :

gsub('[A-Z]/.+','',x)
#[1] "IID:WE:G12"   "GH:SQ:p.R172" "HH:WG:p.S122"

Good. For o2, I tried to use an assertion, specifically a look-ahead assertion:

gsub('.+(?=[A-Z]/.+)','',x, perl=T)
#[1] "V/A" "W/G" "F/H"

But this is not the wanted result!

Any idea what is going wrong with the second regex?

As a possible solution, you can use the following replacement:

gsub('.*?([^/](?:/[^/])+)$','\\1',x, perl=T)

Or (if there must be a letter):

gsub('.*?([A-Z](?:/[A-Z])+)$','\\1',x, perl=T)

See IDEONE demo

  • .*? - matches as few characters as possible (any character other than a newline) from the start
  • ([^/](?:/[^/])+) - a capturing group matching:
    • [^/] - a character other than / (or, if you use [A-Z], any English uppercase letter)
    • (?:/[^/])+ - 1 or more sequences of / and a character other than / (or, if you use [A-Z], an uppercase letter).
  • $ - end of string
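
To see the breakdown above in action, here is a minimal sketch that runs both variants on the data from the question. The variable names (pattern_any, pattern_letter, o2_any, o2_letter) are only illustrative, and sub() is used instead of gsub() on the assumption that one replacement per string is enough, since the pattern is anchored with $.

```r
x <- c("IID:WE:G12D/V/A", "GH:SQ:p.R172W/G", "HH:WG:p.S122F/H")

# Lazy prefix, then a captured "char(/char)+" run anchored at the end of the string.
pattern_any    <- '.*?([^/](?:/[^/])+)$'   # any non-slash character between slashes
pattern_letter <- '.*?([A-Z](?:/[A-Z])+)$' # uppercase letters only

# Replace the whole match with the captured tail (group 1).
o2_any    <- sub(pattern_any,    '\\1', x, perl = TRUE)
o2_letter <- sub(pattern_letter, '\\1', x, perl = TRUE)

print(o2_any)     # "D/V/A" "W/G" "F/H"
print(o2_letter)  # "D/V/A" "W/G" "F/H"
```

Both variants give the same result here because every tail in x is made of uppercase letters; they would differ on input such as "x1/2", where only the [^/] version matches.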

The following, very near to what you came up with, will work:

gsub('[^/]+(?=[A-Z]/.+)','',x, perl=T)

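(Your line didn't work because you were asking for "any character", which includes "/". Since .+ is greedy, it runs past the first slash and only stops at the last position where the look-ahead still succeeds.)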
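
One way to check this explanation is to look at what each pattern actually matches (and therefore removes), rather than at what is left over. The sketch below uses regexpr() with regmatches() for that; the names removed_greedy and removed_noslash are made up for illustration.

```r
x <- c("IID:WE:G12D/V/A", "GH:SQ:p.R172W/G", "HH:WG:p.S122F/H")

# First match of the original pattern: '.+' may cross slashes, so it grabs
# the longest prefix that is still followed by "letter/something".
removed_greedy <- regmatches(x, regexpr('.+(?=[A-Z]/.+)', x, perl = TRUE))

# First match when slashes are excluded: the prefix must stop before the
# first "letter/" pair.
removed_noslash <- regmatches(x, regexpr('[^/]+(?=[A-Z]/.+)', x, perl = TRUE))

print(removed_greedy)   # "IID:WE:G12D/" "GH:SQ:p.R172" "HH:WG:p.S122"
print(removed_noslash)  # "IID:WE:G12"   "GH:SQ:p.R172" "HH:WG:p.S122"
```

Only the first element differs: it is the only one with two slashes, so it is the only one where the greedy .+ can swallow "D/" and still find "V/A" for the look-ahead.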

Try this:

gsub('\\w\\/.*(\\/.*)?','',x)

Regex look ahead:

gsub('\\w(?=\\/).*','',x,perl=T)  # for o1

gsub('.*\\d(?=\\w\\/)','',x, perl=T)  # for o2
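
As a quick sanity check, the two look-ahead patterns above can be run against the expected vectors from the question. This is only a sketch: the names o1 and o2 are taken from the question, and the o2 pattern assumes that the slash-separated tail is always immediately preceded by a digit, which holds for this sample but is not guaranteed for other data.

```r
x <- c("IID:WE:G12D/V/A", "GH:SQ:p.R172W/G", "HH:WG:p.S122F/H")

# o1: drop everything from the first word character that is followed by "/".
o1 <- gsub('\\w(?=\\/).*', '', x, perl = TRUE)

# o2: drop everything up to the last digit that is followed by "<word char>/".
# Assumption: a digit always sits right before the tail.
o2 <- gsub('.*\\d(?=\\w\\/)', '', x, perl = TRUE)

# Stop with an error if either result differs from the wanted output.
stopifnot(
  identical(o1, c("IID:WE:G12", "GH:SQ:p.R172", "HH:WG:p.S122")),
  identical(o2, c("D/V/A", "W/G", "F/H"))
)
```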
