I need a regex expert for this problem. It is related to a Stack Overflow question I can no longer find, where the data are the following:
x = c("IID:WE:G12D/V/A", "GH:SQ:p.R172W/G", "HH:WG:p.S122F/H")
I need to split each element of x to isolate the trailing part, which consists of letters separated by slashes (letter/letter/.../letter). What I want is to obtain these two vectors as output:
o1 = c("IID:WE:G12", "GH:SQ:p.R172", "HH:WG:p.S122")
o2 = c("D/V/A", "W/G", "F/H")
I have this solution for o1:
gsub('[A-Z]/.+','',x)
#[1] "IID:WE:G12" "GH:SQ:p.R172" "HH:WG:p.S122"
Good. For o2, I tried to use an assertion, specifically a look-ahead assertion:
gsub('.+(?=[A-Z]/.+)','',x, perl=T)
#[1] "V/A" "W/G" "F/H"
But this is not the wanted result!
Any idea what is going wrong with the second regex?
As a possible solution, you can use the following replacement:
gsub('.*?([^/](?:/[^/])+)$','\\1',x, perl=T)
Or (if there must be a letter):
gsub('.*?([A-Z](?:/[A-Z])+)$','\\1',x, perl=T)
.*? - matches as few characters as possible (other than a newline) from the start
([^/](?:/[^/])+) - a capturing group matching:
  [^/] - a character other than / (or, if you use [A-Z], any English uppercase letter)
  (?:/[^/])+ - 1 or more sequences of / followed by a character other than / (or, if you use [A-Z], an uppercase letter)
$ - end of string

The following, very near to what you came up with, will work:
gsub('[^/]+(?=[A-Z]/.+)','',x, perl=T)
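(Your line didn't work because you were asking for "any character", which includes "/", so the greedy .+ ran past the first slash and stopped only at the last position where the look-ahead still succeeded.)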
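To make the difference visible, one can extract the text each pattern removes instead of deleting it. The following is a minimal sketch using base R's regexpr() and regmatches() on the sample vector from the question; the variable names greedy and bounded are only illustrative.

```r
x <- c("IID:WE:G12D/V/A", "GH:SQ:p.R172W/G", "HH:WG:p.S122F/H")

# Text matched (and therefore removed) by the original attempt:
# the greedy '.+' may cross slashes, so it stops as late as possible.
greedy <- regmatches(x, regexpr('.+(?=[A-Z]/.+)', x, perl = TRUE))

# Text matched when the pattern cannot cross a slash:
# '[^/]+' has to stop before the first "letter/" pair.
bounded <- regmatches(x, regexpr('[^/]+(?=[A-Z]/.+)', x, perl = TRUE))

print(greedy)
# [1] "IID:WE:G12D/" "GH:SQ:p.R172" "HH:WG:p.S122"
print(bounded)
# [1] "IID:WE:G12"   "GH:SQ:p.R172" "HH:WG:p.S122"
```

Only the first element differs, because it is the only one with more than one slash; that is why the original attempt looked correct for "W/G" and "F/H" but returned "V/A" instead of "D/V/A".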
Try this:
gsub('\\w\\/.*(\\/.*)?','',x)
Regex look ahead:
gsub('\\w(?=\\/).*','',x,perl=T) # for o1
gsub('.*\\d(?=\\w\\/)','',x, perl=T) # for o2
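As a quick check, the two look-ahead patterns can be compared against the expected vectors from the question. This is a small sketch; the names got_o1 and got_o2 are illustrative. Note that the o2 pattern assumes a digit sits immediately before the letter/slash tail, which holds for the sample data but may not hold for other inputs.

```r
x  <- c("IID:WE:G12D/V/A", "GH:SQ:p.R172W/G", "HH:WG:p.S122F/H")
o1 <- c("IID:WE:G12", "GH:SQ:p.R172", "HH:WG:p.S122")
o2 <- c("D/V/A", "W/G", "F/H")

# Drop everything from the first word character that is followed by a slash
got_o1 <- gsub('\\w(?=\\/).*', '', x, perl = TRUE)

# Drop everything up to the last digit that is followed by "letter/"
# (assumption: the tail is always preceded by a digit)
got_o2 <- gsub('.*\\d(?=\\w\\/)', '', x, perl = TRUE)

# stopifnot() raises an error if either result differs from the target
stopifnot(identical(got_o1, o1), identical(got_o2, o2))
```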