Regex look-ahead assertion

I need a regex expert for this problem. It is linked to an SO question I have lost track of, where the data are the following:

x = c("IID:WE:G12D/V/A", "GH:SQ:p.R172W/G", "HH:WG:p.S122F/H")

I need to split each element of x to isolate the end part, which consists of letter - slash - letter - ... - slash - letter. What I want is to obtain these two vectors as output:

o1 = c("IID:WE:G12", "GH:SQ:p.R172", "HH:WG:p.S122")
o2 = c("D/V/A", "W/G", "F/H")

I have this solution for o1:

gsub('[A-Z]/.+','',x)
#[1] "IID:WE:G12"   "GH:SQ:p.R172" "HH:WG:p.S122"

Good. For o2, I tried to use an assertion, specifically a look-ahead assertion:

gsub('.+(?=[A-Z]/.+)','',x, perl=T)
#[1] "V/A" "W/G" "F/H"

But this is not the wanted result!

Any idea what is going wrong with the second regex?

As a possible solution, you can use the following replacement:

gsub('.*?([^/](?:/[^/])+)$','\\1',x, perl=T)

Or (if there must be a letter):

gsub('.*?([A-Z](?:/[A-Z])+)$','\\1',x, perl=T)

See the IDEONE demo.

  • .*? - matches as few characters as possible (other than a newline) from the start
  • ([^/](?:/[^/])+) - a capturing group matching:
    • [^/] - a character other than / (or, if you use [A-Z], any English uppercase letter)
    • (?:/[^/])+ - 1 or more sequences of / followed by a character other than / (or, if you use [A-Z], an uppercase letter)
  • $ - end of string
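
As a quick check of the breakdown above, the following sketch applies both replacements to the x from the question. The names res_any and res_letter are made up for illustration, and perl = TRUE is spelled out in full only as a matter of style.

```r
x <- c("IID:WE:G12D/V/A", "GH:SQ:p.R172W/G", "HH:WG:p.S122F/H")

# Keep only the trailing "char/char/.../char" run captured by group 1.
res_any <- gsub('.*?([^/](?:/[^/])+)$', '\\1', x, perl = TRUE)

# Same idea, but every piece between slashes must be an uppercase letter.
res_letter <- gsub('.*?([A-Z](?:/[A-Z])+)$', '\\1', x, perl = TRUE)

print(res_any)     # "D/V/A" "W/G" "F/H"
print(res_letter)  # "D/V/A" "W/G" "F/H"
```

On this sample data the two variants agree; they would differ only if the tail contained something other than uppercase letters between the slashes.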

The following, very close to what you came up with, will work:

gsub('[^/]+(?=[A-Z]/.+)','',x, perl=T)

(Your line didn't work because you were asking for "any character", which includes "/". The greedy .+ can therefore run past the first slash and stop at the last position where the look-ahead still succeeds.)
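
To see the difference, one can look at what each pattern actually consumes rather than at what is left over. This is only a sketch: it uses regexpr and regmatches from base R to extract the first match, and the names greedy and bounded are invented here.

```r
x <- c("IID:WE:G12D/V/A", "GH:SQ:p.R172W/G", "HH:WG:p.S122F/H")

# Text consumed by the greedy '.+' before the look-ahead succeeds.
greedy <- regmatches(x, regexpr('.+(?=[A-Z]/.+)', x, perl = TRUE))
print(greedy)   # "IID:WE:G12D/" "GH:SQ:p.R172" "HH:WG:p.S122"

# Text consumed when the match is not allowed to cross a slash.
bounded <- regmatches(x, regexpr('[^/]+(?=[A-Z]/.+)', x, perl = TRUE))
print(bounded)  # "IID:WE:G12" "GH:SQ:p.R172" "HH:WG:p.S122"
```

Only the first element has more than one slash, which is why it is the only one that comes out wrong ("V/A" instead of "D/V/A") with the original pattern.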

Try this:

gsub('\\w\\/.*(\\/.*)?','',x)

Regex look-ahead:

gsub('\\w(?=\\/).*','',x,perl=T)  # for o1

gsub('.*\\d(?=\\w\\/)','',x, perl=T)  # for o2
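
Put together on the question's data, the two look-ahead patterns can be checked as follows. This is a sketch under one assumption worth stating: the o2 pattern relies on a digit sitting immediately before the first letter of the tail, which holds for the sample x but is not guaranteed for other inputs.

```r
x <- c("IID:WE:G12D/V/A", "GH:SQ:p.R172W/G", "HH:WG:p.S122F/H")

# o1: drop everything from the first word character that is followed by "/".
o1 <- gsub('\\w(?=\\/).*', '', x, perl = TRUE)

# o2: drop everything up to the last digit that is followed by "<word char>/".
# Assumes a digit directly precedes the tail, as in the sample data.
o2 <- gsub('.*\\d(?=\\w\\/)', '', x, perl = TRUE)

print(o1)  # "IID:WE:G12" "GH:SQ:p.R172" "HH:WG:p.S122"
print(o2)  # "D/V/A" "W/G" "F/H"
```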


 