
Positive Lookbehind and Lookahead to the end of string

My string pattern looks like this: UNB+UNOC:3+4399945681577+_GLN_Company__+180101:0050+10870 and I am trying to extract everything after the second-to-last +, i.e. 180101:0050+10870.

So far, I have managed to match the second-to-last block 180101:0050 with the expression (?<=\+)[^\+]+(?=\+[^\+]*$), but I fail to include the last block together with the last +. Here is my sample: regex101

The expression is meant for R, so I still need to escape the backslashes later on. This format is just for testing purposes in Regex101.

We could use a capture group based on the occurrence of + counted from the end ($) of the string.

sub(".*\\+([^+]+\\+[^+]+$)", "\\1", str1)
#[1] "180101:0050+10870"

data

str1 <- "UNB+UNOC:3+4399945681577+_GLN_Company__+180101:0050+10870"
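
One thing to keep in mind with sub(): when the pattern does not match, the input comes back unchanged rather than as NA. A minimal sketch, where the vector x and its second element are made-up test inputs and not data from the question:

```r
# Hypothetical inputs: the second string has only one "+",
# so the pattern (which needs two) cannot match it.
x <- c("UNB+UNOC:3+4399945681577+_GLN_Company__+180101:0050+10870", "a+b")
res <- sub(".*\\+([^+]+\\+[^+]+$)", "\\1", x)
print(res)
# The first element is reduced to its last two blocks;
# the second element is returned as-is.
```

If unmatched strings should be flagged instead, checking them with grepl() and the same pattern beforehand is one option.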

You may use

\+\K[^+]+\+[^+]*$

Or, if you would like to use it with stringr::str_extract :

(?<=\+)[^+]+\+[^+]*$

See the regex demo. Details:

  • \+ - a + char
  • \K - match reset operator: discards the text matched so far from the returned match
  • (?<=\+) - a location right after a + symbol
  • [^+]+ - one or more chars other than +
  • \+ - a +
  • [^+]* - zero or more chars other than +
  • $ - end of string.

See the R demo online:

x <- "UNB+UNOC:3+4399945681577+_GLN_Company__+180101:0050+10870"
regmatches(x, regexpr("\\+\\K[^+]+\\+[^+]*$", x, perl=TRUE))
## => [1] "180101:0050+10870"
library(stringr)
str_extract(x, "(?<=\\+)[^+]+\\+[^+]*$")
## => [1] "180101:0050+10870"
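
When the input is a vector, note that regmatches() with regexpr() silently drops elements that do not match, so the result can be shorter than the input. A rough base R sketch that keeps the positions aligned by filling NA, similar to what str_extract does; the helper name extract_last_two and the test vector are assumptions for illustration:

```r
# Hypothetical helper: returns the last two "+"-separated blocks,
# or NA where the string has fewer than two "+".
extract_last_two <- function(x) {
  m <- regexpr("\\+\\K[^+]+\\+[^+]*$", x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  hit <- !is.na(m) & m != -1        # regexpr() returns -1 for "no match"
  out[hit] <- regmatches(x, m)      # regmatches() yields matched elements only
  out
}

# Made-up test inputs
v <- c("UNB+UNOC:3+4399945681577+_GLN_Company__+180101:0050+10870",
       "no plus here",
       "a+b+c")
res <- extract_last_two(v)
print(res)
```

The \K operator requires perl = TRUE here, as in the regexpr() call above.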

Another option in this case, which relies on the last two blocks consisting only of digits and a colon:

library(stringr)
str_extract("UNB+UNOC:3+4399945681577+_GLN_Company__+180101:0050+10870", "\\d+:\\d+\\+\\d+")
#"180101:0050+10870"
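
Because this pattern is not anchored, it returns the first digits:digits+digits run it finds, which is only the tail if no earlier block has that shape. A small sketch of the difference, using base R instead of stringr and a made-up string y that is not from the question:

```r
# Hypothetical input with an earlier block that also looks like digits:digits+digits
y <- "UNB+111:22+333+180101:0050+10870"

# Unanchored: the leftmost match wins
first_hit <- regmatches(y, regexpr("\\d+:\\d+\\+\\d+", y, perl = TRUE))

# Anchored with $: only a match that runs to the end of the string is accepted
tail_hit <- regmatches(y, regexpr("\\d+:\\d+\\+\\d+$", y, perl = TRUE))

print(first_hit)
print(tail_hit)
```

Appending $ to the pattern keeps the digit-based approach but ties it to the end of the string.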

