
RegEx to Match Characters Directly Before Keyword and Directly Afterwards

I'm not good enough with RegEx yet. I've been searching around and trying to write my own, but haven't succeeded. I want to search through a string like this:

Shelf-15-Contains(Item)10-Depo91

I want to search for (), which can be done with:

/\(([^()]+)\)/g

When the RegEx finds (), I want to grab the 'stuff' that comes right before the (), the () and everything inside, and then whatever follows directly afterwards. So, the result should be:

Contains(Item)10

EDIT: Also, the RegEx I have above makes sure that there aren't nested (), so once I figure out how to match what comes before and after I should be able to run this recursively if there were multiple nested layers?

How about:

/([^-]+\([^()]+\)[^-]+)/g
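As a rough illustration, here is how that pattern behaves on the sample string from the question. This is only a sketch in Perl; the variable names $str and @matches are made up for the example and are not part of the answer.

```perl
use strict;
use warnings;

# Sample string from the question.
my $str = 'Shelf-15-Contains(Item)10-Depo91';

# In list context, /g returns the text captured by the single group
# for every match in the string.
my @matches = $str =~ /([^-]+\([^()]+\)[^-]+)/g;

print "$_\n" for @matches;   # prints: Contains(Item)10
```

Note that both [^-]+ parts require at least one character, so a field such as Contains(Item) with nothing after the closing parenthesis would not match; [^-]* would allow that case.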

For matching before and after, use additional capturing groups:

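use strict;
use warnings;

my $str = 'Shelf-15-Contains(Item)10-Depo91';

while (
  $str
  =~ m/
        ([^-()]*)        # before (parentheses excluded so it cannot span an earlier group)
        \( ( [^()]* ) \) # (in)
        (?= ([^-]*) )    # after, captured in a lookahead so it is not consumed
     /gx
) {
    my ($before, $in, $after) = ($1, $2, $3);
    print "before=$before in=$in after=$after\n";
}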

Nested constructs cannot be recognized by regular expressions in the strict sense (a finite state machine accepting a string). Perl's regex engine offers additional constructs for recognizing balanced parentheses, but they are rather difficult to use.

http://perldoc.perl.org/perlre.html#Extended-Patterns gives examples of how to parse balanced parentheses, under (??{ code }) and (?PARNO).
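A minimal sketch of the (?PARNO) approach, assuming Perl 5.10 or later. Here (?2) recurses into capture group 2, so that group matches one balanced (...) unit however deeply it is nested. The test string and variable names are illustrative only, and the before/after parts use [^-()]* as an assumption about what should count as adjacent text.

```perl
use strict;
use warnings;

# Illustrative input with nested parentheses.
my $str = 'Shelf-15-Contains(I(t)e(m))10-Depo91';

my ($before, $in, $after);
if ($str =~ /
        ( [^-()]* )                        # group 1: text before
        ( \( (?: [^()]++ | (?2) )* \) )    # group 2: balanced unit, recursing into itself
        ( [^-()]* )                        # group 3: text after
    /x) {
    ($before, $in, $after) = ($1, $2, $3);
    print "before=$before in=$in after=$after\n";
    # prints: before=Contains in=(I(t)e(m)) after=10
}
```

In this sketch the middle capture includes the outer parentheses. If the pattern is later embedded in a larger regex, the group number in (?2) has to be adjusted, or a named group with (?&name) can be used instead.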

Finally, the string you want to parse appears to be a list separated by -. Try to write down a formal grammar for what you want to parse; it will help you design your program.
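One way to act on that observation is to split on the separator first and only then look inside each field. The following is a sketch under the assumption that - never occurs inside the parentheses and that each field holds at most one non-nested (...) group; @parsed is a hypothetical name.

```perl
use strict;
use warnings;

my $str = 'Shelf-15-Contains(Item)10-Depo91';

# Each element is [before, in, after] for a field that contains (...).
my @parsed;
for my $field (split /-/, $str) {
    # Anchored match: the whole field must be before(in)after.
    if ($field =~ /^([^()]*)\(([^()]*)\)([^()]*)$/) {
        push @parsed, [$1, $2, $3];
        print "before=$1 in=$2 after=$3\n";   # prints: before=Contains in=Item after=10
    }
}
```

Splitting first keeps the per-field pattern simple, because it no longer needs to know about the separator at all.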

If you don't need to handle input such as a(b)c(d)e, you can simplify (?= ([^-]*) ) to ([^-]*).
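The difference between the two forms shows up when the text after one group is also the text before the next. The sketch below compares them on a(b)c(d)e. The helper triples is hypothetical, and both patterns use [^-()]* instead of [^-]* (an adjustment made for this example) so that the "before" part cannot run across an earlier group.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one "before|in|after" string per match.
sub triples {
    my ($str, $re) = @_;
    my @out;
    while ($str =~ /$re/g) {
        push @out, "$1|$2|$3";
    }
    return @out;
}

# "After" captured inside a lookahead: it is not consumed by the match.
my $with_lookahead = qr/([^-()]*)\(([^()]*)\)(?=([^-()]*))/;
# "After" consumed by the match: the next match starts beyond it.
my $consuming      = qr/([^-()]*)\(([^()]*)\)([^-()]*)/;

print join(', ', triples('a(b)c(d)e', $with_lookahead)), "\n";  # a|b|c, c|d|e
print join(', ', triples('a(b)c(d)e', $consuming)), "\n";       # a|b|c, |d|e
```

With the consuming form, the c is used up as the "after" of the first match, so the second match has an empty "before".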

IMHO, there is no need to overcomplicate this. Here is a regex that matches Contains, everything in the parentheses (with or without nested ones, balanced or not), and the optional digits. It assumes the construction is delimited by - (or ends the string):

\w+\(.*?\)\d*(?=-|$)


Input:

Shelf-15-Contains(I(t)e(m))10-Depo91
Shelf-15-Contains(I(t)e(m))-Depo91

Matches:

Contains(I(t)e(m))10
Contains(I(t)e(m))
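For anyone who wants to reproduce these matches locally, here is a small Perl sketch that runs the pattern over the two inputs above. The pattern is wrapped in a capture group purely for convenience, and @inputs and @found are illustrative names.

```perl
use strict;
use warnings;

my @inputs = (
    'Shelf-15-Contains(I(t)e(m))10-Depo91',
    'Shelf-15-Contains(I(t)e(m))-Depo91',
);

my @found;
for my $input (@inputs) {
    # The lazy .*? grows until a ")" plus optional digits is followed
    # by "-" or the end of the string.
    if ($input =~ /(\w+\(.*?\)\d*)(?=-|$)/) {
        push @found, $1;
        print "$1\n";
    }
}
```

Because nothing here checks that the parentheses balance, the match simply ends at the first ) that is followed by optional digits and then - or the end of the string.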


 