I'm trying to match on the following text:
"abc" matches "b" and field[cba] = "cba" or (field[cba] matches "c") and "cc" = "bb"
and capture the parts before and after "matches" into named groups.
I need to match "abc" as ${left} and "b" as ${right}, and then field[cba] as ${left} and "c" as ${right} on the second match.
I need to give bounds to ${left} and ${right} so that they break on the following delimiters, but only when these are not inside double quotes ("):
Left: " and ", " or ", "("
Right: " and ", " or ", ")"
The replacement pattern I would like to use is:
RegExpMatch(${left}, ${right})
So to get the following output:
RegExpMatch("abc","b") and field[cba] = "cba" or (RegExpMatch(field[cba],"c")) and "cc" = "bb"
I tried with:
(?<=^|\\(| or | and )(?<left>.*?) matches (?<right>.*?)(?=\\)|$| and | or )
This has a couple of issues:
^ for start of string seems to make the lookbehind greedy: it captures from the start of the string even if there is an " or " or " and " before the match, which is weird because $ seems to work fine.
" or ", " and ", "(" and ")" should act as bounds only when they are not inside quotes (in a literal).
Can you please help me figure out the correct regular expression to apply?
The problem is that the engine sees " and " in your lookbehind, and then .*? consumes everything up to the next " matches ": field[cba] = "cba" or (field[cba]. We need a stricter definition of left/right; it can't just be "any character".
(?<=^|\(| or | and )(?<left>\S+) matches (?<right>\S+?)(?=\)|$| and | or )
I changed .*? to \S+, which matches anything but whitespace ([^\r\n\t\f ]). Now it won't suck up all the unnecessary characters in the left/right capture groups. \S+ may not be the right definition for you, but it should get you started.
Demo: Regex101
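To try the pattern above against the sample from the question, here is a minimal C# sketch. The class name MatchesRewriter and the TightenedPattern variant are hypothetical and not part of the answer; the replacement string omits the space after the comma so that it matches the expected output quoted in the question.

```csharp
using System.Text.RegularExpressions;

public static class MatchesRewriter
{
    // Pattern from the answer above, unchanged.
    public const string AnswerPattern =
        @"(?<=^|\(| or | and )(?<left>\S+) matches (?<right>\S+?)(?=\)|$| and | or )";

    // Hypothetical tightening (an assumption, not from the answer):
    // left may not contain "(", so a match cannot begin on the paren itself.
    public const string TightenedPattern =
        @"(?<=^|\(| or | and )(?<left>[^\s(]+) matches (?<right>\S+?)(?=\)|$| and | or )";

    // Replaces every "<left> matches <right>" with RegExpMatch(<left>,<right>).
    public static string Rewrite(string input, string pattern) =>
        Regex.Replace(input, pattern, "RegExpMatch(${left},${right})");
}
```

Called as MatchesRewriter.Rewrite(text, MatchesRewriter.AnswerPattern) on the sample line, the second match appears to start at the "(" itself, because " or " directly precedes it and \S+ accepts the paren, so ${left} becomes (field[cba]. With TightenedPattern the result should be the output the question asks for. Like \S+ itself, [^\s(]+ is only a starting point and will not suit parts that legitimately contain parentheses.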
I'm not entirely sure what your data looks like, but I suggest this regex, which is independent of the bounds:
(?:(?<left>"[^"]*")|\b(?<left>\S*)) matches (?:(?<right>"[^"]*")|(?<right>\S*[^)\s]))
I'm exploiting the fact that C# allows captures with the same name here. The left and right parts are almost the same.
(?: => Non-capture group
(?<left> => Left capture begin
"[^"]*" => Double quotes, non-quote characters then double quotes
) => End left capture
| => OR
\b => Word boundary
(?<left> => Begin other left capture if first failed
\S* => Capture non-space characters (if your parts break across multiple lines, you can use [^"]* instead)
) => End left capture
) => End non-capture group
regex101 demo (I renamed the captures because PCRE doesn't allow duplicate group names by default)
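As a rough illustration of the same-name capture trick in .NET, the sketch below applies the pattern with the replacement from the question. The class name SameNameGroupsDemo is hypothetical, and the replacement string again omits the space after the comma to match the expected output in the question.

```csharp
using System.Text.RegularExpressions;

public static class SameNameGroupsDemo
{
    // Pattern from the answer. Both alternatives write to the same named
    // group, so ${left} / ${right} work whichever branch matched.
    private static readonly Regex MatchesExpr = new Regex(
        @"(?:(?<left>""[^""]*"")|\b(?<left>\S*)) matches (?:(?<right>""[^""]*"")|(?<right>\S*[^)\s]))");

    // Replaces every "<left> matches <right>" with RegExpMatch(<left>,<right>).
    public static string Rewrite(string input) =>
        MatchesExpr.Replace(input, "RegExpMatch(${left},${right})");
}
```

On the sample line from the question, SameNameGroupsDemo.Rewrite should give the requested output, since the quoted branch handles "abc" and the \b branch starts field[cba] after the opening paren.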
If the word boundary is causing problems (e.g. when you have a part that doesn't start with " or a \w character), you might use the following regex instead:
(?:(?<left>"[^"]*")|\s\(?(?<left>\S*)) matches (?:(?<right>"[^"]*")|(?<right>\S*[^)\s]))
This uses \s\(? instead of \b.
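One thing to watch when this variant is used for replacing rather than just matching: \s\(? is part of the match, so the leading space and paren are consumed along with the left part. A possible workaround, sketched below, is to capture that prefix and emit it again. The group name pre and the class name PrefixPreservingDemo are hypothetical, not from the answer.

```csharp
using System.Text.RegularExpressions;

public static class PrefixPreservingDemo
{
    // Variant from the answer, with the consumed prefix wrapped in a
    // hypothetical group "pre" so the replacement can put it back.
    private static readonly Regex MatchesExpr = new Regex(
        @"(?:(?<left>""[^""]*"")|(?<pre>\s\(?)(?<left>\S*)) matches (?:(?<right>""[^""]*"")|(?<right>\S*[^)\s]))");

    // ${pre} expands to an empty string when the quoted branch matched,
    // because an unmatched group contributes nothing in a .NET replacement.
    public static string Rewrite(string input) =>
        MatchesExpr.Replace(input, "${pre}RegExpMatch(${left},${right})");
}
```

Without ${pre} in the replacement, the sample line would lose the " (" in front of the second field[cba]. Note also that the unquoted branch needs a whitespace character in front of it, so it will not match an unquoted left part at the very start of the string.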
If you want to stick to the bounds you mentioned, you will have to know what exactly can be in the parts or what cannot. For instance, if
field["abc"] in field matches field["cba"] in field
is valid and the parts are field["abc"] in field and field["cba"] in field respectively, then that is another complication.