
Regex. Excluding matched pattern

I have the following strings:

name1;name2;name4; vs name3;name4;  
name3;name4; vs name1;name2;name4;
name3;name9; vs name1;name2;name8 vs name5;name4;name7;
name3;name1; vs name1;name2;name4;
name3;name1;name2 vs name1;name2;
name3;name5; vs name1;name2;name4;

How can I fix my regular expression so that it returns the first 3 strings, using this search condition:
name2 and name4 must be on different sides of vs?

((.*name2.*;)|(.*name4.*;)).* vs .*?((.*name2.*;)|(.*name4.*;))

The result should be:

name1;name2;name4; vs name3;name4;  
name3;name4; vs name1;name2;name4;
name3;name9; vs name1;name2;name8 vs name5;name4;name7;

You might be looking for something like:

^.*\b(name2|name4)\b.* vs .*\b(?!\1)(name2|name4)\b.*$

See the online demo


  • ^.* - Start-of-string anchor, then match anything but a newline zero or more times.
  • \b(name2|name4)\b - Match a word boundary, a 1st capture group that matches one of the two options, and another word boundary.
  • .* vs .* - Match anything up to and after the literal ' vs '.
  • \b(?!\1) - Word boundary and a negative lookahead that rejects the name already captured by the 1st group.
  • (name2|name4)\b - 2nd capture group that matches the other of the two options, followed by another word boundary.
  • .*$ - Match anything up to the end-of-string anchor.
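To check the pattern against the six strings from the question, a minimal Python sketch could look like this. The names PATTERN, LINES, different_sides and matched are made up for this illustration; the regex itself is the one given above, unchanged.

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(r"^.*\b(name2|name4)\b.* vs .*\b(?!\1)(name2|name4)\b.*$")

# The six sample strings from the question.
LINES = [
    "name1;name2;name4; vs name3;name4;",
    "name3;name4; vs name1;name2;name4;",
    "name3;name9; vs name1;name2;name8 vs name5;name4;name7;",
    "name3;name1; vs name1;name2;name4;",
    "name3;name1;name2 vs name1;name2;",
    "name3;name5; vs name1;name2;name4;",
]


def different_sides(line: str) -> bool:
    """Return True if name2 and name4 appear on opposite sides of some ' vs '."""
    return PATTERN.search(line) is not None


# Keep only the lines the pattern accepts.
matched = [line for line in LINES if different_sides(line)]

for line in matched:
    print(line)
```

With these inputs, only the first three strings are kept, which is the result the question asks for.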

First, let me make sure I understand your question. If we treat vs as the delimiter, we have two or more groups/teams/sets of names. Do you want to match the lines in which name2 and name4 do not share a set? Your input example has some inconsistencies. Is there a solid specification for the format of the strings?

For example:

name1;name2;name4; vs name3;name4;  *No match, name2 and name4 share set 1*
name3;name4; vs name1;name2;name4;  *No match, name2 and name4 share set 2*
name3;name9; vs name1;name2;name8 vs name5;name4;name7;  *Match, name2 and name4 do not share a set*
name3;name1; vs name1;name2;name4;  *No match, name2 and name4 share set 2*
name3;name1;name2 vs name1;name2;  *Match? name2 and name4 do not share a set*

If my assumptions are correct, and the inconsistencies in your input are intentional, you could use something like:

^(?:(?:(?<!\S)(?:(?:name[0-35-9];?)+)(?: vs |(?!\S)))|(?:(?<!\S)(?:(?:name[0-13-9];?)+)(?: vs |(?!\S))))+$

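The pattern matches one or more repeated sets, where each set excludes either name2 or name4 (so it may contain one of them but never both). Each name is optionally followed by a semicolon, and each set must be bounded by whitespace or by the start or end of the line.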

https://regex101.com/r/fiVP7Q/1 will explain the grouping much better than I may be able to.
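As a rough check of this interpretation, the pattern can be run against the six strings from the question in Python. This is only a sketch: SET_PATTERN, SAMPLES, no_shared_set and kept are names invented for the example, and trailing whitespace is assumed to have been stripped from the input lines.

```python
import re

# Pattern from this answer, unchanged: every set may contain name2 or name4,
# but not both.
SET_PATTERN = re.compile(
    r"^(?:(?:(?<!\S)(?:(?:name[0-35-9];?)+)(?: vs |(?!\S)))"
    r"|(?:(?<!\S)(?:(?:name[0-13-9];?)+)(?: vs |(?!\S))))+$"
)

# The six sample strings from the question, without trailing whitespace.
SAMPLES = [
    "name1;name2;name4; vs name3;name4;",
    "name3;name4; vs name1;name2;name4;",
    "name3;name9; vs name1;name2;name8 vs name5;name4;name7;",
    "name3;name1; vs name1;name2;name4;",
    "name3;name1;name2 vs name1;name2;",
    "name3;name5; vs name1;name2;name4;",
]


def no_shared_set(line: str) -> bool:
    """Return True if no set in the line contains both name2 and name4."""
    return SET_PATTERN.search(line) is not None


# Keep only the lines the pattern accepts.
kept = [line for line in SAMPLES if no_shared_set(line)]

for line in kept:
    print(line)
```

Under this reading, only the third and fifth strings are kept, which differs from the result requested in the question. Note also that the pattern does not require name2 or name4 to be present at all, so a line such as name1; vs name3; is accepted as well.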
