
Regex: Match patterns except with pattern preceding

I am attempting to write a regular expression to match certain patterns except for those with a preceding pattern. In other words given the following sentence:

Don't want to match paragraph 1.2.3.4 but this instead 5.6.7.8

I would like to match every XXXX that does not have the word paragraph in front of it, i.e. it should only match 5.6.7.8. My current regex, shown below, seems to match both 1.2.3.4 and 5.6.7.8. I have tried rearranging the lookarounds, but that doesn't fit my use case.

(?<!paragraph)(?:[\(\)0-9a-zA-Z]+\.)+[\(\)0-9a-zA-Z]+

I code in JavaScript.

EDIT: Note that XXXX is not fixed at four Xs. It ranges from XX to XXXXX.

Your pattern matches because "paragraph" is not the same as "paragraph[space]". Your pattern doesn't have a space. Your text does.
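
To make that concrete, here is a small sketch (the variable names are mine, not from the question) that runs the question's pattern and prints the ten characters in front of each match. The text in front of 1.2.3.4 ends in a space, so a lookbehind for paragraph alone never rejects it:

```javascript
const text = "Don't want to match paragraph 1.2.3.4 but this instead 5.6.7.8";

// The pattern from the question, with the g flag added so every match is found.
const original = /(?<!paragraph)(?:[\(\)0-9a-zA-Z]+\.)+[\(\)0-9a-zA-Z]+/g;

const originalMatches = [];
for (const m of text.matchAll(original)) {
    // Show what actually sits immediately before the match.
    const before = text.slice(Math.max(0, m.index - 10), m.index);
    console.log(JSON.stringify(before), "->", m[0]);
    originalMatches.push(m[0]);
}

// Both 1.2.3.4 and 5.6.7.8 end up in the list.
console.log(originalMatches);
```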

You may want to add the space (perhaps optionally?) to your lookbehind. Because you want to match a varying number of XXXX (you've said XX through XXXXX), we need to include X. in the lookbehind as well:

const rex = /(?<!paragraph *(?:[\(\)0-9a-zA-Z]+\.)*)(?:[\(\)0-9a-zA-Z]+\.){1,4}[\(\)0-9a-zA-Z]/i;

Live Example:

function test(str) {
    const rex = /(?<!paragraph *(?:[\(\)0-9a-zA-Z]+\.)*)(?:[\(\)0-9a-zA-Z]+\.){1,4}[\(\)0-9a-zA-Z]/i;
    // exec returns null when nothing matches
    const match = rex.exec(str);
    console.log(match ? match[0] : "No match");
}

console.log("Testing four 'digits':");
test("Don't want to match paragraph 1.2.3.4 but this instead 5.6.7.8 blah");
console.log("Testing two 'digits':");
test("Don't want to match paragraph 1.2.3.4 but this instead 5.6 blah");
console.log("Testing two 'digits' again:");
test("Don't want to match paragraph 1.2 but this instead 5.6 blah");
console.log("Testing five 'digits':");
test("Don't want to match paragraph 1.2 but this instead 5.6.7.8.9 blah");

That expression requires:

  • That paragraph, followed by zero or more spaces and then zero or more repetitions of X., is not immediately before the match; and
  • That X. is repeated one to four times ( {1,4} ); and
  • That a single X immediately follows those repetitions

X in my example is letters, digits, and parentheses (the character class from your pattern), and I've made the expression case-insensitive, but you can tweak as needed.
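
To see why the X. part inside the lookbehind matters, here is a rough comparison (the names sample, spaceOnly and full are just for illustration). With only the space added, the engine is blocked at the 1 but is free to start a match at the 2, so it picks up the tail of the paragraph number:

```javascript
const sample = "Don't want to match paragraph 1.2.3.4 but this instead 5.6.7.8";

// Lookbehind covers only "paragraph" plus one space.
const spaceOnly = /(?<!paragraph )(?:[\(\)0-9a-zA-Z]+\.){1,4}[\(\)0-9a-zA-Z]/gi;

// Lookbehind also covers any "X." groups in between
// (the expression above, with the g flag added to collect all matches).
const full = /(?<!paragraph *(?:[\(\)0-9a-zA-Z]+\.)*)(?:[\(\)0-9a-zA-Z]+\.){1,4}[\(\)0-9a-zA-Z]/gi;

const spaceOnlyMatches = sample.match(spaceOnly);
const fullMatches = sample.match(full);

console.log(spaceOnlyMatches); // contains "2.3.4" as well as "5.6.7.8"
console.log(fullMatches);      // contains only "5.6.7.8"
```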


Note that lookbehind was only added to JavaScript recently, in ES2018, so support requires up-to-date JavaScript environments. If you need lookbehind on older environments, you might check out Steven Levithan's excellent XRegExp library.

Also note that variable-length lookbehind like the above is not supported in all languages (but is supported in JavaScript...in engines that are up-to-date).
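
If you can't rely on lookbehind being available, one possible workaround (a different technique from the lookbehind above, and only a sketch) is to match the optional paragraph prefix in a capturing group and throw away any match where that group took part. The function names supportsLookbehind and findUnprefixed are made up for this example, and I've left the parentheses out of the character class to keep it short:

```javascript
// Feature test: the RegExp constructor throws a SyntaxError on engines
// that don't understand lookbehind, without breaking parsing of this file.
function supportsLookbehind() {
    try {
        new RegExp("(?<!a)b");
        return true;
    } catch (err) {
        return false;
    }
}

// Fallback without lookbehind: group 1 captures an optional "paragraph"
// prefix, group 2 captures the dotted sequence itself.
function findUnprefixed(str) {
    const rex = /(paragraph\s*)?\b((?:[0-9a-zA-Z]+\.)+[0-9a-zA-Z]+)/gi;
    const results = [];
    let m;
    // A plain exec loop, so this also runs where matchAll is missing.
    while ((m = rex.exec(str)) !== null) {
        // Keep the sequence only when the prefix group did not participate.
        if (m[1] === undefined) {
            results.push(m[2]);
        }
    }
    return results;
}

console.log(supportsLookbehind());
console.log(findUnprefixed("Don't want to match paragraph 1.2.3.4 but this instead 5.6.7.8"));
```

Because the prefix is consumed as part of the match, the engine never gets a chance to restart in the middle of the paragraph number.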

If you always want to match a group of 4 items, you can do this:

(?<!paragraph )([0-9]+\.?){4}

You can build the regex step by step:

  1. Ignore any match preceded by the word 'paragraph' and whitespace.
  2. Since your pattern is fixed, consisting of a quadruple of numbers separated by periods, it's safe to assume that each number in the quadruple has at least one digit.
  3. Capture the quadruple of numbers in a capturing group to be used later.


const inputData = 'Don\'t want to match paragraph 1.2.3.4 but this instead 5.6.7.8 and 12.2.333.2';
// Reject quadruples preceded by "paragraph" plus whitespace; capture the rest.
const re = /(?<!paragraph\s+)(\d{1,}\.\d{1,}\.\d{1,}\.\d{1,})/ig;
const matchedGroups = inputData.matchAll(re);
for (const matchedGroup of matchedGroups) {
    console.log(matchedGroup);
}


 