Divergence with lookahead RegExp

I'm doing a test where all links that don't have 'investing' and 'news' are rejected, but I don't understand the logic of ?= and ?!, and I believe I'm doing something wrong, since the opposite of the logic below is not being matched. Could someone shed some light on this?

Note: I could just use !, but I would like to understand what is wrong with this expression.

    const positiveTest = x => (/(?=.*investing)(?=.*news)/).test(x);
    const negativeTest = x => (/(?!.*investing)(?!.*news)/).test(x);

    // if 'investing' and 'news' are found
    console.log('positiveTest:');
    console.log(positiveTest('https://br.investing.com/news/'));
    console.log(positiveTest('https://br.investing.com/nws/'));
    console.log(positiveTest('https://br.inveting.com/news/'));

    // if 'investing' and 'news' are not found
    console.log('negativeTest:');
    console.log(negativeTest('https://br.investing.com/news/'));
    console.log(negativeTest('https://br.investing.com/nws/'));
    console.log(negativeTest('https://br.inveting.com/news/'));

Testing whether a string matches a regular expression will test whether any individual position in the string matches the regular expression. For example

/x/

matches

'fooxbar'

starting at index 3 of the string.
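
As a small illustration of that point, exec reports the index at which the match was found, while test only reports whether some position matched. The variable name firstMatch is just for this sketch:

```javascript
// exec returns the match together with the index where it starts.
const firstMatch = /x/.exec('fooxbar');
console.log(firstMatch.index);     // 3

// test only says whether any position in the string matched.
console.log(/x/.test('fooxbar'));  // true
```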

Your

/(?!.*investing)(?!.*news)/

will match every string: namely, at the first position after which neither investing nor news occurs. If neither substring is in the target string, this will be at the start of the string. Otherwise, it will be the position just past where the last occurrence of either substring starts. For example, against:

https://br.investing.com/news/

it will match at this position:

https://br.investing.com/news/
                          ^

because starting at the e of news, neither investing nor news occurs in the rest of the string.
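
One way to check this position yourself is to ask exec for the index of the (empty) match. This is only a sketch; the names unanchored, url and hit are made up for the example:

```javascript
const unanchored = /(?!.*investing)(?!.*news)/;
const url = 'https://br.investing.com/news/';

// The lookaheads consume nothing, so the match is an empty string;
// what matters is the index at which it succeeded.
const hit = unanchored.exec(url);
console.log(hit.index);             // 26
console.log(url.slice(hit.index));  // 'ews/'

// With neither word present, the match is at the very start.
console.log(unanchored.exec('foobar').index);  // 0
```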

If you want to fix it, you can require the match to start at the beginning of the string, so that the lookaheads span the whole length of the string.

    // Note the ^ anchor at the start of the pattern
    const negativeTest = x => (/^(?!.*investing)(?!.*news)/).test(x);

    // if 'investing' and 'news' are not found
    console.log('negativeTest:');
    console.log(negativeTest('https://br.investing.com/news/'));
    console.log(negativeTest('https://br.investing.com/nws/'));
    console.log(negativeTest('https://br.inveting.com/news/'));
    console.log(negativeTest('foobar'));

If you want the pattern to reject only strings that contain both words, while one or the other alone is OK, then you'll need to alternate: match (negatively) the first phrase followed by the second, or the second followed by the first.

    const negativeTest = x => (/^(?!.*investing.*news|.*news.*investing)/).test(x);

    console.log('negativeTest:');
    console.log(negativeTest('https://br.investing.com/news/'));
    console.log(negativeTest('https://br.investing.com/nws/'));
    console.log(negativeTest('https://br.inveting.com/news/'));
    console.log(negativeTest('foobar'));
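
To make the difference between the two anchored patterns concrete, here is a rough side-by-side run over the same inputs. The names neither, notBoth and samples are invented for this comparison:

```javascript
// True only when neither word appears anywhere in the string.
const neither = x => /^(?!.*investing)(?!.*news)/.test(x);
// True unless both words appear (in either order).
const notBoth = x => /^(?!.*investing.*news|.*news.*investing)/.test(x);

const samples = [
  'https://br.investing.com/news/',
  'https://br.investing.com/nws/',
  'https://br.inveting.com/news/',
  'foobar',
];

console.log(samples.map(neither));  // [false, false, false, true]
console.log(samples.map(notBoth));  // [false, true, true, true]
```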

A while ago I understood why this expression is wrong.

A lookahead or lookbehind needs a reference position to search from. If the pattern doesn't give it one, the engine tests the lookahead at each index of the string, much like .(?=) / .(?!) would. That is why the ^ anchor at the beginning, together with .* inside the lookahead, is necessary: it prevents the lookahead from being tested at every index of the string.
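
A quick way to see this is to list every index at which a zero-width lookahead succeeds, using the g flag with matchAll. This is a sketch on a shortened input; the helper matchIndexes and the single-word pattern are simplifications for illustration, not the pattern from the question:

```javascript
// Collect every index at which a (zero-width) pattern matches.
const matchIndexes = (re, s) => [...s.matchAll(re)].map(m => m.index);

// Unanchored: succeeds at every index past the start of 'news'.
console.log(matchIndexes(/(?!.*news)/g, 'abnews'));   // [3, 4, 5, 6]

// Anchored: only index 0 is tried, and it fails there.
console.log(matchIndexes(/^(?!.*news)/g, 'abnews'));  // []

// Anchored, word absent: succeeds once, at index 0.
console.log(matchIndexes(/^(?!.*news)/g, 'abnws'));   // [0]
```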

A more compact way of writing a generalized positive lookahead is (?=investing|news), which succeeds when either word is present. A generalized negative lookahead is less practical because it requires more expressions (e.g. ^(?!.*(?:investing|news))). It is simpler and more efficient to invert a positive test with the NOT operator, !.
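
For comparison, a minimal sketch of the inversion approach. The helper names hasEither and hasBoth and the urls list are made up here; which of the two negations you want depends on whether "neither word" or "not both words" is the intended rule:

```javascript
const urls = [
  'https://br.investing.com/news/',
  'https://br.investing.com/nws/',
  'https://br.inveting.com/news/',
  'foobar',
];

// Positive tests, written without any negative lookahead.
const hasEither = x => /investing|news/.test(x);
const hasBoth = x => /^(?=.*investing)(?=.*news)/.test(x);

// Negating with ! gives the two "negative" variants.
console.log(urls.map(x => !hasEither(x)));  // [false, false, false, true]
console.log(urls.map(x => !hasBoth(x)));    // [false, true, true, true]
```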
