Find keyword matches, but ignore those within a word-proximity window (JS regex)

I'm trying to find matches for a word in a long string, but I want to set up a proximity window around the first match so that any matches inside that window are ignored.

For example, given the following string, where I'm looking for `test`:

Lorem ipsum Test sit amet, consectetur adipiscing elit. 
Vestibulum at erat ac enim malesuada pulvinar et nec ante. 
Cras erat ipsum, pellentesque vel volutpat ut, Test eu test. 
Test Quisque tincidunt varius mi.

With a proximity of 15 words, my end result would highlight these:

Lorem ipsum **Test** sit amet, consectetur adipiscing elit. 
Vestibulum at erat ac enim malesuada pulvinar et nec ante. 
Cras erat ipsum, pellentesque vel volutpat ut, **Test** eu test. 
Test Quisque tincidunt varius mi.

So it only matches the first `Test`, and then the next `Test` that is more than 15 words away from it.
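To pin the rule down, here is a minimal non-regex sketch of it. The function name `findSpacedMatches` is hypothetical, and counting the words strictly *between* two matches (so "more than 15 words away" means `wordsBetween > 15`) is an assumption, since the question leaves >15 vs. >=15 open. It walks the words in order, keeps the first match, and then keeps a later match only if it is far enough from the last kept one.

```javascript
// Hypothetical helper (not from the question): returns the character offsets of
// the keyword matches to highlight. The first match is kept; a later match is
// kept only if more than `maxGap` words lie between it and the last kept match.
function findSpacedMatches(text, keyword, maxGap) {
  const target = keyword.toLowerCase(); // compare case-insensitively, like the i flag
  const kept = [];
  let lastKeptPos = -Infinity; // word position of the last kept match

  // pos counts words; m.index is the character offset of the current word.
  let pos = 0;
  for (const m of text.matchAll(/\w+/g)) {
    if (m[0].toLowerCase() === target) {
      const wordsBetween = pos - lastKeptPos - 1;
      if (wordsBetween > maxGap) {
        kept.push(m.index);
        lastKeptPos = pos;
      }
    }
    pos++;
  }
  return kept;
}

// The question's four-line sample, joined with newlines.
const sample = [
  "Lorem ipsum Test sit amet, consectetur adipiscing elit.",
  "Vestibulum at erat ac enim malesuada pulvinar et nec ante.",
  "Cras erat ipsum, pellentesque vel volutpat ut, Test eu test.",
  "Test Quisque tincidunt varius mi.",
].join("\n");

console.log(findSpacedMatches(sample, "test", 15));
```

On the sample this keeps the first `Test` and the `Test` after `ut,`, which matches the expected highlights. Working on a word list rather than raw characters is what makes "N words away" easy to express; the trade-off is that it is a loop, not a single regex.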


So far I have tried something similar to this:

\btest\W+(?:\w+\W+){15,}?test\b

But this highlights all the words in between, when I really only want to highlight `test`. It also requires me to write the keyword twice; if possible, I'd like to specify `test` only once.
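The attempt highlights everything because the words between the two keywords are part of the match itself. A small check (the `sample` string is an assumption: the question's example joined with newlines) contrasts the consuming pattern with the same tail moved into a lookahead, which tests the following text without including it in the match. The lookahead variant only illustrates non-consumption; it is not the requested proximity rule.

```javascript
// The question's four-line sample, joined with newlines.
const sample = [
  "Lorem ipsum Test sit amet, consectetur adipiscing elit.",
  "Vestibulum at erat ac enim malesuada pulvinar et nec ante.",
  "Cras erat ipsum, pellentesque vel volutpat ut, Test eu test.",
  "Test Quisque tincidunt varius mi.",
].join("\n");

// The attempted pattern consumes every word between the two keywords,
// so the reported match (spanMatch[0]) runs from one "Test" to the other.
const consuming = /\btest\W+(?:\w+\W+){15,}?test\b/gi;
const spanMatch = consuming.exec(sample);
console.log(spanMatch.index, JSON.stringify(spanMatch[0]));

// The same tail inside a lookahead (?=...) is checked but not consumed,
// so the match is just the keyword. Illustration only, not the proximity rule.
const nonConsuming = /\btest(?=\W+(?:\w+\W+){15,}?test\b)/gi;
const keywordMatch = nonConsuming.exec(sample);
console.log(keywordMatch.index, JSON.stringify(keywordMatch[0]));
```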

Any ideas on how I could accomplish this sort of proximity behavior?


Clarification update:

I have an example on regex101 (https://regex101.com/r/FDOWZU/1), where you can see that it selects all the words between instances of `test`. [Screenshot: current output]

However, what I want is something more like this: [Screenshot: expected output]

I'm not sure whether you mean >=15 or >15, since your code and your written logic contradict each other. Either way, you can replace 14 with the number of words you need. The upper bound of 14 here ensures that `test` isn't one of the next 15 words, so the regex matches `test` only if none of the next 15 words is `test`.


You can use the following regex:


\btest(?!\W+(?:\w+\W+){0,14}test)

```javascript
const s = `Lorem ipsum Test sit amet, consectetur adipiscing elit. Vestibulum at erat ac enim malesuada pulvinar et nec ante. Cras erat ipsum, pellentesque vel volutpat ut, Test eu test. Test Quisque tincidunt varius mi. Suspendisse vitae lobortis diam. Vestibulum posuere massa id lectus faucibus posuere. Donec non sollicitudin est. Donec libero turpis, malesuada in Test`;
// g: report every match; i: case-insensitive, so "Test" matches too
const r = /\btest(?!\W+(?:\w+\W+){0,14}test)/gi;
let m;
while ((m = r.exec(s)) !== null) {
  console.log(m);
}
```
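Running the same pattern on the question's four-line sample shows which occurrence of each cluster survives. Because the lookahead rejects a `test` whenever another `test` follows within 15 words, the survivor of a cluster is its *last* occurrence rather than its first: the `Test` after `ut,` that the question highlights is skipped, and the `Test` that starts the last line is kept instead. A quick check (the `sample` variable is an assumption: the question's text joined with newlines):

```javascript
// The question's four-line sample, joined with newlines.
const sample = [
  "Lorem ipsum Test sit amet, consectetur adipiscing elit.",
  "Vestibulum at erat ac enim malesuada pulvinar et nec ante.",
  "Cras erat ipsum, pellentesque vel volutpat ut, Test eu test.",
  "Test Quisque tincidunt varius mi.",
].join("\n");

// The answer's pattern: g finds every match, i makes "Test" match too.
const proximity = /\btest(?!\W+(?:\w+\W+){0,14}test)/gi;

// Character offset of each match that survives the negative lookahead.
const kept = [...sample.matchAll(proximity)].map((m) => m.index);
console.log(kept);
```

If keeping the first occurrence of each cluster matters, a lookahead alone cannot express "distance from the last kept match", since each match is judged independently.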

How it works:

  • `\b` word boundary
  • `test` matches this literally (case-insensitively with the `i` flag)
  • `(?!\W+(?:\w+\W+){0,14}test)` negative lookahead ensuring the following does not match:
    • `\W+` matches any non-word character one or more times
    • `(?:\w+\W+){0,14}` matches between zero and fourteen words
    • `test` matches this literally (case-insensitively again)
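Since the distance lives in the `{0,14}` quantifier, and the question also asks to write the keyword only once, one option is to build the pattern from a single keyword string with `new RegExp`. A minimal sketch, assuming two hypothetical helpers, `escapeRegExp` and `proximityRegex` (neither is built in), and the "not within the next N words" reading described above:

```javascript
// Hypothetical helper: escapes regex metacharacters so the keyword is literal.
function escapeRegExp(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds the answer's pattern from one keyword and a word
// distance. A match is rejected if the keyword reappears within the next
// `words` words, so the lookahead allows 0..(words - 1) words in between.
function proximityRegex(keyword, words) {
  if (!Number.isInteger(words) || words < 1) {
    throw new RangeError("words must be a positive integer");
  }
  const kw = escapeRegExp(keyword);
  return new RegExp(`\\b${kw}(?!\\W+(?:\\w+\\W+){0,${words - 1}}${kw})`, "gi");
}

const re = proximityRegex("test", 15);
console.log(re.source); // \btest(?!\W+(?:\w+\W+){0,14}test)
```

The doubled backslashes are needed only because the pattern is written inside a string; in a regex literal they are single.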

This is a working regex: `(?<!test(?:\w|\s)*\W{0,14})test`

Here is how the part in parentheses works:

  • `?<!` is the negative lookbehind notation
  • `test` looks for the word `test`
  • `(?:\w|\s)*` followed by any number of word or whitespace characters
  • `\W{0,14}` then followed by 0 to 14 non-word characters (a character count, not a word count)

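So, taken together: match every `test` that is not preceded by an earlier `test` from which it is separated only by word characters and whitespace, followed by at most 14 non-word characters. Because `\W{0,14}` limits punctuation and spacing rather than counting words, the 15-word distance itself is not enforced; the pattern gives the expected highlights on the sample because punctuation separates the clusters there.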
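A short check makes the difference visible. It runs the lookbehind pattern (with `g` and `i` flags added, an assumption since the answer states no flags) on the question's sample, where it does pick the first `Test` and the `Test` after `ut,`, and then on a hypothetical string with 20 unpunctuated words between two keywords, where it still rejects the second `test`. This relies on JavaScript's support for variable-length lookbehind (ES2018); engines such as Python's `re` reject it.

```javascript
// The question's four-line sample, joined with newlines.
const sample = [
  "Lorem ipsum Test sit amet, consectetur adipiscing elit.",
  "Vestibulum at erat ac enim malesuada pulvinar et nec ante.",
  "Cras erat ipsum, pellentesque vel volutpat ut, Test eu test.",
  "Test Quisque tincidunt varius mi.",
].join("\n");

// The second answer's pattern; g and i are added here (assumption).
const lookbehind = /(?<!test(?:\w|\s)*\W{0,14})test/gi;
// Start offset of every match of `re` in `text`.
const offsets = (text, re) => [...text.matchAll(re)].map((m) => m.index);

console.log(offsets(sample, lookbehind));

// Counter-example: 20 plain words between two keywords, no punctuation.
const noPunctuation = "test " + "word ".repeat(20) + "test";
console.log(offsets(noPunctuation, lookbehind)); // only the first test survives

// For comparison, the word-counting lookahead from the first answer keeps both.
const lookahead = /\btest(?!\W+(?:\w+\W+){0,14}test)/gi;
console.log(offsets(noPunctuation, lookahead));
```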

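Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM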