
Regex to match all characters except a word

How do I write a regex that matches all characters except a particular word?

I need to find every character except a given word.

`(.*)` finds all characters (except line breaks).

`[^v]` finds every character except the letter `v`.

But how do I find all characters except a whole word?

Solution (added later):

((?:(?!here any word for block)[\s\S])*?)

or

((?:(?!here any word for block).)*?)

For example, blocking the word video:

((?:(?!video)[\s\S])*?)
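A small sketch of what the tempered token does, using the blocked word `video` from the last pattern (the sample strings are made up for illustration). With nothing after the group, a lazy `*?` would happily stop at zero characters, so this sketch anchors the pattern with `^` and uses the greedy `*` instead; the group then runs up to, but not including, the first `video`.

```javascript
// Tempered token: take one char at a time ([\s\S] also matches newlines),
// but only if the blocked word "video" does not start at that position.
const upToVideo = /^((?:(?!video)[\s\S])*)/;

// Hypothetical sample text spanning two lines.
const text = 'intro text\nmore text video tail';
const m = text.match(upToVideo);
console.log(JSON.stringify(m[1])); // "intro text\nmore text "

// Anchored at both ends, the same token checks that the word never occurs.
const noVideo = /^(?:(?!video)[\s\S])*$/;
console.log(noVideo.test('clip of a cat'));    // true
console.log(noVideo.test('a video of a cat')); // false
```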


I want to find everything except `|end|` and replace everything except `|end|`.

What I tried:

I need everything except `|end|`:

var str = '|video| |end| |water| |sun| |cloud|';
// Or it may be:
//var str = '|end| |video| |water| |sun| |cloud|';
//var str = '|cloud| |video| |water| |sun| |end|';
str.replace(/\|((?!end|end$).*?)\|/gm, test_fun2);
function test_fun2(match, p1, offset, str_full) {
  console.log("--------------");
  p1 = "[" + p1 + "]";
  console.log(p1);
  console.log("--------------");
  return p1;
}

Console output:

--------------
[video]
--------------
--------------
[ ]
--------------
--------------
[ ]
--------------
--------------
[ ]
--------------

Example of what I need:

Any characters except `[video](`

input - '[video](text-1 *******any symbols except: "[video](" ******* [video](text-2 any symbols) [video](text-3 any symbols) [video](text-4 any symbols) [video](text-5 any symbols)'

output - <div>text-1 *******any symbols except: "[video](" *******</div> <div>text-2 any symbols</div><div>text-3 any symbols</div><div>text-4 any symbols</div><div>text-5 any symbols</div>

Scenario 1

Use the best trick ever:

One key to this technique, a key to which I'll return several times, is that we completely disregard the overall matches returned by the regex engine: that's the trash bin. Instead, we inspect the Group 1 matches, which, when set, contain what we are looking for.

Solution:

s = s.replace(/\|end\||\|([^|]*)\|/g, function ($0, $1) { 
    return $1 ? "[" + $1 + "]" : $0; 
});

Details

  • `\|end\|` - `|end|` is matched
  • `|` - or
  • `\|([^|]*)\|` - `|` is matched, any 0+ chars other than `|` are captured into Group 1, and then `|` is matched.

If Group 1 matched (`$1 ?`), the replacement occurs; otherwise `$0`, the whole match, is put back into the result unchanged.

JS test:

console.log(
  "|video| |end| |water| |sun| |cloud|".replace(/\|end\||\|([^|]*)\|/g, function ($0, $1) {
    return $1 ? "[" + $1 + "]" : $0;
  })
)
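The same trick applied to all three sample strings from the question (with `|end|` at the start, in the middle and at the end) suggests that the position of the excluded word does not matter, since the trash-bin branch consumes `|end|` wherever it appears. The variable names here are only illustrative.

```javascript
// The "trash bin" alternation: |end| is matched but not captured,
// every other |word| is captured into Group 1.
const trickRe = /\|end\||\|([^|]*)\|/g;

// The three sample strings from the question.
const samples = [
  '|video| |end| |water| |sun| |cloud|',
  '|end| |video| |water| |sun| |cloud|',
  '|cloud| |video| |water| |sun| |end|',
];

const results = samples.map(s =>
  // $1 is undefined when the |end| branch matched, so the match is kept as-is.
  // (An empty pair || captures "", which is also falsy and left alone.)
  s.replace(trickRe, ($0, $1) => ($1 ? '[' + $1 + ']' : $0))
);
results.forEach(r => console.log(r));
// [video] |end| [water] [sun] [cloud]
// |end| [video] [water] [sun] [cloud]
// [cloud] [video] [water] [sun] |end|
```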

Scenario 2

Use

.replace(/\[(?!end])[^\]]*]\(((?:(?!\[video]\()[\s\S])*?)\)/g, '<div>$1</div>')

See the regex demo.

Details

  • `\[` - a `[` char
  • `(?!end])` - no `end]` allowed right after the current position
  • `[^\]]*` - 0+ chars other than `]`
  • `]` - a `]` char
  • `\(` - a `(` char
  • `((?:(?!\[video]\()[\s\S])*?)` - Group 1 that captures any char (`[\s\S]`), 0 or more occurrences but as few as possible (`*?`), that does not start a `[video](` char sequence
  • `\)` - a `)` char.
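Running the Scenario 2 regex on two made-up inputs shows the pieces at work: the `(?!end])` lookahead leaves `[end](keep)` alone, and the tempered token stops a match from running into the next `[video](` when the first one is never closed. The inputs are hypothetical, chosen only to exercise each part of the pattern.

```javascript
// Scenario 2 regex, unchanged: [label](content) where the label is not "end"
// and the content may not run into the start of another "[video](".
const videoRe = /\[(?!end])[^\]]*]\(((?:(?!\[video]\()[\s\S])*?)\)/g;

const a = '[video](first) [end](keep) [video](second)';
const b = '[video](no close [video](b)';

const outA = a.replace(videoRe, '<div>$1</div>');
const outB = b.replace(videoRe, '<div>$1</div>');

console.log(outA); // <div>first</div> [end](keep) <div>second</div>
console.log(outB); // [video](no close <div>b</div>
```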

Something like this is better done in multiple steps. Also, if you are only matching things (not replacing them), you should use `match`.

var str = '|video| |end| |water| |sun| |cloud|';
var matches = str.match(/\|.*?\|/g);

// strip pipe characters...
matches = matches.map(m=>m.slice(1,-1));

// filter out unwanted words
matches = matches.filter(m=>!['end'].includes(m));
           // this allows you to add more filter words easily
           // if you'll only ever need "end", just do (m=>m!='end')

console.log(matches); // ["video","water","sun","cloud"]

Notice how much easier it is to understand what's going on, and how much easier this is to maintain and change in the future as needed.
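The three steps can be wrapped in a small function; the name `extractWords` and its `excluded` parameter are hypothetical. One detail the snippet above does not handle: `match` with the `g` flag returns `null` (not an empty array) when nothing matches, so calling `map` on it would throw. This sketch falls back to an empty array with `??` (ES2020).

```javascript
// Hypothetical helper wrapping the match / strip / filter steps.
function extractWords(str, excluded) {
  // str.match(/.../g) returns null when there are no matches.
  const matches = str.match(/\|.*?\|/g) ?? [];
  return matches
    .map(m => m.slice(1, -1))            // strip the surrounding pipes
    .filter(m => !excluded.includes(m)); // drop the unwanted words
}

const words = extractWords('|video| |end| |water| |sun| |cloud|', ['end']);
const none = extractWords('no pipes here', ['end']);
console.log(JSON.stringify(words)); // ["video","water","sun","cloud"]
console.log(JSON.stringify(none));  // []
```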

You are on the right track. Here is what you need to do with the regex:

var str = '|video| |end| |water| |sun| |cloud|';
console.log(str.replace(/(?!\|end\|)\|(\S*?)\|/gm, test_fun2));
function test_fun2(match, p1, offset, str_full) {
  return "[" + p1 + "]";
}

And here is an explanation of what was wrong: you had your negative lookahead placed after the `|` character. That means the matching engine would do the following:

  1. Match `|video|`, because the pattern works with it.
  2. Grab the next `|`.
  3. Find that the next text is `end`, which the negative lookahead forbids, so this match attempt fails.
  4. Grab the `|` immediately after `end`, starting a new attempt there.
  5. Grab the space and the next `|` character, since this passes the negative lookahead and also works with `.*?`.
  6. Continue grabbing the intermediate `| |` sequences, because the `|` at the beginning of each word was consumed by the previous match.

So you end up matching the following things:

var str = '|video| |end| |water| |sun| |cloud|';
           ^^^^^^^     ^^^     ^^^   ^^^
|video| ______|         |       |     |
| | ____________________|       |     |
| | ____________________________|     |
| | __________________________________|

All because the match attempt at `|end` was dropped.

You can see this if you print out the matches:

var str = '|video| |end| |water| |sun| |cloud|';
str.replace(/\|((?!end|end$).*?)\|/gm, test_fun2);
function test_fun2(match, p1, offset, str_full) {
  console.log(match, p1, offset);
}

You will see that the second, third, and fourth matches are `| |`, the captured item p1 is a single space (not displayed very well, but it is there), and the offsets where they were found are 12, 20, and 26:

|video| |end| |water| |sun| |cloud|
01234567890123456789012345678901234
            ^       ^     ^
12 _________|       |     |
20 _________________|     |
26 _______________________|

The change I made was to look explicitly for the whole `|end|` pattern in the negative lookahead, placed before the opening `|`, and to match only non-whitespace characters, so you don't grab `| |` again.
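One way to check the explanation above is to list every match with its offset using `String.prototype.matchAll` (ES2020; it requires the `g` flag), running the original attempt and the fixed pattern side by side. `JSON.stringify` is used only to get compact, predictable output.

```javascript
const sample = '|video| |end| |water| |sun| |cloud|';

// The original attempt and the fixed pattern from this answer.
const original = /\|((?!end|end$).*?)\|/g;
const fixed = /(?!\|end\|)\|(\S*?)\|/g;

// Collect [offset, whole match] pairs for every match of a regex.
const offsets = re => [...sample.matchAll(re)].map(m => [m.index, m[0]]);

console.log(JSON.stringify(offsets(original)));
// [[0,"|video|"],[12,"| |"],[20,"| |"],[26,"| |"]]
console.log(JSON.stringify(offsets(fixed)));
// [[0,"|video|"],[14,"|water|"],[22,"|sun|"],[28,"|cloud|"]]
```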

It's also worth noting that you can move your filtering logic into the replacement callback instead of the regex. This simplifies the regex but makes the replacement more complex. Still, it's a fair tradeoff, as code is usually easier to maintain than a regex once the conditions get more complex:

var str = '|video| |end| |water| |sun| |cloud|';
// Capturing word characters - an alternative to "non-whitespace"
console.log(str.replace(/\|(\w*)\|/gm, test_fun2));
function test_fun2(match, p1, offset, str_full) {
  if (p1 === 'end') {
    return match;
  } else {
    return "[" + p1 + "]";
  }
}
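A sketch of the same callback idea with more than one word to leave alone. The word list (`end` and `sun`) is only an illustration, and a `Set` is just one convenient way to hold it; adding a word means editing the list, not the regex.

```javascript
// Hypothetical list of words to leave untouched.
const keep = new Set(['end', 'sun']);

const input = '|video| |end| |water| |sun| |cloud|';
const out = input.replace(/\|(\w*)\|/g, (match, p1) =>
  // Return the match unchanged for kept words, bracket everything else.
  keep.has(p1) ? match : '[' + p1 + ']'
);
console.log(out); // [video] |end| [water] |sun| [cloud]
```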

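Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM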