
JavaScript Regular Expression String Match/Replace

Given the string "{abc}Lorem ipsum{/abc} {a}dolor{/a}":

I want to find each occurrence of a curly-brace "tag", store the tag and the index where it was found, and remove it from the original string. I want to repeat this process for each occurrence, but because part of the string is removed each time, every index must be correct relative to the shortened string. I can't find all the indices first and THEN remove the tags at the end. For the example above, what should happen is:

  • Search the string...
  • Find "{abc}" at index 0
  • Push { tag: "{abc}", index: 0 } into an array
  • Delete "{abc}" from the string
  • Repeat from the first step until no more matches can be found

Given this logic, "{/abc}" should be found at index 11, since "{abc}" has already been removed.
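To make the index arithmetic concrete, here is a small sketch (not part of the original post) that compares the position of "{/abc}" before and after the first tag is removed. The variable names are only illustrative.

```javascript
const original = '{abc}Lorem ipsum{/abc} {a}dolor{/a}';
const firstTag = '{abc}';

// Position of the closing tag in the untouched string.
const indexInOriginal = original.indexOf('{/abc}');

// Position of the same tag once the first tag has been deleted.
const shortened = original.replace(firstTag, '');
const indexAfterRemoval = shortened.indexOf('{/abc}');

// The difference is exactly the length of the tag that was removed.
console.log(indexInOriginal, indexAfterRemoval, firstTag.length); // 16 11 5
```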

I basically need to know where those "tags" start and end without actually having them as part of the string.

I'm almost there using regular expressions, but it sometimes skips occurrences.

    let BETWEEN_CURLYS = /{.*?}/g;
    let text = '{abc}Lorem ipsum{/abc} {a}dolor{/a}';
    let match = BETWEEN_CURLYS.exec(text);
    let tags = [];

    while (match !== null) {
        tags.push(match);
        text = text.replace(match[0], '');
        match = BETWEEN_CURLYS.exec(text);
    }

    console.log(text); // should be; Lorem ipsum dolor
    console.log(tags);
    /**
     * almost there...but misses '{a}'
     * [ '{abc}', index: 0, input: '{abc}Lorem ipsum{/abc} {a}dolor{/a}' ]
     * [ '{/abc}', index: 11, input: 'Lorem ipsum{/abc} {a}dolor{/a}' ]
     * [ '{/a}', index: 20, input: 'Lorem ipsum {a}dolor{/a}' ]
     */

You need to subtract the match length from the regex lastIndex value; otherwise the next iteration starts farther along the string than expected. The input becomes shorter after you call replace to remove the {...} substring, but lastIndex is not adjusted to account for that:

    let BETWEEN_CURLYS = /{.*?}/g;
    let text = '{abc}Lorem ipsum{/abc} {a}dolor{/a}';
    let match = BETWEEN_CURLYS.exec(text);
    let tags = [];

    while (match !== null) {
        tags.push(match);
        text = text.replace(match[0], '');
        BETWEEN_CURLYS.lastIndex = BETWEEN_CURLYS.lastIndex - match[0].length; // HERE
        match = BETWEEN_CURLYS.exec(text);
    }

    console.log(text); // should be; Lorem ipsum dolor
    console.log(tags);
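As an alternative to adjusting lastIndex by hand, the same result can probably be reached in a single pass with String.prototype.replace and a replacer function. This is a different technique from the fix above, offered only as a sketch: `stripTags` and `removed` are hypothetical names, and the braces in the pattern are escaped so that it would also remain valid under the `u` flag.

```javascript
// Hypothetical helper: strips {...} tags and records where each one
// would sit in the stripped string.
function stripTags(input) {
  const tags = [];
  let removed = 0; // total length of all tags deleted so far

  const text = input.replace(/\{.*?\}/g, (tag, offset) => {
    // `offset` is the position in the ORIGINAL string, so subtract the
    // characters already removed to get the position in the result.
    tags.push({ tag, index: offset - removed });
    removed += tag.length;
    return '';
  });

  return { text, tags };
}

const result = stripTags('{abc}Lorem ipsum{/abc} {a}dolor{/a}');
console.log(result.text); // Lorem ipsum dolor
console.log(result.tags);
```

Because the replacer always receives offsets into the original string, there is no shared regex state to keep in sync while the string shrinks.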

Some more from the RegExp#exec reference to bear in mind:

If your regular expression uses the "g" flag, you can use the exec() method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property (test() will also advance the lastIndex property).
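The behavior described in that reference can be seen in a minimal sketch on a shorter input (the string '{a}x{b}' and the variable names are made up for illustration). It first shows a match being skipped because lastIndex is stale, then the same steps with lastIndex rewound by the length of the removed match.

```javascript
const re = /\{.*?\}/g;

// Without the adjustment: lastIndex keeps pointing past the removed tag.
let s = '{a}x{b}';
const m1 = re.exec(s);          // matches '{a}'
const afterFirst = re.lastIndex; // 3, just past '{a}'
s = s.replace(m1[0], '');       // s is now 'x{b}'
const missed = re.exec(s);      // search starts at 3, so '{b}' is skipped
const resetTo = re.lastIndex;   // a failed exec() resets lastIndex to 0

// With the adjustment: rewind lastIndex by the length of what was removed.
s = '{a}x{b}';
re.lastIndex = 0;
const m2 = re.exec(s);
s = s.replace(m2[0], '');
re.lastIndex -= m2[0].length;   // back to 0
const found = re.exec(s);       // now finds '{b}'

console.log(missed, found[0], found.index); // null {b} 1
```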
