A single JS regex for matching repeated substrings?

Question

Say I have a string like:

where is mummy where is daddy

I want to replace every repeated substring with an empty string. In this case the words where and is would be removed, and the resulting string would be:

mummy daddy

I was wondering whether a single regex could achieve this. The regex I tried (which doesn't work) looks like this:

/(\w+)(?=.*)\1/gi

Here the first capture group is any run of word characters, the second group is a positive lookahead for any sequence of characters (to keep those characters out of the match), and \1 is a backreference to the first captured substring.

Any help would be great. Thanks in advance!

Answer 1

Your regex does not work because \w+ is not restricted by word boundaries and, since the (?=.*) lookahead consumes no characters, the \1 backreference is tried immediately after the "original" word, where it almost never matches.
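To see what the attempted pattern actually does, here is a small sketch that runs it against the sample string from the question. The variable names are mine, and console.log is used in place of any page output:

```javascript
// The attempted pattern from the question, applied to the sample string.
const attempt = /(\w+)(?=.*)\1/gi;
const sample = 'where is mummy where is daddy';

// (?=.*) consumes nothing, so \1 has to match immediately after Group 1.
// Only directly doubled characters qualify: "mm" in "mummy" and "dd" in "daddy".
const doubled = sample.match(attempt);
const attemptResult = sample.replace(attempt, '');

console.log(doubled);       // [ 'mm', 'dd' ]
console.log(attemptResult); // where is muy where is day
```

The words where and is are left untouched because nothing identical follows them directly.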

You need to first collect the words that are dupes, then build a RegExp that matches all of them together with optional preceding whitespace (or punctuation, etc.; adjust the pattern as needed), and replace the matches with an empty string:

    var re = /(\b\w+\b)(?=.*\b\1\b)/gi; // Get the repeated whole words
    var str = 'where is mummy where is daddy';
    var patts = str.match(re); // Collect the matched repeated words (null if there are none)
    // Build the pattern for replacing all found words
    var res = patts ? str.replace(RegExp("\\s*\\b(?:" + patts.join("|") + ")\\b", "gi"), "") : str;
    document.body.innerHTML = res;

The first pattern is (\b\w+\b)(?=.*\b\1\b):

(\b\w+\b) - match and capture into Group 1 a whole word consisting of [A-Za-z0-9_] characters
(?=.*\b\1\b) - make sure the value captured into Group 1 is repeated somewhere to the right of the current location (not necessarily right after the word). If the string is multiline, use [\s\S] instead of the dot. To make sure both the original and the dupe are matched as whole words, \b word boundaries should be used around both \w+ and \1.
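The two points above can be checked with a short sketch. It shows what the first pattern collects from the sample string, and why the [\s\S] variant is needed once the repeat sits on a later line. The variable names are illustrative only:

```javascript
// First pattern from the answer: whole words that occur again further right.
const repeatedWord = /(\b\w+\b)(?=.*\b\1\b)/gi;
const firstPass = 'where is mummy where is daddy'.match(repeatedWord);
console.log(firstPass); // [ 'where', 'is' ]

// The dot does not match a newline, so a repeat on a later line is missed.
const twoLines = 'where is mummy\nwhere is daddy';
const dotPass = twoLines.match(repeatedWord);
console.log(dotPass); // null

// Replacing the dot with [\s\S] lets the lookahead cross line breaks.
const repeatedWordMultiline = /(\b\w+\b)(?=[\s\S]*\b\1\b)/gi;
const multilinePass = twoLines.match(repeatedWordMultiline);
console.log(multilinePass); // [ 'where', 'is' ]
```

Only the earlier occurrence of each word is reported, because the later one has no further copy to its right.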

The second pattern will look different each time, but in your current scenario it will be /\s*\b(?:where|is)\b/gi:

\s* - zero or more whitespace characters
\b(?:where|is)\b - a whole word from the alternation group (?:...|...): either where or is (case-insensitive due to the /i modifier).
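Putting both patterns together, one possible way to package the approach is the sketch below. The function name removeRepeatedWords is hypothetical, and the null guard, the de-duplication of the word list, and the final trim() are my additions rather than part of the answer. The [\s\S] form of the lookahead is used so that multiline input is also covered.

```javascript
// Hypothetical helper (name and extras are assumptions, not from the answer).
function removeRepeatedWords(text) {
  // Pass 1: collect whole words that occur again somewhere to the right.
  const found = text.match(/(\b\w+\b)(?=[\s\S]*\b\1\b)/gi);
  if (found === null) return text; // nothing repeats, leave the text alone

  // A word occurring three or more times is collected more than once; dedupe it.
  const words = [...new Set(found.map((w) => w.toLowerCase()))];

  // Pass 2: remove every occurrence, plus any whitespace in front of it.
  // The words contain only \w characters, so no regex escaping is needed.
  const remover = new RegExp('\\s*\\b(?:' + words.join('|') + ')\\b', 'gi');

  // Whitespace in front of the first kept word survives pass 2; trim it.
  return text.replace(remover, '').trim();
}

console.log(removeRepeatedWords('where is mummy where is daddy')); // mummy daddy
console.log(removeRepeatedWords('no dupes here'));                 // no dupes here
```

Without the trim(), the sample input would come back with a leading space, because only whitespace before a removed word is consumed.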

Answer 1
7 · Accepted · 2016-03-21 09:43:32
