Regular expression to create multiple word fragments based off the same words

Let's say I have the following string:

var str = "I like barbeque at dawn";

I want all pairs of words that are separated by a space. This can be achieved via the following regular expression:

  var regex = /[a-zA-Z]+ [a-zA-Z]+/g;
  str.match(regex);

This results in:

["I like", "barbeque at"]

But what if I want ALL permutations of the pairs? The regular expression fails because it matches any given word only once. For example, this is what I want:

["I like", "like barbeque", "barbeque at", "at dawn"]

I know I can use the recursive backtracking pattern to generate permutations. Do regular expressions have the power to create these types of pairs for me?

Use a lookahead with a capture, which allows overlapping matches:

(\w+)\s+(?=(\w+))
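
As a rough illustration of how the two groups can be read back in JavaScript, here is a minimal sketch. It assumes an environment with `String.prototype.matchAll` (ES2020); the variable names `pairRegex`, `sentence`, and `wordPairs` are made up for this example.

```javascript
// Two capture groups: the first word is consumed together with the whitespace,
// while the second word is only peeked at by the lookahead, so the next match
// can start on it.
const pairRegex = /(\w+)\s+(?=(\w+))/g;
const sentence = "I like barbeque at dawn";

// matchAll yields one match object per pair; m[1] and m[2] hold the two words.
const wordPairs = Array.from(
  sentence.matchAll(pairRegex),
  (m) => `${m[1]} ${m[2]}`
);

console.log(wordPairs);
// ["I like", "like barbeque", "barbeque at", "at dawn"]
```

Because each match consumes the first word and the whitespace, every match has non-zero length and the regex always advances on its own.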

Alternative if you want to capture in one group instead of two:

(?=(\b\w+\s+\b\w+))
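
A short sketch of the single-group variant in use, again assuming `matchAll` is available; `singleGroupRegex`, `text`, and `phrases` are illustrative names only. Here the whole pattern sits inside the lookahead, so every match is zero-width and the pair has to be read from group 1.

```javascript
// The entire pair lives inside the lookahead, so nothing is consumed
// and the text of each pair is available only through capture group 1.
const singleGroupRegex = /(?=(\b\w+\s+\b\w+))/g;
const text = "I like barbeque at dawn";

// matchAll steps past zero-width matches by itself, so no manual bookkeeping is needed.
const phrases = Array.from(text.matchAll(singleGroupRegex), (m) => m[1]);

console.log(phrases);
// ["I like", "like barbeque", "barbeque at", "at dawn"]
```

Note that `\s+` is captured as-is, so runs of several spaces between two words would appear unchanged in the captured phrase.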

This regex will do it:

(?=\b([a-zA-Z]+ [a-zA-Z]+))

Explanation:

  • We use a look-ahead (?=...) so that the match itself is zero-width: the pattern is tested at each position of the input string, and we still "move" through the whole string without consuming any characters. That is what allows the captured pairs to overlap.
  • \b requires a word boundary at the current position, so each captured pair starts at the beginning of a word rather than in the middle of one.
  • ([a-zA-Z]+ [a-zA-Z]+) is the capturing group that collects the 2-word phrases.
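
To make the role of \b concrete, the following sketch compares the pattern with and without the boundary on a short input. The helper `collect` and the variable names are hypothetical, and `matchAll` (ES2020) is assumed.

```javascript
const input = "at dawn";

// Same lookahead pattern, with and without the leading word boundary.
const withBoundary = /(?=\b([a-zA-Z]+ [a-zA-Z]+))/g;
const withoutBoundary = /(?=([a-zA-Z]+ [a-zA-Z]+))/g;

// Hypothetical helper: gather capture group 1 from every match.
const collect = (re, s) => Array.from(s.matchAll(re), (m) => m[1]);

const bounded = collect(withBoundary, input);
const unbounded = collect(withoutBoundary, input);

console.log(bounded);   // ["at dawn"]
console.log(unbounded); // ["at dawn", "t dawn"]
```

Without \b the lookahead also succeeds in the middle of "at", which produces the partial fragment "t dawn".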

Sample code:

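var re = /(?=\b([a-zA-Z]+ [a-zA-Z]+))/g;
var str = 'i like barbeque at dawn';
var m;

while ((m = re.exec(str)) !== null) {
    // The match is zero-width, so exec() does not advance lastIndex;
    // bump it manually to avoid looping forever on the same position.
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    document.getElementById("res").innerHTML += m[1] + "<br/>";
}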

You can do the following:

(\w+)\s+(?=(\w+))

and read each pair from the two capture groups ($1, $2).

Input: i like barbeque at dawn

Output: (i, like) (like, barbeque) (barbeque, at) (at, dawn)

You can use lookaheads for this:

var str = "i like barbeque at dawn";
var regex = /(?=\b([a-zA-Z]+ [a-zA-Z]+)\b)/g;
var matches = [];
var match;

while ((match = regex.exec(str)) !== null) {
    // Zero-width match: advance lastIndex manually so exec() moves on.
    if (match.index === regex.lastIndex)
       regex.lastIndex++;
    matches.push(match[1]);
}

console.log(matches);
//=> ["i like", "like barbeque", "barbeque at", "at dawn"]
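
The `lastIndex` check in the loop matters because the whole pattern is a lookahead. A small sketch of what happens without it, using the same regex and input (the names `first`, `second`, and `third` are just for illustration):

```javascript
const regex = /(?=\b([a-zA-Z]+ [a-zA-Z]+)\b)/g;
const str = "i like barbeque at dawn";

// A zero-width match ends where it starts, so lastIndex stays at 0 after the first exec().
const first = regex.exec(str);
const lastIndexAfterFirst = regex.lastIndex;

// Calling exec() again from the same lastIndex finds the very same match.
const second = regex.exec(str);
const sameMatchTwice = first.index === second.index && first[1] === second[1];

// Bumping lastIndex by hand lets the search move on to the next pair.
regex.lastIndex++;
const third = regex.exec(str);

console.log(lastIndexAfterFirst, sameMatchTwice, third[1]);
// 0 true "like barbeque"
```

A `while` loop over `exec()` without that increment would therefore keep returning "i like" and never terminate.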
