
RegEx to extract all matches from string using RegExp.exec

I'm trying to parse the following kind of string:

[key:"val" key2:"val2"]

where there are arbitrary key:"val" pairs inside. I want to grab the key name and the value. For those curious, I'm trying to parse the database format of Taskwarrior.

Here is my test string:

[description:"aoeu" uuid:"123sth"]

which is meant to highlight that anything except a space can appear in a key or value, there are no spaces around the colons, and values are always in double quotes.

In Node, this is my output:

[deuteronomy][gatlin][~]$ node
> var re = /^\[(?:(.+?):"(.+?)"\s*)+\]$/g
> re.exec('[description:"aoeu" uuid:"123sth"]');
[ '[description:"aoeu" uuid:"123sth"]',
  'uuid',
  '123sth',
  index: 0,
  input: '[description:"aoeu" uuid:"123sth"]' ]

But description:"aoeu" also matches this pattern. How can I get all matches back?
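For context, the single result comes from how repeated groups work. A capture group inside a quantified (?:...)+ keeps only the text from its last repetition, so exec reports uuid and 123sth, and the earlier pair is overwritten. Here is a small sketch using the question's own pattern. It leaves out the g flag, which does not change the result of the first call.

```javascript
// A capture group inside a repeated (...)+ is overwritten on every repetition,
// so the match only reports the values from the last key:"val" pair.
const re = /^\[(?:(.+?):"(.+?)"\s*)+\]$/;
const m = re.exec('[description:"aoeu" uuid:"123sth"]');

console.log(m[1], m[2]); // uuid 123sth
console.log(m.length);   // 3: the full match plus two groups, however many pairs there are
```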

Continue calling re.exec(s) in a loop to obtain all the matches:

var re = /\s*([^[:]+):\"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';
var m;

do {
    m = re.exec(s);
    if (m) {
        console.log(m[1], m[2]);
    }
} while (m);

Try it with this JSFiddle: https://jsfiddle.net/7yS2V/
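If you want the pairs collected rather than logged, the same loop can fill an object. This is a minimal sketch. parsePairs is a hypothetical name. Building the regex inside the function means no lastIndex state is shared between calls.

```javascript
// Hypothetical helper: collect every key:"value" pair of one Taskwarrior-style line.
function parsePairs(line) {
  // Same pattern as above; the g flag makes each exec() resume from re.lastIndex.
  const re = /\s*([^[:]+):"([^"]+)"/g;
  const pairs = {};
  let m;
  while ((m = re.exec(line)) !== null) {
    pairs[m[1]] = m[2]; // m[1] is the key, m[2] the value without quotes
  }
  return pairs;
}

console.log(parsePairs('[description:"aoeu" uuid:"123sth"]'));
// { description: 'aoeu', uuid: '123sth' }
```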

str.match(pattern), if pattern has the global flag g, will return all the matches as an array.

For example:

const str = 'All of us except @Emran, @Raju and @Noman were there';
console.log( str.match(/@\w*/g) );
// Will log ["@Emran", "@Raju", "@Noman"]
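Here is a sketch of how the g flag changes what match() returns, using the same sample string:

* With g, you get every full match but no capture groups.
* Without g, you get only the first match, along with its groups and index.
* When nothing matches, the result is null rather than an empty array.

```javascript
const str = 'All of us except @Emran, @Raju and @Noman were there';

// With g: every full match, but capture groups are dropped.
const all = str.match(/@(\w*)/g);
console.log(all); // [ '@Emran', '@Raju', '@Noman' ]

// Without g: only the first match, but with its groups and index.
const first = str.match(/@(\w*)/);
console.log(first[0], first[1], first.index); // @Emran Emran 17

// No match at all gives null, not an empty array.
console.log(str.match(/#\w+/g)); // null
```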

To loop through all matches, you can use the replace function:

var re = /\s*([^[:]+):\"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';

s.replace(re, function(match, g1, g2) { console.log(g1, g2); });
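The replace callback receives these arguments in order:

* the full match
* one argument per capture group
* the offset of the match
* the whole string

Here is a sketch that collects them into an array instead of logging them. found is a hypothetical name. The string that replace returns is simply discarded.

```javascript
const re = /\s*([^[:]+):"([^"]+)"/g;
const s = '[description:"aoeu" uuid:"123sth"]';
const found = [];

// Arguments: full match, group 1, group 2, then the offset of the match in s.
s.replace(re, (match, key, value, offset) => {
  found.push({ key, value, offset });
  return match; // keep the text unchanged; only the side effect matters here
});

console.log(found.map(f => `${f.key}=${f.value}@${f.offset}`));
// [ 'description=aoeu@1', 'uuid=123sth@19' ]
```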

This is a solution:

var s = '[description:"aoeu" uuid:"123sth"]';

var re = /\s*([^[:]+):\"([^"]+)"/g;
var m;
while (m = re.exec(s)) {
  console.log(m[1], m[2]);
}

This is based on lawnsea's answer, but shorter.

Notice that the g flag must be set to move the internal pointer forward across invocations.
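The "internal pointer" is the regex's lastIndex property. Each successful exec on a global regex sets lastIndex to the end of the match, and the failing call at the end resets it to 0. Without g, lastIndex is ignored, so a loop like the one above would return the first match forever. Here is a small trace with made-up sample input:

```javascript
const re = /\d+/g;
const s = 'a1 b22 c333';
const trace = [];
let m;

// Record each match together with the position the next exec() will start from.
while ((m = re.exec(s)) !== null) {
  trace.push([m[0], re.lastIndex]);
}

console.log(trace);        // [ [ '1', 2 ], [ '22', 6 ], [ '333', 11 ] ]
console.log(re.lastIndex); // 0: the final, failing exec() resets it
```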

str.match(/regex/g)

returns all matches as an array.

If, for some mysterious reason, you need the additional information that comes with exec, you could use a recursive function instead of a loop as an alternative to the previous answers (it also looks cooler :):

function findMatches(regex, str, matches = []) {
   const res = regex.exec(str)
   res && matches.push(res) && findMatches(regex, str, matches)
   return matches
}

// Usage
const matches = findMatches(/regex/g, str)

As stated in the comments before, it's important to have g at the end of the regex definition to move the pointer forward in each execution.
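Neither the loop nor the recursion above handles a pattern that can match the empty string, such as /x*/g. After an empty match, lastIndex does not move, so exec returns the same match forever.

Here is a minimal sketch that guards against this. execAll is a hypothetical name. It copies the regex with the g flag added and steps lastIndex past empty matches. It advances by one code unit. With the u flag, a stricter version would step over a whole surrogate pair.

```javascript
// Hypothetical helper: an exec loop that also survives zero-length matches.
function execAll(regex, str) {
  // Work on a global copy so the caller's regex (and its lastIndex) is untouched.
  const flags = regex.flags.includes('g') ? regex.flags : regex.flags + 'g';
  const re = new RegExp(regex.source, flags);
  const out = [];
  let m;
  while ((m = re.exec(str)) !== null) {
    out.push(m);
    // An empty match leaves lastIndex where it was; step past it to avoid looping forever.
    if (m[0] === '') re.lastIndex++;
  }
  return out;
}

console.log(execAll(/x*/, 'axxb').map(m => [m[0], m.index]));
// [ [ '', 0 ], [ 'xx', 1 ], [ '', 3 ], [ '', 4 ] ]
```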

We are finally beginning to see a built-in matchAll function; see here for the description and compatibility table. It looks like, as of May 2020, Chrome, Edge, Firefox, and Node.js (12+) support it, but IE, Safari, and Opera do not. It seems it was drafted in December 2018, so give it some time to reach all browsers, but I trust it will get there.

The built-in matchAll function is nice because it returns an iterable. It also returns capturing groups for every match! So you can do things like:

// get the letters before and after "o"
let matches = "stackoverflow".matchAll(/(\w)o(\w)/g);

for (const match of matches) {
    console.log("letter before:" + match[1]);
    console.log("letter after:" + match[2]);
}

// matchAll returns a one-shot iterator, and the loop above has used it up,
// so call matchAll again to turn a fresh iterator into an array
let arrayOfAllMatches = [..."stackoverflow".matchAll(/(\w)o(\w)/g)];

It also seems that every match object uses the same format as match() without the g flag. Each object is an array of the match and capturing groups, with three additional properties: index, input, and groups. So it looks like:

[<match>, <group1>, <group2>, ..., index: <match offset>, input: <original string>, groups: <named capture groups>]

For more information about matchAll, there is also a Google developers page. There are also polyfills/shims available.
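To see that match-object shape in practice, here is a sketch that looks at the second match from the stackoverflow example above:

```javascript
// Collect every match object, then inspect the second one ("low").
const all = [...'stackoverflow'.matchAll(/(\w)o(\w)/g)];
const second = all[1];

console.log(all.length);                      // 2
console.log(second[0], second[1], second[2]); // low l w
console.log(second.index, second.input);      // 10 stackoverflow
console.log(second.groups);                   // undefined (the pattern has no named groups)
```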

If you have ES2020

(Meaning your system, such as Chrome, Node.js, or Firefox, supports ECMAScript 2020 or later)

Use the new yourString.matchAll( /your-regex/g ). The regex needs the g flag; without it, matchAll throws a TypeError.

If you don't have ES2020

If you have an older system, here's a function for easy copying and pasting:

function findAll(regexPattern, sourceString) {
    let output = []
    let match
    // make sure the pattern has the global flag
    let regexPatternWithGlobal = RegExp(regexPattern,[...new Set("g"+regexPattern.flags)].join(""))
    while (match = regexPatternWithGlobal.exec(sourceString)) {
        // get rid of the string copy
        delete match.input
        // store the match data
        output.push(match)
    } 
    return output
}

Example usage:

console.log(   findAll(/blah/g,'blah1 blah2')   ) 

Outputs:

[ [ 'blah', index: 0 ], [ 'blah', index: 6 ] ]

Based on Agus's function, but I prefer to return just the match values:

var bob = "&gt; bob &lt;";
function matchAll(str, regex) {
    var res = [];
    var m;
    if (regex.global) {
        while (m = regex.exec(str)) {
            res.push(m[1]);
        }
    } else {
        if (m = regex.exec(str)) {
            res.push(m[1]);
        }
    }
    return res;
}
var Amatch = matchAll(bob, /(&.*?;)/g);
console.log(Amatch);  // yields: ['&gt;', '&lt;']

Iterables are nicer:

const matches = (text, pattern) => ({
  [Symbol.iterator]: function * () {
    const clone = new RegExp(pattern.source, pattern.flags);
    let match = null;
    do {
      match = clone.exec(text);
      if (match) {
        yield match;
      }
    } while (match);
  }
});

Usage in a loop:

for (const match of matches('abcdefabcdef', /ab/g)) {
  console.log(match);
}

Or if you want an array:

[ ...matches('abcdefabcdef', /ab/g) ]
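This helper clones the pattern inside the generator, so every for...of loop or spread starts again from the beginning. The iterator returned by matchAll, by contrast, can only be consumed once. Like the exec loops above, the helper assumes the pattern has the g flag. Without it, exec keeps returning the first match and the generator never ends.

Here is a sketch comparing the two. The helper is repeated so the snippet runs on its own.

```javascript
// Same helper as above: each iteration clones the pattern, so lastIndex starts at 0.
const matches = (text, pattern) => ({
  [Symbol.iterator]: function* () {
    const clone = new RegExp(pattern.source, pattern.flags);
    let match = null;
    do {
      match = clone.exec(text);
      if (match) {
        yield match;
      }
    } while (match);
  }
});

const reusable = matches('abcdefabcdef', /ab/g);
console.log([...reusable].length); // 2
console.log([...reusable].length); // 2 again: a fresh clone per iteration

const oneShot = 'abcdefabcdef'.matchAll(/ab/g);
console.log([...oneShot].length); // 2
console.log([...oneShot].length); // 0: matchAll returns a single-use iterator
```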

Here is my function to get the matches:

function getAllMatches(regex, text) {
    if (regex.constructor !== RegExp) {
        throw new Error('not RegExp');
    }

    var res = [];
    var match = null;

    if (regex.global) {
        while (match = regex.exec(text)) {
            res.push(match);
        }
    }
    else {
        if (match = regex.exec(text)) {
            res.push(match);
        }
    }

    return res;
}

// Example:

var regex = /abc|def|ghi/g;
var res = getAllMatches(regex, 'abcdefghi');

res.forEach(function (item) {
    console.log(item[0]);
});
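One thing to watch with getAllMatches: a global regex carries its lastIndex between calls. If the same regex object was already used with exec, the loop starts mid-string and earlier matches are silently skipped. With the example above, a prior regex.exec call would leave only def and ghi.

Here is a minimal variant, hypothetically named getAllMatchesFromStart. It rewinds lastIndex before looping. It also rejects non-global patterns, whose exec loop would never terminate, instead of falling back to a single exec as the original does.

```javascript
// Hypothetical helper: like getAllMatches, but always scans from index 0.
function getAllMatchesFromStart(regex, text) {
  if (!regex.global) {
    throw new TypeError('expected a regex with the g flag');
  }
  const res = [];
  let match;
  regex.lastIndex = 0; // rewind in case the regex was used before
  while ((match = regex.exec(text)) !== null) {
    res.push(match[0]);
  }
  return res;
}

const regex = /abc|def|ghi/g;
regex.exec('abcdefghi');      // an earlier use leaves lastIndex at 3
console.log(regex.lastIndex); // 3
console.log(getAllMatchesFromStart(regex, 'abcdefghi')); // [ 'abc', 'def', 'ghi' ]
```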

Since ES2020, there's a simpler, better way of getting all the matches, together with information about the capture groups and their index:

const string = 'Mice like to dice rice';
const regex = /.ice/gu;
for (const match of string.matchAll(regex)) {
    console.log(match);
}

// ["Mice", index: 0, input: "Mice like to dice rice", groups: undefined]
// ["dice", index: 13, input: "Mice like to dice rice", groups: undefined]
// ["rice", index: 18, input: "Mice like to dice rice", groups: undefined]

It is currently supported in Chrome, Firefox, and Opera. Depending on when you read this, check this link to see its current support.

Use this:

var all_matches = your_string.match(re);
console.log(all_matches)

It will return an array of all matches, provided re has the g flag. That works just fine, but remember that it won't take groups into account; it will just return the full matches.

I would definitely recommend using the String.prototype.match() function and creating a relevant RegEx for it. My example uses a list of strings, which is often necessary when scanning user input for keywords and phrases.

    // 1) Define keywords
    var keywords = ['apple', 'orange', 'banana'];

    // 2) Create regex, pass "i" for case-insensitive and "g" for global search
    var regex = new RegExp("(" + keywords.join('|') + ")", "ig");
    // => /(apple|orange|banana)/gi

    // 3) Match it against any string to get all matches
    "Test string for ORANGE's or apples were mentioned".match(regex);
    // => ["ORANGE", "apple"]

Hope this helps!
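Be careful when building the pattern from a keyword list. Characters such as +, . or ( in a keyword are regex syntax. For example, new RegExp('(c++)') throws a SyntaxError, and 'node.js' would also match 'nodeXjs'.

Here is a sketch that escapes each keyword first. escapeRegExp is a hypothetical helper that uses the commonly cited escaping pattern. The keyword list and input string are made up.

```javascript
// Hypothetical helper: escape regex metacharacters so each keyword matches literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const keywords = ['c++', 'node.js', 'apple'];
const regex = new RegExp('(' + keywords.map(escapeRegExp).join('|') + ')', 'ig');

console.log(regex.source); // (c\+\+|node\.js|apple)
console.log('I like C++, Node.js and nodeXjs'.match(regex)); // [ 'C++', 'Node.js' ]
```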

This isn't really going to help with your more complex issue, but I'm posting it anyway because it is a simple solution for people who aren't doing a global search like you are.

I've simplified the regex in the answer to be clearer (this is not a solution to your exact problem).

var re = /^(.+?):"(.+)"$/
var regExResult = re.exec('description:"aoeu"');
var purifiedResult = purify_regex(regExResult);

// We only want the group matches in the array
function purify_regex(reResult){

  // Removes the Regex specific values and clones the array to prevent mutation
  let purifiedArray = [...reResult];

  // Removes the full match value at position 0
  purifiedArray.shift();

  // Returns a pure array without mutating the original regex result
  return purifiedArray;
}

// purifiedResult= ["description", "aoeu"]

That looks more verbose than it is because of the comments; this is what it looks like without them:

var re = /^(.+?):"(.+)"$/
var regExResult = re.exec('description:"aoeu"');
var purifiedResult = purify_regex(regExResult);

function purify_regex(reResult){
  let purifiedArray = [...reResult];
  purifiedArray.shift();
  return purifiedArray;
}

Note that any groups that do not match will be listed in the array as undefined values.

This solution uses the ES6 spread operator to strip the regex-specific values from the array. You will need to run your code through Babel if you want IE11 support.
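If you only need the groups, array destructuring or slice(1) gives the same result in one step. Here is a sketch of both, using the same simplified regex. Neither modifies the match array, and the || [] fallback covers a failed match, where exec returns null.

```javascript
const re = /^(.+?):"(.+)"$/;
const result = re.exec('description:"aoeu"');

// Destructuring with a leading hole skips the full match at position 0.
const [, key, value] = result || [];
console.log(key, value); // description aoeu

// slice(1) copies just the capture groups into a plain array.
const groupsOnly = result ? result.slice(1) : [];
console.log(groupsOnly); // [ 'description', 'aoeu' ]
```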

Here's a one-line solution without a while loop.

The order is preserved in the resulting list.

The potential downsides are:

  1. It clones the regex for every match.
  2. The result is in a different form than the expected solutions (each index and input refer to the matched substring, not the original string), so you'll need to process them one more time.

let re = /\s*([^[:]+):\"([^"]+)"/g
let str = '[description:"aoeu" uuid:"123sth"]'

(str.match(re) || []).map(e => RegExp(re.source, re.flags).exec(e))

[ [ 'description:"aoeu"',
    'description',
    'aoeu',
    index: 0,
    input: 'description:"aoeu"',
    groups: undefined ],
  [ ' uuid:"123sth"',
    'uuid',
    '123sth',
    index: 0,
    input: ' uuid:"123sth"',
    groups: undefined ] ]
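If matchAll is available, you can avoid re-running the regex on each substring. It also keeps index relative to the original string. Here is a sketch with the same regex and input (results is a hypothetical name).

```javascript
const re = /\s*([^[:]+):"([^"]+)"/g;
const str = '[description:"aoeu" uuid:"123sth"]';

// m.index is relative to str itself, because matchAll scans the original string once.
const results = [...str.matchAll(re)].map(m => ({ key: m[1], value: m[2], index: m.index }));
console.log(results);
// results: [{ key: 'description', value: 'aoeu', index: 1 }, { key: 'uuid', value: '123sth', index: 19 }]
```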

If there are edge cases such as extra or missing spaces, this expression with looser boundaries might also be an option:

^\s*\[\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*\]\s*$

If you wish to explore, simplify, or modify the expression, it's explained in the top right panel of regex101.com. If you'd like, you can also see in this link how it would match against some sample inputs.


Test

const regex = /^\s*\[\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*\]\s*$/gm;
const str = `[description:"aoeu" uuid:"123sth"]
[description: "aoeu" uuid: "123sth"]
[ description: "aoeu" uuid: "123sth" ]
[ description: "aoeu" uuid: "123sth" ]
[ description: "aoeu"uuid: "123sth" ]
`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
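The expression above expects exactly two pairs per line. If the number of pairs can vary, one option is to check the bracket shape first and then loop over the pairs with a global regex. Here is a sketch under that assumption. parseLoose is a hypothetical name. The sketch assumes one record per line and no double quotes inside values.

```javascript
// Hypothetical helper: accept any number of key:"value" pairs, tolerating extra spaces.
function parseLoose(line) {
  // Step 1: check the overall [ ... ] shape and take what is inside the brackets.
  const outer = /^\s*\[(.*)\]\s*$/.exec(line);
  if (outer === null) return null;

  // Step 2: walk the inside with a global regex, one pair per match.
  const pairs = {};
  for (const m of outer[1].matchAll(/([^\s:]+)\s*:\s*"([^"]*)"/g)) {
    pairs[m[1]] = m[2];
  }
  return pairs;
}

console.log(parseLoose('[ description: "aoeu"uuid: "123sth" ]'));
// { description: 'aoeu', uuid: '123sth' }
console.log(parseLoose('[a:"1" b:"2" c:"3"]'));
// { a: '1', b: '2', c: '3' }
```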

RegEx Circuit

jex.im visualizes regular expressions.


If you want a more functional approach that avoids looping, you can call a function recursively until the result is null. On each capture, slice the string past the end of the current match.

// The MatchAll Function
function matchAll (regexp, input, matches = []) {
  const regex = regexp.exec(input)
  if (regex === null) return matches
  // Filter out any undefined results
  const matched = regex.filter(i => i)
  // Destructure some commonly used values
  const { index } = regex
  const [ full, g1, g2, g3] = matched
  // Slice the input string past the end of this match
  const string = input.slice(index + full.length)
  // Do something with the captured groups
  // Push this into an array
  matches.push({ prop: 'H' + g1 + g3 + g3 + 'ary ' + g2 })
  // Recurse on the rest of the string, passing the same array along
  return matchAll(regexp, string, matches)
}
// Record of matches
const matches = []
// The RegExp, we are looking for some random letters / words in string
const regExp = new RegExp(/(i{1}).*(did).*(l{1})/)
// An example string to parse
const testString = `Jeffrey Epstein didn't kill himself,`
// Run
matchAll(regExp, testString, matches)
// Returned Result
console.log(matches)

If you're able to use matchAll, here's a trick:

Array.from has a mapping ("selector") parameter, so instead of ending up with an array of awkward match results, you can project each one to what you really need:

Array.from(str.matchAll(regexp), m => m[0]);

If you have named groups, e.g. /(?<firstName>[a-z][A-Z]+)/g, you could do this:

Array.from(str.matchAll(regexp), m => m.groups.firstName);
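Here is a runnable sketch of the named-group variant with made-up sample text. Group names are case-sensitive, so the name in (?<firstName>...) must match the one you read from m.groups.

```javascript
// Hypothetical sample: capitalised words captured through a named group.
const str = 'alice Bob carol Dave';
const regexp = /(?<firstName>[A-Z][a-z]+)/g;

// The second argument of Array.from maps each match object as it is collected.
const names = Array.from(str.matchAll(regexp), m => m.groups.firstName);
console.log(names); // [ 'Bob', 'Dave' ]
```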

Here is my answer:

var str = '[me nombre es] : My name is. [Yo puedo] is the right word'; 

var reg = /\[(.*?)\]/g;

var a = str.match(reg);

a = a.toString().replace(/[\[\]]/g, "").split(',');
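The toString() and split(',') step breaks if a bracketed phrase contains a comma. Here is a sketch that strips the brackets from each match instead. The third bracketed phrase in the sample string is added only to show the comma case.

```javascript
const str = '[me nombre es] : My name is. [Yo puedo] is the right word. [uno, dos]';

// match() with g returns the bracketed strings; slice(1, -1) drops the brackets
// from each one, so commas inside the brackets are left alone.
const inside = (str.match(/\[(.*?)\]/g) || []).map(s => s.slice(1, -1));
console.log(inside); // [ 'me nombre es', 'Yo puedo', 'uno, dos' ]
```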

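Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to republish, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.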

 
© 2020-2024 STACKOOM.COM