Getting an array of matches and plain strings from a JavaScript regular expression

I often want to parse a string with a regular expression and get all the matches plus all the non-matching strings, interspersed in their original order, e.g.

var parsed = regexParse(/{([^}]+)}/g, 'Hello {name}, you are {age} years old');

So parsed will contain:

0 : "Hello "
1 : match containing {name}, name
2 : ", you are "
3 : match containing {age}, age
4 : " years old"

Is there anything in JavaScript (or some widely used library) that resembles this regexParse function? I wrote my own version of it, but it seems so obvious that I suspect there must already be a "standard" way of doing it:

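var regexParse = function(rx, str) {
  // rx must have the g flag; otherwise exec() never advances and this loops forever.
  var nextPlain = 0, result = [], match;
  rx.lastIndex = 0;
  for (;;) {
    match = rx.exec(str);
    if (!match) {
      // No more matches: the rest of the string is plain text.
      result.push(str.slice(nextPlain));
      break;
    }
    // Plain text between the previous match and this one.
    result.push(str.slice(nextPlain, match.index));
    nextPlain = rx.lastIndex;
    result.push(match);
    // Step past a zero-length match so exec() cannot return it again.
    if (match[0].length === 0) rx.lastIndex++;
  }
  return result;
};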

Update

Regarding Dennis's answer, at first I thought it wouldn't help, because all the values in the returned array are strings. How can you tell which items are unmatched text and which come from the matches?

But a bit of experimentation (with IE9 and Chrome, anyway) suggests that when split is used in this way, it always alternates the pieces: the first is plain text, the second is a match, the third is plain text, and so on. It follows this rule even if there are two matches with no unmatched text between them - it outputs an empty string in such cases.

Even in the trivial case:

'{x}'.split(/{([^}]+)}/g)

The output is strictly:

["", "x", ""]

So you can tell which is which if you know how (and if this assumption holds)!
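A quick way to probe that assumption is to try the two awkward cases: adjacent matches, and a pattern with more than one capturing group. The sketch below is only an illustration; the second pattern and the variable names are made up for it. As far as I understand split, every capturing group in the separator contributes one entry per match, so the odd/even rule only applies when the pattern has exactly one group.

```javascript
// One capturing group: pieces alternate plain, match, plain, ...
var adjacent = '{a}{b}'.split(/{([^}]+)}/g);
console.log(adjacent); // ["", "a", "", "b", ""]

// Two capturing groups (illustrative pattern): each match adds two entries,
// so plain text sits at indexes 0, 3, 6, ... rather than at every even index.
var pairs = 'x=1;y=2'.split(/(\w)=(\d)/);
console.log(pairs); // ["", "x", "1", ";", "y", "2", ""]
```

So the position test is safe for the single-group pattern used here, but it needs adjusting if more groups are added.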

I like to use the ES5 array methods map, forEach and filter. So with my original regexParse it was a matter of using typeof i == 'string' to detect which items were unmatched text.

With split it has to be determined from the position in the returned array, but that's okay because the ES5 array methods pass a second argument, the index, so we just need to find out whether it's odd (a match) or even (unmatched text). So for example, if we have:

var ar = '{greeting} {name}, you are {age} years old'.split(/{([^}]+)}/g);

Now ar contains:

["", "greeting", " ", "name", ", you are ", "age", " years old"]

From that we can get just the matches:

ar.filter(function(s, i) { return i % 2 != 0; });

>>> ["greeting", "name", "age"]

Or just the plain text, also stripping out empty strings:

ar.filter(function(s, i) { return (i % 2 == 0) && s; });

>>> [" ", ", you are ", " years old"]
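The same index test works with map. As a rough sketch, here is a small template filler built on it; fillTemplate and its values argument are hypothetical names for this example, and it assumes the pattern keeps exactly one capturing group.

```javascript
// Hypothetical helper: replace each {key} with values[key].
// Odd indexes hold the captured key, even indexes hold plain text.
function fillTemplate(template, values) {
  return template
    .split(/{([^}]+)}/g)
    .map(function (piece, i) {
      return i % 2 === 1 ? String(values[piece]) : piece;
    })
    .join('');
}

var filled = fillTemplate('Hello {name}, you are {age} years old',
                          { name: 'Ann', age: 30 });
console.log(filled); // "Hello Ann, you are 30 years old"
```

A key that is missing from values comes out as the text "undefined" in this sketch; real code would probably want to handle that case explicitly.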

I think you're looking for split() with capturing parentheses:

var myString = "Hello 1 word. Sentence number 2.";
var splits = myString.split(/(\d)/); // ["Hello ", "1", " word. Sentence number ", "2", "."]
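Applied to the example from the question, the same idea might look like the sketch below. splitTagged is a hypothetical helper, not a built-in, and it assumes the pattern has exactly one capturing group, so that captures land on the odd indexes.

```javascript
// Hypothetical helper: label each piece returned by split().
function splitTagged(rx, str) {
  return str.split(rx).map(function (text, i) {
    return { isMatch: i % 2 === 1, text: text };
  });
}

var tokens = splitTagged(/{([^}]+)}/g, 'Hello {name}, you are {age} years old');
tokens.forEach(function (t) {
  console.log((t.isMatch ? 'match: ' : 'plain: ') + JSON.stringify(t.text));
});
```

Unlike the exec-based loop, this only gives you the captured text, not the full match or its index.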
