Getting an array of matches and plain strings from a JavaScript regular expression

I often want to parse a string with a regular expression and get all the matches plus all the non-matching strings, interspersed in their original order, e.g.:

var parsed = regexParse(/{([^}]+)}/g, 'Hello {name}, you are {age} years old');

And so parsed will contain:

0 : "Hello "
1 : match containing {name}, name
2 : ", you are "
3 : match containing {age}, age
4 : " years old"

Is there anything in JavaScript (or some widely used library) that resembles this regexParse function? I wrote my own version of it, but it seems so obvious that I'm suspicious that there must already be a "standard" way of doing it:

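var regexParse = function(rx, str) {
  // rx must have the g flag; without it exec() always restarts at index 0.
  var nextPlain = 0, result = [], match;
  rx.lastIndex = 0; // start from the beginning even if rx was used before
  for (;;) {
    match = rx.exec(str);
    if (!match) {
      // no more matches: the rest of the string is plain text
      result.push(str.slice(nextPlain));
      break;
    }
    // plain text between the previous match and this one
    result.push(str.slice(nextPlain, match.index));
    nextPlain = rx.lastIndex;
    result.push(match);
    // a zero-length match leaves lastIndex unchanged; step past it to avoid looping forever
    if (match[0] === '') rx.lastIndex++;
  }
  return result;
};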

Update

Regarding Dennis's answer, at first I thought it would not help, because all the values in the returned array are strings. How can you tell which items are unmatched text and which are from the matches?

But a bit of experimentation (with IE9 and Chrome anyway) suggests that when split is used in this way, it always alternates the pieces, so that the first is from plain text, the second is a match, the third is plain text, and so on. It follows this rule even if there are two matches with no unmatched text interspersed - it outputs an empty string in such cases.

Even in the trivial case:

'{x}'.split(/{([^}]+)}/g)

The output is strictly:

["", "x", ""]

So you can tell which is which if you know how (and if this assumption holds)!
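
One condition on that assumption: the strict alternation relies on the pattern having exactly one capturing group, because split inserts every capturing group of each match into its output. A small sketch to illustrate (the two-group pattern and the sample strings are made up for this example):

```javascript
// One capturing group: text and captures alternate (even index = text, odd = capture).
var one = 'a1b2c'.split(/(\d)/);
console.log(one); // ["a", "1", "b", "2", "c"]

// Two capturing groups: each match contributes two elements, so the stride becomes 3.
var two = 'k=v;x=y'.split(/(\w)=(\w)/);
console.log(two); // ["", "k", "v", ";", "x", "y", ""]

// No capturing group: the matched separators are dropped entirely.
var none = 'a1b2c'.split(/\d/);
console.log(none); // ["a", "b", "c"]
```

So the odd/even test is only safe when the regex is known to contain a single pair of capturing parentheses.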

I like to use the ES5 array methods map, forEach and filter. So with my original regexParse it was a matter of using typeof i == 'string' to detect which items were unmatched text.
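
As a rough sketch of that typeof check, here is a compact copy of regexParse from the question (written with slice so the snippet runs on its own), followed by filter and map calls; the variable names plain and names are just for illustration:

```javascript
// Compact copy of the question's regexParse, repeated so this snippet is self-contained.
var regexParse = function(rx, str) {
  var nextPlain = 0, result = [], match;
  rx.lastIndex = 0;
  for (;;) {
    match = rx.exec(str);
    if (!match) {
      result.push(str.slice(nextPlain));
      break;
    }
    result.push(str.slice(nextPlain, match.index));
    nextPlain = rx.lastIndex;
    result.push(match);
  }
  return result;
};

var parsed = regexParse(/{([^}]+)}/g, 'Hello {name}, you are {age} years old');

// Plain-text pieces are strings; matches are the arrays returned by exec().
var plain = parsed.filter(function(item) { return typeof item == 'string'; });
var names = parsed
  .filter(function(item) { return typeof item != 'string'; })
  .map(function(m) { return m[1]; }); // m[1] is the first capture group

console.log(plain); // ["Hello ", ", you are ", " years old"]
console.log(names); // ["name", "age"]
```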

With split it has to be determined from the position in the returned array, but that's okay because the ES5 array methods pass a second argument, the index, and so we just need to find out if it's odd (a match) or even (unmatched text). So for example, if we have:

var ar = '{greeting} {name}, you are {age} years old'.split(/{([^}]+)}/g);

Now ar contains:

["", "greeting", " ", "name", ", you are ", "age", " years old"]

From that we can get just the matches:

ar.filter(function(s, i) { return i % 2 != 0; });

>>> ["greeting", "name", "age"]

Or just the plain text, stripping out empty strings also:

ar.filter(function(s, i) { return (i % 2 == 0) && s; });

>>> [" ", ", you are ", " years old"]
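
To keep both kinds of item in their original order, the index rule can also be applied with map. This is only a sketch: the kind and value property names are invented for illustration, and it assumes the pattern has exactly one capturing group.

```javascript
var ar = '{greeting} {name}, you are {age} years old'.split(/{([^}]+)}/g);

var tagged = ar
  .map(function(s, i) {
    // odd index = captured match, even index = unmatched text
    return { kind: i % 2 ? 'match' : 'text', value: s };
  })
  .filter(function(item) {
    // keep every match, but drop empty stretches of text
    return item.kind == 'match' || item.value !== '';
  });

tagged.forEach(function(item) {
  console.log(item.kind + ': ' + JSON.stringify(item.value));
});
```

For the sample string this leaves six items, starting with the match "greeting" because the empty leading text is filtered out.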

I think you're looking for split() with capturing parentheses:

var myString = "Hello 1 word. Sentence number 2.";
var splits = myString.split(/(\d)/); // ["Hello ", "1", " word. Sentence number ", "2", "."]
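
Applied to the string from the question, this might look like the sketch below. Note that split returns only what the parentheses capture, not the whole match, so where the parentheses sit decides whether the braces are included. The variable names inner and whole are just for illustration.

```javascript
var str = 'Hello {name}, you are {age} years old';

// Capture only the text inside the braces.
var inner = str.split(/{([^}]+)}/);
console.log(inner); // ["Hello ", "name", ", you are ", "age", " years old"]

// Capture the whole placeholder, braces included, by moving the parentheses outward.
var whole = str.split(/({[^}]+})/);
console.log(whole); // ["Hello ", "{name}", ", you are ", "{age}", " years old"]
```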
