JavaScript split string with .match(regex)

From the Mozilla Developer Network for function split() :

The split() method returns the new array.

When found, separator is removed from the string and the substrings are returned in an array. If separator is not found or is omitted, the array contains one element consisting of the entire string. If separator is an empty string, str is converted to an array of characters.

If separator is a regular expression that contains capturing parentheses, then each time separator is matched, the results (including any undefined results) of the capturing parentheses are spliced into the output array. However, not all browsers support this capability.
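The cases described in the quoted documentation can be checked with a short sketch. The variable names and sample strings here are illustrative only and are not part of the MDN text.

```javascript
const str = 'a,b,c';

const notFound = str.split(';');        // separator does not occur in the string
const omitted = str.split();            // separator omitted
const chars = 'abc'.split('');          // empty-string separator
const captured = 'a1b2c'.split(/(\d)/); // capturing group: matched digits are kept

console.log(notFound); // ["a,b,c"]
console.log(omitted);  // ["a,b,c"]
console.log(chars);    // ["a", "b", "c"]
console.log(captured); // ["a", "1", "b", "2", "c"]
```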

Take the following example:

var string1 = 'one, two, three, four';
var splitString1 = string1.split(', ');
console.log(splitString1); // Outputs ["one", "two", "three", "four"]

This is a really clean approach. I tried the same with a regular expression and a somewhat different string:

var string2 = 'one split two split three split four';
var splitString2 = string2.split(/\ split\ /);
console.log(splitString2); // Outputs ["one", "two", "three", "four"]

This works just as well as the first example. In the following example, I have altered the string once more, with 3 different delimiters:

var string3 = 'one split two splat three splot four';
var splitString3 = string3.split(/\ split\ |\ splat\ |\ splot\ /);
console.log(splitString3); // Outputs ["one", "two", "three", "four"]

However, the regular expression is getting messy. I can group the different delimiters, but then the result includes the delimiters:

var string4 = 'one split two splat three splot four';
var splitString4 = string4.split(/\ (split|splat|splot)\ /);
console.log(splitString4); // Outputs ["one", "split", "two", "splat", "three", "splot", "four"]

So I tried removing the spaces from the regular expression while keeping the group, to little avail:

var string5 = 'one split two splat three splot four';
var splitString5 = string5.split(/(split|splat|splot)/);
console.log(splitString5); // Outputs ["one ", "split", " two ", "splat", " three ", "splot", " four"]

When I remove the parentheses from the regular expression, however, the delimiters disappear from the result, although the surrounding spaces remain:

var string6 = 'one split two splat three splot four';
var splitString6 = string6.split(/split|splat|splot/);
console.log(splitString6); // Outputs ["one ", " two ", " three ", " four"]

An alternative would be to use match() to filter out the delimiters, except I don't really understand how negative lookaheads work:

var string7 = 'one split two split three split four';
var splitString7 = string7.match(/((?!split).)*/g);
console.log(splitString7); // Outputs ["one ", "", "plit two ", "", "plit three ", "", "plit four", ""]

It doesn't match the whole word to begin with. And to be honest, I don't even know what's going on here exactly.


How do I properly split a string using regular expressions without having the delimiter in my result?

Use a non-capturing group in the split regex. Because the group does not capture, the matched delimiters are not included in the resulting array.

var string4 = 'one split two splat three splot four';
var splitString4 = string4.split(/\s+(?:split|splat|splot)\s+/);
console.log(splitString4); // Outputs ["one", "two", "three", "four"]
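When the delimiter words are only known at run time, the same non-capturing pattern can be built with the RegExp constructor. The following is a minimal sketch. `splitOnWords` is a hypothetical helper name, not something from the original answer, and it assumes the delimiters are whole words surrounded by whitespace.

```javascript
// Hypothetical helper: split `text` on any of the given delimiter words,
// consuming the whitespace around each delimiter.
function splitOnWords(text, words) {
  // Escape regex metacharacters so each word is matched literally.
  const escaped = words.map((w) => w.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  // In a string passed to RegExp, the backslash itself must be escaped ("\\s").
  const pattern = new RegExp('\\s+(?:' + escaped.join('|') + ')\\s+');
  return text.split(pattern);
}

const parts = splitOnWords('one split two splat three splot four',
                           ['split', 'splat', 'splot']);
console.log(parts); // ["one", "two", "three", "four"]
```

Note that the doubled backslash is needed only in the string form. In a regex literal, `\s` is written with a single backslash.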

If you want to use match, you can write it like this:

'one split two split three split four'.match(/(\b(?!split\b)[^ $]+\b)/g)
// Outputs ["one", "two", "three", "four"]

What does it do?

  • \b Matches a word boundary.

  • (?!split\b) Negative lookahead: asserts that the word starting at this position is not split.

  • [^ $]+ Matches one or more characters other than a space or a literal $. Inside a character class, $ is an ordinary character, not the end-of-string anchor. This part matches a word, and the lookahead ensures that the word is not split.

  • \b Matches the word boundary at the end of the word.
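The match approach can be extended to the three delimiters from the question by putting an alternation inside the lookahead. This is a sketch under the assumption that every field is a single word made of \w characters. The variable names are illustrative.

```javascript
// Match every word that is not exactly one of the delimiter words.
const wordPattern = /\b(?!(?:split|splat|splot)\b)\w+/g;
const words = 'one split two splat three splot four'.match(wordPattern);
console.log(words); // ["one", "two", "three", "four"]

// Inside a character class, $ is a literal dollar sign, so [^ $]+
// stops at spaces and at "$" characters alike.
const pieces = 'a$b c'.match(/[^ $]+/g);
console.log(pieces); // ["a", "b", "c"]
```

Unlike split, match returns individual words, so a field such as "twenty one" would come back as two separate entries. The split with a non-capturing group is the safer choice when fields may contain spaces.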
