
Splitting string into array of words using Regular Expressions

I'm trying to split a string into an array of words; however, I want to keep the spaces after each word. Here's what I'm trying:

var re = /[a-z]+[$\s+]/gi;
var test = "test   one two     three   four ";
var results = test.match(re);

The results I expect are:

[0]: "test   "
[1]: "one "
[2]: "two     "
[3]: "three   "
[4]: "four "

However, it only matches up to one space after each word:

[0]: "test "
[1]: "one "
[2]: "two "
[3]: "three "
[4]: "four "

What am I doing wrong?

Consider:

var results = test.match(/\S+\s*/g);

That would guarantee you don't miss any characters (apart from any spaces at the beginning of the string, but \S*\s* can take care of that).
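As a rough sketch of how the two patterns above behave (the `padded` string and the variable names are made up for illustration; only `test` comes from the question): \S+\s* skips leading whitespace, while \S*\s* keeps it but can also match the empty string at the end of the input, so the extra empty entry is filtered out here.

```javascript
// The question's own test string.
const test = "test   one two     three   four ";
const words = test.match(/\S+\s*/g);
console.log(words);
// [ 'test   ', 'one ', 'two     ', 'three   ', 'four ' ]

// Hypothetical input that starts with spaces.
const padded = "  one  two";

// \S+ needs a non-space character first, so the leading spaces are skipped.
const lossy = padded.match(/\S+\s*/g);
console.log(lossy); // [ 'one  ', 'two' ]

// \S*\s* keeps the leading spaces, but also yields a final "" match,
// which is dropped by the filter below.
const lossless = padded.match(/\S*\s*/g).filter((chunk) => chunk.length > 0);
console.log(lossless); // [ '  ', 'one  ', 'two' ]
```

Joining `lossless` back together gives the original string, which is a quick way to check that nothing was dropped.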

Your original regex reads:

  • [a-z]+ - match any number of letters (at least one)
  • [$\s+] - match a single character: $, + or whitespace. With no quantifier after this character class, you only match a single space.
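The breakdown above can be checked directly. This is a small sketch; the sample string "a+b$ c  d" is invented for the demonstration and is not from the question.

```javascript
// Inside a character class, $ and + are ordinary characters.
const charClass = /[$\s+]/;
console.log(charClass.test("$")); // true
console.log(charClass.test("+")); // true
console.log(charClass.test(" ")); // true
console.log(charClass.test("a")); // false

// Hypothetical input: the original pattern takes exactly one trailing
// character from the class, whether that is "+", "$" or a space.
const sample = "a+b$ c  d";
const matches = sample.match(/[a-z]+[$\s+]/gi);
console.log(matches); // [ 'a+', 'b$', 'c ' ]
```

Note that "d" is not matched at all: the class must consume one character, and nothing follows the last letter.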

Try the following:

test.match(/\w+\s+/g); // \w = word characters (letters, digits, underscore), \s = whitespace

You are using + inside the char class. Try using * outside the char class instead.

/[a-z]+\s*/gi;

+ inside the char class is treated as a literal + and not as a meta char. Using * will capture zero or more spaces that might follow any word.
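To make the difference between the two quantifiers concrete, here is a minimal comparison of \s* and \s+ placed after the word. The input "one  two" is a made-up example whose last word has no trailing whitespace.

```javascript
// Hypothetical input: the last word is not followed by whitespace.
const text = "one  two";

// Zero or more spaces: the final word still matches.
const withStar = text.match(/[a-z]+\s*/gi);
console.log(withStar); // [ 'one  ', 'two' ]

// One or more spaces: the final word is dropped because nothing follows it.
const withPlus = text.match(/[a-z]+\s+/gi);
console.log(withPlus); // [ 'one  ' ]
```

Which one fits depends on whether a final word without trailing whitespace should be included in the result.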

The + is taken literally inside the character class. You have to move it outside: [\s]+ or just \s+ ($ has no special meaning inside the class either).

The essential bit of your RegEx that needs changing is the part matching the whitespace or end-of-line.

Try:

var re = /[a-z]+($|\s+)/gi

or, for non-capturing groups (I don't know if you need this with the /g flag):

var re = /[a-z]+(?:$|\s+)/gi
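A short check of both variants, using a made-up input string. With the /g flag, String.prototype.match returns only the full matches and leaves capture groups out of the result, so the capturing and non-capturing forms give the same array here.

```javascript
// Hypothetical input: the last word ends the string with no trailing space.
const input = "one  two";

const capturing = input.match(/[a-z]+($|\s+)/gi);
const nonCapturing = input.match(/[a-z]+(?:$|\s+)/gi);

console.log(capturing);    // [ 'one  ', 'two' ]
console.log(nonCapturing); // [ 'one  ', 'two' ]
```

The $ alternative is what lets the final word match when no whitespace follows it.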
