
Separate sentence with special characters into words including spaces

I want to separate a sentence with special characters into words, keeping the spaces. Like so:

"la sílaba tónica es la penúltima".split(...regex...)

to:

["la ", "sílaba ", "tónica ", "es ", "la ", "penúltima"]
    ↑                     ↑      ↑      ↑
  space                 space  space  space

I've tried a modified version of this answer: https://stackoverflow.com/a/26184632/2083117

With the code from that answer:

"la sílaba tónica es la penúltima".split(/\b(?![\s.])/)

Result:

["la ", "s", "í", "laba ", "t", "ó", "nica ", "es ", "la ", "pen", "ú", "ltima"]
              ↑                  ↑                                  ↑

Those special characters shouldn't split the word.

My version simply adds the special characters I want to keep (.áéíóúñ,:;?):

"la sílaba tónica es la penúltima".split(/\b(?![\s.áéíóúñ,:;?])/)

Result:

["la ", "sí", "laba ", "tó", "nica ", "es ", "la ", "penú", "ltima"]
          ↑              ↑                              ↑

Now the characters are included, but the word is breaking after them.

What would be the right regular expression for this?

Try to match \S+\s* instead of splitting.

var result = "la sílaba tónica es la penúltima".match(/\S+\s*/g);
console.log(result);
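
Some context on why the split in the question breaks at accented letters: in JavaScript, \b is defined in terms of \w, and \w covers only ASCII letters, digits and the underscore, so a letter such as í counts as a non-word character and a boundary appears on both sides of it. If you would rather keep using split, the following is a minimal sketch, assuming an engine with lookbehind support (ES2018). The names sentence and words are only illustrative.

```javascript
const sentence = "la sílaba tónica es la penúltima";

// \w is ASCII-only in JavaScript, so U+00ED (í) is not a word character.
// That is why \b reports a word boundary right before and after it.
console.log(/\w/.test("\u00ED")); // false

// Split at every position that is preceded by whitespace and followed by
// non-whitespace. Both assertions are zero-width, so no characters are lost
// and each word keeps its trailing space.
const words = sentence.split(/(?<=\s)(?=\S)/);
console.log(words);
```

Because the split points are defined only by whitespace, punctuation and accented letters stay attached to their word, just as with the \S+\s* match.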

let splitArray = "la sílaba tónica es la penúltima".split(" ");
let splitArrayWithSpaces = splitArray.map((item, index) => {
  // Re-append the space to every word except the last one
  if (index !== splitArray.length - 1) return item + " ";
  else return item;
});
console.log(splitArrayWithSpaces);

The class a-z\xC0-\xff matches letters, including those with diacritics. I split on /[^a-z\xC0-\xff]/ and then add the space back.

Alternatively, you can split on /[\s]/.

let test = "la sílaba tónica es la penúltima".split(/[^a-z\xC0-\xff]/);
// Re-append the space to every word except the last one
for (let i = 0; i < test.length - 1; i++) {
  test[i] += " ";
}
console.log(test);
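
The \xC0-\xff range only reaches Latin-1 letters (and also includes × and ÷). As a hedged alternative, here is a small sketch that uses Unicode property escapes instead, assuming an engine that supports the u flag with \p{...} (ES2018). The variable name letterRuns is only illustrative.

```javascript
const sentence = "la sílaba tónica es la penúltima";

// \p{L} matches any Unicode letter and \p{M} any combining mark, so both
// precomposed and decomposed accented letters are covered.
// Each match is a run of letters plus any whitespace that follows it.
const letterRuns = sentence.match(/[\p{L}\p{M}]+\s*/gu);
console.log(letterRuns);
```

Note that this pattern drops punctuation such as commas; if punctuation should stay attached to the word, matching \S+\s* is the simpler choice.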
