JavaScript regex for splitting by whitespace for accented chars

I am trying to split a string in JavaScript by whitespace, while ignoring whitespace enclosed in quotes. I googled this regular expression: (/\w+|"[^"]+"/g), but the problem is that it doesn't work with accented chars like á. How should I improve my regular expression to make it work?

That's because \w only matches [A-Za-z0-9_]. To match accented characters, add the Unicode range \x81-\xFF, which includes Latin-1 characters such as à and ã:

(/[\w\x81-\xFF]+|"[^"]+"/g)
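As a minimal sketch of how that pattern behaves, the snippet below wraps it in a helper and runs it on a sample string. The name splitLatin1 and the sample text are assumptions made for illustration; they do not come from the original answer.

```javascript
// Pattern from the answer above: a run of word characters (extended with
// the \x81-\xFF range) or a double-quoted phrase.
const latin1Token = /[\w\x81-\xFF]+|"[^"]+"/g;

// Hypothetical helper name, used here only for illustration.
function splitLatin1(input) {
  // match() returns null when nothing matches, so fall back to [].
  return input.match(latin1Token) || [];
}

const sample = 'árvíz "két szó" alma';
console.log(splitLatin1(sample));
// → [ 'árvíz', '"két szó"', 'alma' ]
```

Note that the range stops at \xFF, so letters outside Latin-1 are still treated as separators: ű is U+0171, and splitLatin1('tűz') yields 't' and 'z' as separate tokens.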

There's also this site, which is very helpful for building the required Unicode block range.

This matches runs of non-whitespace characters that contain no quotes, and also matches text between quotes:

/[^\s"]+|"[^"]+"/g
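A short sketch of this pattern in use, assuming the goal is an array of tokens with the surrounding quotes removed. The helper name splitArgs and the quote-stripping step are additions for illustration and are not part of the answer.

```javascript
// Pattern from the answer above: any run of characters that are neither
// whitespace nor a double quote, or a double-quoted phrase.
const tokenPattern = /[^\s"]+|"[^"]+"/g;

// Hypothetical helper: tokenizes the input and strips the enclosing quotes.
function splitArgs(input) {
  const tokens = input.match(tokenPattern) || [];
  // Only quoted tokens contain '"', so this leaves other tokens unchanged.
  return tokens.map((token) => token.replace(/^"|"$/g, ''));
}

console.log(splitArgs('čaj "zelený čaj" kůň'));
// → [ 'čaj', 'zelený čaj', 'kůň' ]
```

Because the character class is defined by exclusion, it does not depend on any particular alphabet, so letters outside Latin-1 such as č and ů are handled as well.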

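If you want to match all non-whitespace characters rather than only alphanumeric ones, replace \w with \S. Note that \S also matches the quote character, so the quoted alternative has to come first in the pattern for quoted phrases to stay together.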
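The effect of alternation order with \S can be checked with a small sketch. Both variable names and the sample text are assumptions for illustration.

```javascript
// \S also matches '"', so with \S+ first the quoted phrase is split
// at its inner space before the quoted alternative is ever tried.
const nonSpaceFirst = /\S+|"[^"]+"/g;

// Trying the quoted alternative first keeps the phrase together.
const quotedFirst = /"[^"]+"|\S+/g;

const text = 'één "twee drie" vier';

console.log(text.match(nonSpaceFirst));
// → [ 'één', '"twee', 'drie"', 'vier' ]

console.log(text.match(quotedFirst));
// → [ 'één', '"twee drie"', 'vier' ]
```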
