Custom regex word boundary (JavaScript)

Question

I'm trying to create a custom word boundary (like \b) that also takes words starting or ending with the Unicode characters "ÆØÅæøå" into account.

Right now the only thing I can come up with is this ugly thing:

((?<![\wÆØÅæøå])(?=[\wÆØÅæøå])|(?![\wÆØÅæøå])(?<=[\wÆØÅæøå]))

Is there a more elegant solution to this, or is this the only way?

Answer 1

You can use:

(?<!\p{L}\p{M}*|[\p{N}_]) // leading word boundary, similar to \<, [[:<:]] or \m in other flavors
(?![\p{L}\p{N}_])         // trailing word boundary, similar to \>, [[:>:]] or \M

Compile the regex with the u flag to enable Unicode category classes such as \p{L}.
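To illustrate why the flag matters, here is a minimal sketch (the variable names are only for illustration). Without u, \p{L} is not treated as a Unicode property escape, so it does not match a letter such as "å":

```javascript
// Same pattern, compiled with and without the u flag.
const withoutU = /\p{L}/;
const withU = /\p{L}/u;

const sample = "å";

// Without u, \p{L} is not a property escape, so the letter is not matched.
const plainResult = withoutU.test(sample);
// With u, \p{L} matches any Unicode letter.
const unicodeResult = withU.test(sample);

console.log(plainResult);   // false
console.log(unicodeResult); // true
```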

The (?<!\p{L}\p{M}*|[\p{N}_]) is a negative lookbehind that matches a location not immediately preceded by a letter (optionally followed by one or more diacritic marks), a digit, or an underscore.
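The \p{M}* part matters when the text is in decomposed form, where "å" is stored as "a" plus a combining ring (U+030A). A small sketch, assuming such input and using a simpler lookbehind without \p{M}* purely for comparison:

```javascript
// "blås" with a decomposed "å": "a" followed by U+030A (combining ring above).
const decomposed = "bla\u030As";

// Comparison pattern (not from the answer): ignores combining marks.
const naive = /(?<![\p{L}\p{N}_])s/u;
// Lookbehind from the answer: a letter plus trailing marks still counts as a word character.
const markAware = /(?<!\p{L}\p{M}*|[\p{N}_])s/u;

// The naive version sees the combining mark before "s" and wrongly reports a word start.
console.log(naive.test(decomposed));     // true
// The mark-aware version looks past the mark to the letter "a" and reports no boundary.
console.log(markAware.test(decomposed)); // false
```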

The (?![\p{L}\p{N}_]) is a negative lookahead that matches a location not immediately followed by a letter, a digit, or an underscore.
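Putting the two assertions together, a whole-word search could be sketched as follows. The helper names (escapeRegExp, wholeWord) and the sample sentence are made up for illustration and are not part of the answer:

```javascript
// The two assertions from the answer, kept as pattern source strings.
const LEADING = String.raw`(?<!\p{L}\p{M}*|[\p{N}_])`;
const TRAILING = String.raw`(?![\p{L}\p{N}_])`;

// Hypothetical helper: escape regex metacharacters in a literal word.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a Unicode-aware whole-word regex for a literal word.
function wholeWord(word, flags = "gu") {
  return new RegExp(LEADING + escapeRegExp(word) + TRAILING, flags);
}

const text = "En å, en ål og en blå båt.";

// Custom boundaries: only the standalone "å" is matched.
const custom = text.replace(wholeWord("å"), "[å]");
// Built-in \b treats "å" as a non-word character, so it matches inside "båt" instead.
const builtin = text.replace(/\bå\b/g, "[å]");

console.log(custom);  // En [å], en ål og en blå båt.
console.log(builtin); // En å, en ål og en blå b[å]t.
```

The built-in \b fails here because \w only covers ASCII letters, digits, and the underscore, so the boundary falls between "b" and "å" rather than around the word.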

Answer 1 · 0 · Accepted · 2021-03-12 23:07:50
