I'm trying to create a custom word boundary (like \b) that also takes words starting or ending with the Unicode characters "ÆØÅæøå" into consideration.
So far, the only thing I can come up with is this ugly pattern:
((?<![\wÆØÅæøå])(?=[\wÆØÅæøå])|(?![\wÆØÅæøå])(?<=[\wÆØÅæøå]))
Is there a more elegant solution to this, or is this the only way?
You can use:
(?<!\p{L}\p{M}*|[\p{N}_]) // leading word boundary, similar to \<, [[:<:]] or \m in other flavors
(?![\p{L}\p{N}_]) // trailing word boundary, similar to \>, [[:>:]] or \M
Compile the regex with the u modifier to enable Unicode category classes.
The (?<!\p{L}\p{M}*|[\p{N}_]) is a negative lookbehind that matches a location not immediately preceded by a letter followed by zero or more diacritic marks, nor by a digit or an underscore.
The (?![\p{L}\p{N}_]) is a negative lookahead that matches a location not immediately followed by a letter, a digit or an underscore.
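As a rough illustration, here is a minimal sketch of how the two lookarounds could be wrapped around a literal word. It assumes a JavaScript engine with ES2018 lookbehind and Unicode property escapes. The helper name `wholeWord`, the constants and the sample text are hypothetical and not part of the original answer.

```javascript
// Boundaries from the answer above, kept as raw strings so the backslashes survive.
const LEADING = String.raw`(?<!\p{L}\p{M}*|[\p{N}_])`;
const TRAILING = String.raw`(?![\p{L}\p{N}_])`;

// Hypothetical helper: builds a whole-word matcher for a literal word.
function wholeWord(word) {
  // Escape regex metacharacters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // The "u" flag is required for \p{...}; "g" collects every match.
  return new RegExp(LEADING + escaped + TRAILING, "gu");
}

const text = "øl, spøløl and øl_1";

// Custom boundaries: only the standalone word at index 0 matches.
const custom = [...text.matchAll(wholeWord("øl"))].map((m) => m.index);

// Plain \b treats "ø" as a non-word character, so it misses the standalone
// word and instead matches twice inside "spøløl".
const naive = [...text.matchAll(/\bøl\b/gu)].map((m) => m.index);

console.log(custom); // [ 0 ]
console.log(naive);  // [ 6, 8 ]
```

The sample text uses precomposed characters. The trailing lookahead does not list \p{M}, so input written with separate combining marks may need that class added there as well.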