I want to use regex as follows:
[a-z' ]*[a-z]
This won't work with other languages such as Chinese. Is it possible to create an inverse version of this regex that does the following:
Capture a word or words that are connected by a space
"Hey, july 2010"
=> hey
=> july
"hey what's up"
=> hey what's up
"汉漢字, 汉漢字 3004303"
=> 汉漢字
=> 汉漢字
First define your set of word characters: [\pL'-] (\pL matches any Unicode letter; the single quote and hyphen are added to it).
Wrapped in word boundaries, \b[\pL'-]+\b matches one word. It can be followed by any number of further words, each preceded by one or more horizontal whitespace characters (\h+). The final pattern for use with preg_match_all:
/\b[\pL'-]+(?:\h+[\pL'-]+)*\b/u
The pattern is already wrapped in delimiters, and the u modifier is set to enable Unicode matching.
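As a minimal sketch of how this pattern might be applied to the three inputs from the question, assuming PHP 7 or later and UTF-8 encoded strings: the helper name extract_phrases is hypothetical and exists only for this illustration. The backslashes are doubled because the pattern sits in a double-quoted PHP string.

```php
<?php
// Pattern from the answer: runs of Unicode letters, apostrophes and hyphens,
// optionally joined by horizontal whitespace, wrapped in word boundaries.
$pattern = "/\\b[\\pL'-]+(?:\\h+[\\pL'-]+)*\\b/u";

// Hypothetical helper (not part of PHP): returns every full match as a list.
function extract_phrases(string $pattern, string $text): array
{
    // preg_match_all returns false on failure, e.g. on invalid UTF-8 input.
    if (preg_match_all($pattern, $text, $matches) === false) {
        return [];
    }
    return $matches[0]; // index 0 holds the full-pattern matches
}

$samples = [
    "Hey, july 2010",
    "hey what's up",
    "汉漢字, 汉漢字 3004303",
];

foreach ($samples as $sample) {
    echo $sample, ' => ', implode(' | ', extract_phrases($pattern, $sample)), PHP_EOL;
}
```

Note that the pattern keeps the original letter case, so the first sample yields "Hey" rather than "hey". If lowercase output is wanted, as in the question's examples, passing each match through mb_strtolower would be one option.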