
PHP Regex for different languages

I want to use regex as follows:

[a-z' ]*[a-z]

This won't work with different languages such as Chinese. Is it possible to create an inverse version of this regex to do the following:

Capture a word or words that are connected by a space

"Hey, july 2010"
=> hey
=> july

"hey what's up"
=> hey what's up

"汉漢字, 汉漢字 3004303"
=> 汉漢字
=> 汉漢字

First define your set of word characters: [\pL'-] (\pL for any Unicode letter, plus the single quote and the hyphen).

Within word boundaries, \b[\pL'-]+\b matches one word. Follow it with any number of further words, each preceded by one or more horizontal whitespace characters (\h+), and you get the final pattern for use with preg_match_all:

/\b[\pL'-]+(?:\h+[\pL'-]+)*\b/u

The pattern is already wrapped in delimiters, and the u modifier is set so that the subject and pattern are treated as UTF-8.
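As a minimal sketch of how the pattern could be used with preg_match_all, assuming UTF-8 input: the helper name extract_phrases is made up for illustration and is not part of the original answer.

```php
<?php
// Hypothetical helper (name chosen for this example only): returns every
// run of words that are separated only by horizontal whitespace.
function extract_phrases(string $text): array
{
    // Pattern from the answer; the single quote is escaped because the
    // PHP string itself is single-quoted.
    $pattern = '/\b[\pL\'-]+(?:\h+[\pL\'-]+)*\b/u';

    // preg_match_all returns false on error (for example, invalid UTF-8).
    if (preg_match_all($pattern, $text, $matches) === false) {
        return [];
    }

    // $matches[0] holds the full-pattern matches.
    return $matches[0];
}

// Sample inputs taken from the question.
$samples = ["Hey, july 2010", "hey what's up", "汉漢字, 汉漢字 3004303"];
foreach ($samples as $sample) {
    print_r(extract_phrases($sample));
}
```

The matches keep their original case, so "Hey" comes back as "Hey"; if lowercase output is wanted as in the question, the results would need to be lowercased in a separate step.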

Demo at regex101.com
