PHP Regex for different languages

Question

I want to use regex as follows:

[a-z' ]*[a-z]

This won't work with different languages such as Chinese. Is it possible to create an inverse version of this regex to do the following:

Capture a word or words that are connected by a space

"Hey, july 2010"
=> hey
=> july

"hey what's up"
=> hey what's up

"汉漢字, 汉漢字 3004303"
=> 汉漢字
=> 汉漢字

Answer 1

First define your set of word characters: [\pL'-] (\pL for any Unicode letter, plus the single quote and the hyphen).

Within word boundaries, \b[\pL'-]+\b matches one word. Allow it to be followed by any number of further words, each preceded by one or more horizontal whitespace characters (\h+), and you get the final pattern for use with preg_match_all:

/\b[\pL'-]+(?:\h+[\pL'-]+)*\b/u

The pattern is already wrapped in delimiters and has the u modifier set for Unicode support.
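As a minimal sketch of how the pattern could be applied to the three inputs from the question: the helper name extractPhrases is hypothetical and not part of the original answer, and the script is assumed to be saved as UTF-8.

```php
<?php
// extractPhrases() is a hypothetical helper name, not part of the original answer.
function extractPhrases(string $text): array
{
    // \pL       any Unicode letter (needs the u modifier)
    // \h+       one or more horizontal whitespace characters
    // (?:...)*  zero or more further words joined by that whitespace
    $pattern = '/\b[\pL\'-]+(?:\h+[\pL\'-]+)*\b/u';

    // preg_match_all() stores every full match, in order, in $matches[0].
    preg_match_all($pattern, $text, $matches);

    return $matches[0];
}

// The sample inputs from the question.
$samples = ["Hey, july 2010", "hey what's up", "汉漢字, 汉漢字 3004303"];

foreach ($samples as $sample) {
    echo $sample, ' => ', implode(' | ', extractPhrases($sample)), PHP_EOL;
}
```

Note that the matches keep their original case ("Hey", not "hey"). If lowercase output is wanted, as in the question's first example, the results would need a separate step such as mb_strtolower, assuming the mbstring extension is available.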

Demo at regex101.com

Answer 1: accepted 2015-10-14 02:58:24
