I'm using the PHP function preg_match_all() as shown below to build an array of the words in a string.
// the string which contains the text
$string = "Lorem ipsum dolor sit amet elit";
// the preg_match_all() function
preg_match_all('/([a-z]*?)(?= )/i', $string, $matches);
// debug array
debug($matches[0]);
// output
[(int) 0 => 'Lorem',
(int) 1 => '',
(int) 2 => 'ipsum',
(int) 3 => '',
(int) 4 => 'dolor',
(int) 5 => '',
(int) 6 => 'sit',
(int) 7 => '',
(int) 8 => 'amet',
(int) 9 => ''
]
However, when I debug or print the array, the last word (here, "elit") is missing. How can I fix this?
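You can use (?=\W|$) as the lookahead, meaning a word is followed either by a non-word character or by the end of input. The original lookahead (?= ) requires a space after every match, so the last word can never match. Replacing [a-z]*? with [a-z]+ also gets rid of the empty matches: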
preg_match_all('/([a-z]+)(?=\W|$)/i', $string, $matches);
print_r($matches[0]);
Output (from a longer sample string than the one in the question):
Array
(
[0] => Lorem
[1] => ipsum
[2] => dolor
[3] => sit
[4] => amet
[5] => consectetur
[6] => adipiscing
[7] => elit
[8] => Lorem
[9] => ipsum
[10] => dolor
[11] => sit
[12] => amet
[13] => consectetur
[14] => adipiscing
[15] => elit
)
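To see the difference on the exact string from the question, here is a minimal sketch that runs both patterns side by side. The variable names $original and $fixed are only illustrative.

```php
<?php
// Sample string from the question.
$string = "Lorem ipsum dolor sit amet elit";

// Original pattern: the lazy `*?` may match an empty string,
// and the lookahead insists on a space after every match.
preg_match_all('/([a-z]*?)(?= )/i', $string, $original);

// Revised pattern: `+` requires at least one letter,
// and the lookahead also accepts the end of input.
preg_match_all('/([a-z]+)(?=\W|$)/i', $string, $fixed);

print_r($original[0]); // 10 entries, empty strings included, no "elit"
print_r($fixed[0]);    // Lorem, ipsum, dolor, sit, amet, elit
```

With the revised pattern the empty entries disappear and "elit" is captured, because the end of the string now satisfies the lookahead.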
By the way, you can get the same result with a split operation:
$tokens = preg_split('/\h+/', $string);
\h matches a horizontal whitespace character, such as a space or a tab.
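One caveat with splitting: if the string starts or ends with whitespace, preg_split() returns empty tokens at the edges. A small sketch, assuming a hypothetical padded input ($padded is not from the question), shows how the PREG_SPLIT_NO_EMPTY flag avoids that:

```php
<?php
// Hypothetical input with leading/trailing spaces and a tab between two words.
$padded = "  Lorem ipsum\tdolor  ";

// Plain split: whitespace at the edges produces empty first and last tokens.
$raw = preg_split('/\h+/', $padded);

// PREG_SPLIT_NO_EMPTY drops the empty pieces; -1 means "no limit".
$tokens = preg_split('/\h+/', $padded, -1, PREG_SPLIT_NO_EMPTY);

print_r($raw);    // '', Lorem, ipsum, dolor, ''
print_r($tokens); // Lorem, ipsum, dolor
```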
Use the following regex pattern to get all the words. \w matches any word character (letter, digit, or underscore):
preg_match_all('#\w+#', $string, $words);
print_r($words);
This will output (for a longer sample string than the one in the question):
Array
(
[0] => Array
(
[0] => Lorem
[1] => ipsum
[2] => dolor
[3] => sit
[4] => amet
[5] => consectetur
[6] => adipiscing
[7] => elit
[8] => Lorem
[9] => ipsum
[10] => dolor
[11] => sit
[12] => amet
[13] => consectetur
[14] => adipiscing
[15] => elit
)
)
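Note that the matches themselves are in $words[0], and that \w also accepts digits and underscores. A short sketch with a hypothetical input ($text is made up for illustration) makes both points visible:

```php
<?php
// Hypothetical input containing punctuation, an underscore, and a number.
$text = "Lorem ipsum, dolor_sit 42 amet.";

preg_match_all('#\w+#', $text, $words);

// $words[0] holds the full-pattern matches; punctuation and spaces are skipped,
// while "dolor_sit" and "42" count as words because \w covers "_" and digits.
print_r($words[0]); // Lorem, ipsum, dolor_sit, 42, amet
```

If only letters should count as words, the [a-z]+ pattern with the i modifier from the other answer is the stricter choice.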