Creating an array of words using preg_match_all and a regular expression

Question

I am using the PHP function preg_match_all(), as shown below, to create an array containing several words.

// the string which contains the text 
$string = "Lorem ipsum dolor sit amet elit";

// the preg_match_all() function
preg_match_all('/([a-z]*?)(?= )/i', $string, $matches);

// debug array
debug($matches[0]);

// output
[(int) 0 => 'Lorem',
    (int) 1 => '',
    (int) 2 => 'ipsum',
    (int) 3 => '',
    (int) 4 => 'dolor',
    (int) 5 => '',
    (int) 6 => 'sit',
    (int) 7 => '',
    (int) 8 => 'amet',
    (int) 9 => ''
]

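But when I debug or print the array that should contain all the words, the last word is missing; in this case that is the word "elit". The array also contains an empty string after every word. How can I fix this?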

Answer 1

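You can use (?=\W|$) as the lookahead, which means a word followed by either a non-word character or the end of the input. Changing [a-z]*? to [a-z]+ also requires at least one letter, so the empty matches go away: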

preg_match_all('/([a-z]+)(?=\W|$)/i', $string, $matches);

print_r($matches[0]);

Output (from a longer sample string than the one in the question):

Array
(
    [0] => Lorem
    [1] => ipsum
    [2] => dolor
    [3] => sit
    [4] => amet
    [5] => consectetur
    [6] => adipiscing
    [7] => elit
    [8] => Lorem
    [9] => ipsum
    [10] => dolor
    [11] => sit
    [12] => amet
    [13] => consectetur
    [14] => adipiscing
    [15] => elit
)
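
To tie this back to the exact string in the question, here is a minimal sketch that runs the original pattern and the revised one side by side. The variable names $before and $after are only illustrative and do not come from the answer.

```php
<?php
// The sample string from the question.
$string = "Lorem ipsum dolor sit amet elit";

// Original pattern: a lazy, possibly empty run of letters that must be
// followed by a space. "elit" has no space after it, so it never matches.
preg_match_all('/([a-z]*?)(?= )/i', $string, $before);

// Revised pattern: one or more letters followed by a non-word character
// or by the end of the input.
preg_match_all('/([a-z]+)(?=\W|$)/i', $string, $after);

print_r($before[0]); // contains empty strings and lacks "elit"
print_r($after[0]);  // the six words, including "elit"
```

The empty entries in the first result come from the * quantifier, which is allowed to match zero letters at each space.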

By the way, you can get the same result with a split operation:

$tokens = preg_split('/\h+/', $string);

\h matches horizontal whitespace (spaces and tabs, but not line breaks).
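
As a small sketch of the split approach on the question's string: the PREG_SPLIT_NO_EMPTY flag is an addition that the answer does not use. It is included here on the assumption that real input may have leading or trailing whitespace, which would otherwise produce empty tokens.

```php
<?php
// The sample string from the question.
$string = "Lorem ipsum dolor sit amet elit";

// Split on runs of horizontal whitespace. The limit of -1 means "no limit";
// PREG_SPLIT_NO_EMPTY drops any empty tokens from the result.
$tokens = preg_split('/\h+/', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($tokens);
```

Unlike the preg_match_all() version, this keeps punctuation attached to the words, since it only looks at the whitespace between them.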

Answer 2

Use the following regular expression to get all the words.

\w matches any word character (a letter, a digit, or an underscore).

preg_match_all('#\w+#', $string, $words);
print_r($words);

This outputs the following (for a longer sample string than the one in the question):

Array
(
    [0] => Array
        (
            [0] => Lorem
            [1] => ipsum
            [2] => dolor
            [3] => sit
            [4] => amet
            [5] => consectetur
            [6] => adipiscing
            [7] => elit
            [8] => Lorem
            [9] => ipsum
            [10] => dolor
            [11] => sit
            [12] => amet
            [13] => consectetur
            [14] => adipiscing
            [15] => elit
        )

)
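
Because \w also accepts digits and underscores, this pattern is not quite equivalent to [a-z]+. The sketch below uses a made-up input string, chosen only to show the difference, and reads the matches from index 0 of the result array, which is where preg_match_all() puts the full matches.

```php
<?php
// Hypothetical input, chosen to show where the two character classes differ.
$string = "Lorem ipsum_2 dolor";

// \w+ keeps the underscore and the digit as part of the word.
preg_match_all('#\w+#', $string, $words);

// [a-z]+ with the i flag stops at the first character that is not a letter.
preg_match_all('/[a-z]+/i', $string, $letters);

print_r($words[0]);   // Lorem, ipsum_2, dolor
print_r($letters[0]); // Lorem, ipsum, dolor
```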

Answer 1
2, accepted, 2016-01-12 15:06:19

Answer 2
2, 2016-01-12 15:11:00