
Find non-consecutive combinations of words in a string from array of words using PHP

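I'm looking for a way to find, in a string, every run of $length words drawn from $search (in any combination, with words possibly repeated or out of order), and to return each run's start position and matched text.

In my example, I'm looking for phone numbers whose digits are written as words.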

$subject = "hello my name is inigo montoya you killed my father please call me at eight zero zero five five five one to three four prepare to die";

$search = array("zero", "one", "two", "to", "too", "three", "four", "five", "six", "seven", "eight", "nine");

$length = 10;

$result = jedi_find_trick($subject,$search,$length);

$result would be set to an array:

$result[0]["start"] = 70
$result[0]["match"] = "eight zero zero five five five one to three four"
$result[1] ... 

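I was going to start by generating every possible combination of $search, but I suspect there is a more elegant solution that escapes me. Thanks for any suggestions.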
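As a point of comparison, the combinations do not need to be enumerated at all: it is enough to walk the words once and track runs of words that appear in $search. The sketch below is one way to do that; `find_word_runs` is a hypothetical name, and the choice to return non-overlapping runs is an assumption, not something stated in the question.

```php
<?php
// Hypothetical helper (name and behaviour are assumptions, not from the thread).
// Returns every non-overlapping run of exactly $length consecutive search words.
function find_word_runs(string $subject, array $search, int $length): array
{
    $allowed = array_flip($search); // word => index, for fast lookups

    // Split on whitespace and keep each word's byte offset in $subject.
    $tokens = preg_split('/\s+/', $subject, -1,
        PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE);

    $results = [];
    $run = []; // current run of consecutive search words
    foreach ($tokens as $token) {
        if (!isset($allowed[$token[0]])) {
            $run = []; // a word outside $search breaks the run
            continue;
        }
        $run[] = $token;
        if (count($run) === $length) {
            $first = $run[0];
            $last  = $run[$length - 1];
            $end   = $last[1] + strlen($last[0]);
            $results[] = [
                'start' => $first[1],
                'match' => substr($subject, $first[1], $end - $first[1]),
            ];
            $run = []; // start over so that runs do not overlap
        }
    }
    return $results;
}

$subject = 'hello my name is inigo montoya you killed my father please call me '
         . 'at eight zero zero five five five one to three four prepare to die';
$search = ['zero', 'one', 'two', 'to', 'too', 'three', 'four', 'five', 'six',
           'seven', 'eight', 'nine'];

$result = find_word_runs($subject, $search, 10);
print_r($result);
```

The offsets are byte offsets, so multibyte text would need extra care.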


Following @chris85's suggestion, this seems like a good starting point:

$subject = 'hello my name is inigo montoya you killed my father please call me at eight zero zero five five five one to three four or too oh five seven seven seven five one one one prepare to die';
$search = array('zero','oh','one','two','too','to','three','four','five','six','seven','eight','nine','hundred','thousand');
$replace = array('0','0','1','2','2','2','3','4','5','6','7','8','9','00','000');
$length = 10;

$result = jedi_find_trick($subject,$search,$replace,$length);

print_r($result);

function jedi_find_trick($subject,$search,$replace,$length) {

    // The repeat count comes from $length rather than a hard-coded 10.
    preg_match_all('/(\h*(' . implode('|', $search) . ')\h*){' . (int) $length . '}/', $subject, $numbers);

    foreach($numbers[0] as $match) {

        $number = str_replace($search,$replace,$match);
        $number = str_replace(' ', '', $number);
        $number = ' ' . $number . ' ';
        $subject = str_replace($match,$number,$subject);

    }

    return $subject;
}

Returns:

hello my name is inigo montoya you killed my father please call me at 8005551234 or 2057775111 prepare to die

With str_replace(), "too" needs to come before "to" in $search; otherwise you end up with "2o". Using preg_replace() with word boundaries should clear that up.
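A minimal sketch of the word-boundary idea, using preg_replace_callback() with `\b` so that the order of "to" and "too" no longer matters. The helper name `words_to_digits` and the `$map` array are hypothetical and are not part of the answer above.

```php
<?php
// Hypothetical helper (not from the original answer): replaces whole number
// words with digits. \b stops 'to' from matching inside 'too' or 'montoya'.
function words_to_digits(string $text, array $map): string
{
    // Quote each word so it is safe inside the pattern.
    $words = array_map(function ($w) { return preg_quote($w, '/'); }, array_keys($map));
    $pattern = '/\b(?:' . implode('|', $words) . ')\b/';

    // Swap each whole-word match for its digit.
    return preg_replace_callback($pattern, function ($m) use ($map) {
        return $map[$m[0]];
    }, $text);
}

// 'to' is deliberately listed before 'too' to show that order is irrelevant here.
$map = ['to' => '2', 'too' => '2', 'two' => '2',
        'one' => '1', 'three' => '3', 'four' => '4'];

echo words_to_digits('one to three four', $map), "\n"; // 1 2 3 4
echo words_to_digits('too montoya', $map), "\n";       // 2 montoya
```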

Something like this:

$subject = 'hello my name is inigo montoya you killed my father please call me '
         . 'at eight zero zero five five five one to three four prepare to die';

$search = ['zero', 'one', 'two', 'to', 'too', 'three', 'four', 'five', 'six',
           'seven', 'eight', 'nine'];

$length = 10;

function jedi_find_trick($search, $subject, $length, $sep = ' ', $septype = 0) {
    // quote special characters in the search list
    $search = array_map(function ($i) { return preg_quote($i, '~'); }, $search);
    // quote the separator when it is a literal string
    if ($septype === 0) $sep = preg_quote($sep, '~');

    // build the pattern
    $altern = '(?:' . implode('|', $search) . ')';

    $format = '~(?:%1$s|\A)(%2$s'
            . ($length<2 ? '': '(?:%1$s%2$s){%3$d}')
            . ')(?=%1$s|\z)~';

    $pattern = sprintf($format, $sep, $altern, $length - 1);

    if (preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE))
        return $matches[1];

    // return an empty array if there is no match
    return [];
}

print_r(jedi_find_trick($search, $subject, $length));
print_r(jedi_find_trick($search, $subject, 8, '\h+', 1));

By default, the separator is a space. When $septype is not 0, the separator is treated as a subpattern, so its special characters are not escaped.
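With PREG_OFFSET_CAPTURE, each returned entry is a pair of matched text and byte offset. If the start/match layout from the question is wanted, the pairs can be reshaped afterwards. The sketch below assumes a hypothetical helper `to_start_match` and uses a simplified stand-in pattern instead of the function above, to stay self-contained.

```php
<?php
// Hypothetical helper: turns PREG_OFFSET_CAPTURE pairs into start/match entries.
function to_start_match(array $captures): array
{
    return array_map(function ($pair) {
        // Each pair is [matched text, byte offset in the subject].
        return ['start' => $pair[1], 'match' => $pair[0]];
    }, $captures);
}

$subject = 'call me at one to three please';

// Simplified stand-in pattern (an assumption for this example only):
// three number words separated by single spaces.
$pattern = '~\b((?:one|to|three)(?: (?:one|to|three)){2})\b~';

$result = [];
if (preg_match_all($pattern, $subject, $matches, PREG_OFFSET_CAPTURE)) {
    $result = to_start_match($matches[1]);
}

print_r($result); // one entry: start 11, match "one to three"
```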
