PHP: Find repeated words with and without space in text

I can find repeated words in a text with this function:

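$str = 'bob is a good person. mary is a good person. who is the best? are you a good person? bob is the best?';

function repeated($str)
{
    $str = trim($str);
    // ereg_replace() was removed in PHP 7; preg_replace() does the same job.
    $str = preg_replace('/\s+/', ' ', $str);
    $words = explode(' ', $str);
    $wordstats = array();
    foreach ($words as $w) {
        // Set the counter to 1 on first sight, otherwise increment it.
        $wordstats[$w] = isset($wordstats[$w]) ? $wordstats[$w] + 1 : 1;
    }
    foreach ($wordstats as $k => $v) {
        if ($v >= 2) {
            print $k . ' , ';
        }
    }
}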

This gives the result I want:

bob , good , person , is , a , the , best?

Question: how can I also get repeated multi-word phrases (words separated by spaces), so that the result looks like this:

bob , good , person , is , a , the , best? , good person , is a , a good , is the , bob is

<?php
$str = 'bob is a good person. mary is a good person. who is the best? are you a good person? bob is the best?';

// All words, lowercased:
$found = str_word_count(strtolower($str),1);
// Get all words that occur more than once.
$counts = array_count_values($found);
$repeated = array_keys(array_filter($counts,function($a){return $a > 1;}));
// Begin the results with the single-word groups.
$results = $repeated;
while($word = array_shift($found)){
    if(!in_array($word,$repeated)) continue;
    $additions = array();
    while($add = array_shift($found)){
        if(!in_array($add,$repeated)) break;
        $additions[] = $add;
        $count = preg_match_all('/'.preg_quote($word).'\W+'.implode('\W+',$additions).'/si',$str,$matches);
        if($count > 1){
            $newmatch = $word.' '.implode(' ',$additions);
            if(!in_array($newmatch,$results)) $results[] = $newmatch;
        } else {
            break;
        }
    }
    if(!empty($additions)) array_splice($found,0,0,$additions);
}
var_dump($results);

Output:

array(17) {
  [0]=>
  string(3) "bob"
  [1]=>
  string(2) "is"
  [2]=>
  string(1) "a"
  [3]=>
  string(4) "good"
  [4]=>
  string(6) "person"
  [5]=>
  string(3) "the"
  [6]=>
  string(4) "best"
  [7]=>
  string(6) "bob is"
  [8]=>
  string(4) "is a"
  [9]=>
  string(9) "is a good"
  [10]=>
  string(16) "is a good person"
  [11]=>
  string(6) "a good"
  [12]=>
  string(13) "a good person"
  [13]=>
  string(11) "good person"
  [14]=>
  string(6) "is the"
  [15]=>
  string(11) "is the best"
  [16]=>
  string(8) "the best"
}

Can't you just add the two-word pairs to the $wordstats array as well?

$str = 'bob is a good person. mary is a good person. who is the best? are you a good person? bob is the best?';

function repeated($str)
{
    $str = trim($str);
    // ereg_replace() was removed in PHP 7; preg_replace() does the same job.
    $str = preg_replace('/\s+/', ' ', $str);
    $words = explode(' ', $str);
    $wordstats = array();
    $lastWord = '';
    foreach ($words as $w) {
        $wordstats[$w] = isset($wordstats[$w]) ? $wordstats[$w] + 1 : 1;
        // Skip the first iteration: that is the only time $lastWord is blank.
        if ($lastWord != '') {
            $pair = $lastWord . ' ' . $w;
            $wordstats[$pair] = isset($wordstats[$pair]) ? $wordstats[$pair] + 1 : 1;
        }
        $lastWord = $w;
    }
    foreach ($wordstats as $k => $v) {
        if ($v >= 2) {
            print $k . ' , ';
        }
    }
}

I haven't tested this, but it should work, since it just uses the same technique you are already using.
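
For anyone who wants to try the pair-counting idea on a current PHP version, here is a minimal sketch that extends it from pairs to phrases of any length. The function name repeatedPhrases and the $maxWords parameter are made up for this example and are not part of the question or the answers. It assumes str_word_count() is an acceptable tokenizer, which means punctuation is dropped (so "person." and "person?" count as the same word) and phrases are allowed to run across sentence boundaries.

```php
<?php
// Hypothetical helper (not from the original thread): returns every phrase
// of 1..$maxWords consecutive words that occurs at least twice in $str.
function repeatedPhrases(string $str, int $maxWords = 2): array
{
    // Lowercase the text and split it into words; str_word_count() drops
    // punctuation such as "." and "?".
    $words = str_word_count(strtolower($str), 1);
    $results = [];

    for ($n = 1; $n <= $maxWords; $n++) {
        $phrases = [];
        // Slide a window of $n words across the word list.
        for ($i = 0; $i + $n <= count($words); $i++) {
            $phrases[] = implode(' ', array_slice($words, $i, $n));
        }
        // array_count_values() keeps keys in order of first appearance.
        foreach (array_count_values($phrases) as $phrase => $count) {
            if ($count > 1) {
                $results[] = (string) $phrase;
            }
        }
    }

    return $results;
}

$str = 'bob is a good person. mary is a good person. who is the best? are you a good person? bob is the best?';
echo implode(' , ', repeatedPhrases($str, 2)), PHP_EOL;
// prints: bob , is , a , good , person , the , best , bob is , is a , a good , good person , is the , the best
```

Raising $maxWords to 3 or 4 would also pick up longer phrases such as "is a good person". If phrases should not span sentence boundaries, one option is to split the text into sentences first and count within each one.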

