
PHP : Find repeated words with and without space in text

I can find repeated words in text with this function:

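$str = 'bob is a good person. mary is a good person. who is the best? are you a good person? bob is the best?';
function repeated($str)
{
    $str = preg_replace('/\s+/', ' ', trim($str)); // ereg_replace() was removed in PHP 7
    $words = explode(' ', $str);
    $wordstats = array();
    foreach ($words as $w) {
        $wordstats[$w] = isset($wordstats[$w]) ? $wordstats[$w] + 1 : 1;
    }
    foreach ($wordstats as $k => $v) {
        if ($v >= 2) print "$k" . " , "; // only words seen at least twice
    }
}
repeated($str);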

That gives me a result like:

bob , good , person , is , a , the , best?

Q: How can I get both the repeated single words and the repeated multi-word phrases (words separated by spaces), so that the result looks like this:

bob , good , person , is , a , the , best? , good person , is a , a good , is the , bob is
$str = 'bob is a good person. mary is a good person. who is the best? are you a good person? bob is the best?';

//all words:
$found = str_word_count(strtolower($str),1);
//get all words with an occurrence of more than 1
$counts = array_count_values($found);
$repeated = array_keys(array_filter($counts,function($a){return $a > 1;}));
//begin results with the groups of 1 word.
$results = $repeated;
while($word = array_shift($found)){
    if(!in_array($word,$repeated)) continue;
    $additions = array();
    while($add = array_shift($found)){
        if(!in_array($add,$repeated)) break;
        $additions[] = $add;
        $count = preg_match_all('/'.preg_quote($word).'\W+'.implode('\W+',$additions).'/si',$str,$matches);
        if($count > 1){
            $newmatch = $word.' '.implode(' ',$additions);
            if(!in_array($newmatch,$results)) $results[] = $newmatch;
        } else {
            //the longer phrase is not repeated, so stop extending it
            break;
        }
    }
    //put the consumed words back so they can start phrases of their own
    if(!empty($additions)) array_splice($found,0,0,$additions);
}
var_dump($results);


array(17) {
  [0]=>
  string(3) "bob"
  [1]=>
  string(2) "is"
  [2]=>
  string(1) "a"
  [3]=>
  string(4) "good"
  [4]=>
  string(6) "person"
  [5]=>
  string(3) "the"
  [6]=>
  string(4) "best"
  [7]=>
  string(6) "bob is"
  [8]=>
  string(4) "is a"
  [9]=>
  string(9) "is a good"
  [10]=>
  string(16) "is a good person"
  [11]=>
  string(6) "a good"
  [12]=>
  string(13) "a good person"
  [13]=>
  string(11) "good person"
  [14]=>
  string(6) "is the"
  [15]=>
  string(11) "is the best"
  [16]=>
  string(8) "the best"
}
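
One caveat, as far as I can tell: the pattern built in the code above has no word boundaries, so 'a\W+good' would also match the tail of a word like "panama" followed by "good". That does not happen in the sample string, but it could in other text. Below is a minimal sketch of a boundary-aware count. The helper name count_phrase is made up for illustration and is not part of the answer above.

```php
<?php
/**
 * Hypothetical helper: counts how often the given words occur
 * consecutively in $text, separated only by non-word characters.
 */
function count_phrase(string $text, array $words): int
{
    // Escape each word so regex metacharacters are treated literally.
    $quoted = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words);

    // \b anchors the phrase at word boundaries on both ends.
    $pattern = '/\b' . implode('\W+', $quoted) . '\b/i';

    // preg_match_all() returns the number of full matches (or false on error).
    return (int) preg_match_all($pattern, $text);
}
```

It could replace the inline preg_match_all() call, for example as count_phrase($str, array_merge(array($word), $additions)).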

Can't you just add the two-word combinations to the $wordstats array as well?

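$str = 'bob is a good person. mary is a good person. who is the best? are you a good person? bob is the best?';
function repeated($str)
{
    $str = preg_replace('/\s+/', ' ', trim($str)); // ereg_replace() was removed in PHP 7
    $words = explode(' ', $str);
    $wordstats = array();
    $lastWord = '';
    foreach ($words as $w) {
        $wordstats[$w] = isset($wordstats[$w]) ? $wordstats[$w] + 1 : 1;
        //skip the first loop because that is the only time it should be blank.
        if ($lastWord != '') {
            $pair = $lastWord . ' ' . $w;
            $wordstats[$pair] = isset($wordstats[$pair]) ? $wordstats[$pair] + 1 : 1;
        }
        $lastWord = $w;
    }
    foreach ($wordstats as $k => $v) {
        if ($v >= 2) print "$k" . " , ";
    }
}
repeated($str);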

I didn't test this, but it should work because it just uses the same technique you are using.
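
The pair-counting idea above only covers two-word phrases. A rough sketch of how it might be extended to longer phrases is to count every run of 1 to N consecutive words. The function name repeated_phrases and the $maxLen parameter are assumptions made for this example, and it uses str_word_count() so punctuation such as "?" is dropped (it needs PHP 7 or later for the ?? operator).

```php
<?php
/**
 * Hypothetical helper: returns every run of 1..$maxLen consecutive
 * words that occurs at least twice in $str.
 */
function repeated_phrases(string $str, int $maxLen = 4): array
{
    // Lowercase and split into words; punctuation is discarded.
    $words = str_word_count(strtolower($str), 1);
    $total = count($words);
    $counts = array();

    // Slide a window of each length over the word list and count phrases.
    for ($n = 1; $n <= $maxLen; $n++) {
        for ($i = 0; $i + $n <= $total; $i++) {
            $phrase = implode(' ', array_slice($words, $i, $n));
            $counts[$phrase] = ($counts[$phrase] ?? 0) + 1;
        }
    }

    // Keep only the phrases that were seen at least twice.
    $repeated = array_filter($counts, function ($c) {
        return $c >= 2;
    });
    return array_keys($repeated);
}

$str = 'bob is a good person. mary is a good person. who is the best? are you a good person? bob is the best?';
print implode(' , ', repeated_phrases($str));
```

Unlike the regex-based answer, this treats words on either side of a sentence break as adjacent, and the phrases come out grouped by length rather than by position.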

