
Is array index rebalanced after using unset in PHP?

I am trying to remove duplicates from a set of tokens using unset (not considering array_unique for now), but I am running into a few issues.

$keywords = parseTweet ( $tweet );
$term_freq = array(count($keywords));

for($i = 0; $i < count($keywords); $i++){
    $term_freq[$i] = 1;
    for($j = 0; $j < count($keywords); $j++){
        if (($i != $j) && (strcmp($keywords[$i],$keywords[$j]) == 0)){
            unset ( $keywords [$j] );
            unset ( $term_freq [$j] );          
            $term_freq[$i]++; 
        }
    }
}

print_r ( $keywords );
print_r ( $term_freq );

I know why I'm getting the error: after the duplicate at $j is removed, the for loop keeps going over the remaining keywords and fails when it reaches the missing index $j. Inspecting the array, I found that its indices skip $j, so they read [1], [2], [4], ... where $j = 3.

I thought unset also re-indexed the array. Am I doing something wrong, or missing something completely? I am new to PHP, so please bear with me!
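To answer the title question directly: no, unset() never re-indexes an array. PHP arrays are ordered maps, so the remaining elements keep their original keys; if consecutive indices are needed again, array_values() returns a re-indexed copy. A minimal sketch (the token values are made up):

```php
<?php
// unset() removes the element but leaves the other keys untouched.
$tokens = array('php', 'array', 'php', 'unset');
unset($tokens[2]);               // drop the second 'php'

print_r($tokens);                // keys are now 0, 1, 3: index 2 is simply gone
var_dump(isset($tokens[2]));     // bool(false)

// array_values() builds a copy whose keys run 0, 1, 2, ... again.
$reindexed = array_values($tokens);
print_r($reindexed);             // keys 0, 1, 2
```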

Use foreach instead of for, so that you only visit keys that actually exist:

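$term_freq = array();
foreach ($keywords as $i => $kw1) {
    // foreach walks the array as it was when the loop started,
    // so skip entries that were already removed as duplicates.
    if (!isset($keywords[$i])) {
        continue;
    }
    $term_freq[$i] = 1;
    foreach ($keywords as $j => $kw2) {
        // === avoids PHP's numeric-string comparison ('10' == '1e1' is true)
        if ($i != $j && $kw1 === $kw2) {
            unset($keywords[$j]);
            unset($term_freq[$j]);
            $term_freq[$i]++;
        }
    }
}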
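One caveat with foreach: a by-value foreach iterates over the array as it was when the loop began, so removing a later element inside the body does not stop the loop from visiting it. That is why an element already removed as a duplicate can still come around in the outer loop unless you check isset(). A small sketch with a made-up gapped array:

```php
<?php
// A keyword list with a gap, as left behind by unset().
$keywords = array(0 => 'php', 1 => 'array', 3 => 'unset');

// foreach visits only the keys that actually exist.
$seen = array();
foreach ($keywords as $key => $word) {
    $seen[] = $key;
}
// $seen is [0, 1, 3]

// Unsetting a later element during iteration does not affect this loop,
// because by-value foreach works on the array as it was at the start.
$visitedAfterUnset = array();
foreach ($keywords as $key => $word) {
    if ($key === 0) {
        unset($keywords[3]);
    }
    $visitedAfterUnset[] = $key;
}
// $visitedAfterUnset is [0, 1, 3], even though key 3 was removed on the first pass
```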

  1. Check whether an index is set before you use it.
  2. You're making needless, repetitive comparisons: n² of them, when n(n-1)/2 (just under n²/2) are enough to compare every value in the array with every other value.

So:

$c = count($keywords); // count once, before anything is unset
$term_freq = array();
for ($i = 0; $i < $c; $i++) {
    if (!isset($keywords[$i])) { continue; } // $i was already removed as a duplicate
    $term_freq[$i] = 1;
    for ($j = $i + 1; $j < $c; $j++) { // magic is $j = $i+1
        if (!isset($keywords[$j])) { continue; } // skip unset indices
        if (strcmp($keywords[$i], $keywords[$j]) == 0) {
            unset($keywords[$j]);
            unset($term_freq[$j]);
            $term_freq[$i]++;
        }
    }
}

By the time the outer loop reaches $i, every element before it has already been compared with every other element, so you can start the inner loop at $i+1 instead of zero.
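To make the n² versus n(n-1)/2 difference concrete, here is a sketch that just counts how often the inner loop body runs for n items. The helper names fullLoopComparisons and halfLoopComparisons are hypothetical, made up for this illustration:

```php
<?php
// Inner loop starts at 0: every ordered pair (i, j), including i == j.
function fullLoopComparisons(int $n): int
{
    $count = 0;
    for ($i = 0; $i < $n; $i++) {
        for ($j = 0; $j < $n; $j++) {
            $count++;
        }
    }
    return $count;
}

// Inner loop starts at $i + 1: each unordered pair {i, j} exactly once.
function halfLoopComparisons(int $n): int
{
    $count = 0;
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            $count++;
        }
    }
    return $count;
}

echo fullLoopComparisons(10), "\n"; // 100
echo halfLoopComparisons(10), "\n"; // 45, i.e. 10 * 9 / 2
```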

Also, you only need to count $keywords once, not n² times.
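Counting once also avoids a real bug in the question's loops, which re-evaluate count($keywords) on every pass: count() drops after each unset() while the highest key stays where it was, so the loop stops before it reaches the last elements. A minimal sketch with made-up tokens:

```php
<?php
$keywords = array('a', 'b', 'a', 'c');
$before = count($keywords);            // 4

unset($keywords[2]);                   // drop the duplicate 'a'
$after = count($keywords);             // 3, yet the highest key is still 3
$lastKey = max(array_keys($keywords)); // 3

// A loop bounded by count() now stops after $i = 2 and never examines key 3.
$examined = array();
for ($i = 0; $i < count($keywords); $i++) {
    if (isset($keywords[$i])) {
        $examined[] = $keywords[$i];
    }
}
// $examined is ['a', 'b']: 'c' was never looked at
```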
