I am trying to remove duplicates from a set of tokens using unset (not considering array_unique for now), but I am running into a few issues.
$keywords = parseTweet ( $tweet );
$term_freq = array(count($keywords));
for($i = 0; $i < count($keywords); $i++){
$term_freq[$i] = 1;
for($j = 0; $j < count($keywords); $j++){
if (($i != $j) && (strcmp($keywords[$i],$keywords[$j]) == 0)){
unset ( $keywords [$j] );
unset ( $term_freq [$j] );
$term_freq[$i]++;
}
}
}
print_r ( $keywords );
print_r ( $term_freq );
I understand why I am getting an error: once the duplicate at index $j is removed, the for loop still iterates over the rest of the keywords and fails when it reaches the missing index $j. Inspecting the array, I found that its keys skip the removed index, so they read [1], [2], [4], ... when $j = 3.
I thought that unset also re-indexed the array. Am I doing something wrong, or missing something completely? I am new to PHP, so please bear with me!
Use foreach instead of for.
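foreach ($keywords as $i => $kw1) {
    // In PHP 7+, foreach iterates over a copy, so skip keys the inner loop already removed.
    if (!isset($keywords[$i])) {
        continue;
    }
    $term_freq[$i] = 1;
    foreach ($keywords as $j => $kw2) {
        if (($i != $j) && ($kw1 === $kw2)) { // strict comparison, like strcmp(...) == 0
            unset($keywords[$j]);
            unset($term_freq[$j]);
            $term_freq[$i]++;
        }
    }
}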
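On the question of whether unset re-indexes the array: it does not. Here is a minimal sketch, using made-up sample tokens (the values in $keywords and the names $visited and $reindexed are assumptions for illustration, not from the question), that shows the gap unset leaves, that foreach only visits keys still present, and that array_values() can renumber the keys afterwards if you want them contiguous:

```php
<?php
// Hypothetical sample data, not taken from the question.
$keywords = ['php', 'unset', 'php', 'array'];

// Removes the element only; the remaining keys are 0, 1 and 3.
unset($keywords[2]);

// foreach walks the keys that still exist, so the gap at 2 is never touched.
$visited = [];
foreach ($keywords as $index => $word) {
    $visited[] = $index;
}
// $visited is now [0, 1, 3]

// array_values() returns a copy with keys renumbered from 0.
$reindexed = array_values($keywords);
// $reindexed is now ['php', 'unset', 'array']

print_r($keywords);
print_r($reindexed);
```

Note that count($keywords) is 3 after the unset while the highest key is still 3, so a loop with the condition $i < count($keywords) both hits the missing index 2 and stops before reaching the last element.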
So:
$c = count($keywords); // count once, before anything is unset
for ($i = 0; $i < $c; $i++) {
    if (!isset($keywords[$i])) { continue; } // already removed as a duplicate
    $term_freq[$i] = 1;
    for ($j = $i + 1; $j < $c; $j++) { // magic is $j = $i+1
        if (!isset($keywords[$j])) { continue; } // skip unset indices
        else if (strcmp($keywords[$i], $keywords[$j]) == 0) {
            unset($keywords[$j]);
            unset($term_freq[$j]);
            $term_freq[$i]++;
        }
    }
}
Basically, you know you've already checked everything prior to $i, so you can start your inner loop at $i+1 instead of zero.
Also, you only need to count $keywords once, not n² times.
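For comparison, and stepping away from unset, the same result can be reached in a single pass by using the tokens themselves as array keys. This is a different technique from the nested loops discussed here, shown only as a rough sketch; countTokens is a hypothetical helper name and the sample tokens are made up:

```php
<?php
/**
 * Hypothetical helper: counts each token in one pass over the input.
 * Returns [unique tokens in first-seen order, token => frequency].
 * Caveat: PHP casts numeric-looking string keys such as "42" to integers.
 */
function countTokens(array $tokens): array
{
    $termFreq = [];
    foreach ($tokens as $token) {
        if (isset($termFreq[$token])) {
            $termFreq[$token]++;      // seen before: bump the counter
        } else {
            $termFreq[$token] = 1;    // first occurrence
        }
    }
    // array_keys() preserves insertion order, i.e. first-seen order.
    return [array_keys($termFreq), $termFreq];
}

// Usage with made-up tokens:
list($unique, $freq) = countTokens(['php', 'unset', 'php', 'array', 'php']);
print_r($unique); // php, unset, array
print_r($freq);   // php => 3, unset => 1, array => 1
```

Because nothing is removed from the input, there are no gaps to skip, and the work grows linearly with the number of tokens instead of quadratically.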