How to remove similar entries from an array in PHP

So I have an array like this, and sometimes it has very similar entries:

Array
(
    [0] => greys anatomy
    [1] => element 3d
    [2] => interstellar
    [3] => monster ball
    [4] => scorpion
    [5] => taken 3
    [6] => the flash
    [7] => wild card
    [8] => big bang theory
    [9] => the big bang theory
    [10] => fredrik kempe vincero
    [11] => fredrik kempe vicero
)

I would like to remove the longer of any two similar entries. For example, in this array the entries [9] => the big bang theory and [10] => fredrik kempe vincero should be removed, as they are similar to entries 8 and 11 but longer.

EDIT: In case anyone needs it, I built a working solution from the two answers below:

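function check_similar($first, $second)
{
    // similar_text() writes the similarity percentage (0-100) into $percent
    similar_text($first, $second, $percent);
    return $percent >= 80; // required similarity threshold
}

$count = count($array);
for ($i = 0; $i < $count; $i++) {
    if ($array[$i] === null) {
        continue; // already marked as a duplicate
    }
    for ($j = $i + 1; $j < $count; $j++) {
        if ($array[$j] !== null && check_similar($array[$i], $array[$j])) {
            $array[$j] = null; // mark the later of the two similar entries
        }
    }
}
// remove only the null markers (keeps strings such as "0") and reindex
$array = array_values(array_filter($array, function ($value) {
    return $value !== null;
}));
print_r($array);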
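
One caveat: the loop above nulls whichever similar entry comes later in the array, which is not always the longer one (with the sample data it drops "fredrik kempe vicero" rather than "fredrik kempe vincero"). A minimal sketch of a variant that keeps the shorter entry could look like the following. The function name remove_longer_similar and the $threshold parameter are made up for illustration, and the result comes back ordered by string length rather than in the original order.

```php
<?php
// Hypothetical helper (not from the answers): drops the longer entry of every similar pair.
function remove_longer_similar(array $items, float $threshold = 80.0): array
{
    // Visit shorter strings first so the shorter member of a pair is the one kept.
    usort($items, function ($a, $b) {
        return strlen($a) <=> strlen($b);
    });

    $kept = [];
    foreach ($items as $candidate) {
        $isDuplicate = false;
        foreach ($kept as $existing) {
            // similar_text() writes the similarity percentage into $percent
            similar_text($existing, $candidate, $percent);
            if ($percent >= $threshold) {
                $isDuplicate = true;
                break;
            }
        }
        if (!$isDuplicate) {
            $kept[] = $candidate;
        }
    }
    return $kept;
}

$sample = [
    'scorpion',
    'big bang theory',
    'the big bang theory',
    'fredrik kempe vincero',
    'fredrik kempe vicero',
];
print_r(remove_longer_similar($sample));
```

With this sample, the entries that survive are "scorpion", "big bang theory" and "fredrik kempe vicero".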

String similarity is a very difficult problem that cannot be solved easily. There are several complex approaches, but none is as effective as human judgment.

Take a look at PHP's soundex and levenshtein functions, which could be an easy solution for your particular case.
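
As a rough illustration of the levenshtein route, the sketch below treats two strings as similar when their edit distance is small relative to the longer string. The helper name similar_by_distance and the 0.25 ratio are assumptions chosen for this example, not something the answer prescribes.

```php
<?php
// Hypothetical helper: true when the edit distance is small relative to the longer string.
function similar_by_distance(string $a, string $b, float $maxRatio = 0.25): bool
{
    $longest = max(strlen($a), strlen($b));
    if ($longest === 0) {
        return true; // two empty strings are identical
    }
    // levenshtein() returns the number of single-character insertions,
    // deletions and substitutions needed to turn $a into $b
    return levenshtein($a, $b) / $longest <= $maxRatio;
}

var_dump(similar_by_distance('big bang theory', 'the big bang theory'));        // 4 edits over 19 characters
var_dump(similar_by_distance('fredrik kempe vincero', 'fredrik kempe vicero')); // 1 edit over 21 characters
var_dump(similar_by_distance('scorpion', 'taken 3'));                           // almost nothing in common
```

The first two calls report the pairs as similar and the third does not. A ratio is used instead of a fixed distance so that a long title with a short prefix such as "the " still matches.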

In any case, given a custom function that decides whether one string is similar to another, to make your array unique you have to do something like:

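// set all subsequent similar strings to null
// (similar() stands for your own comparison function)
$count = count($array);
for ($i = 0; $i < $count; $i++) {
   for ($j = $i + 1; $j < $count; $j++) {
      if ($array[$i] !== null && $array[$j] !== null && similar($array[$i], $array[$j])) {
         $array[$j] = null;
      }
   }
}
// filter array to remove null values (keys are preserved; wrap in array_values() to reindex)
$array = array_filter($array);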

Take a look at the similar_text function.

similar_text('the big bang theory','big bang theory', $percent);
echo $percent; // about 88.24 (percent)

This is obviously more difficult than it seems, but you can perform this check while building the array.
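
A minimal sketch of checking while the array is being built, assuming entries arrive one at a time. The helper name add_if_not_similar and the 80 percent threshold are illustrative assumptions. Note that this keeps whichever entry arrives first, not necessarily the shorter one.

```php
<?php
// Hypothetical helper: appends $entry only if nothing similar is already stored.
function add_if_not_similar(array &$list, string $entry, float $threshold = 80.0): bool
{
    foreach ($list as $existing) {
        // similar_text() writes the similarity percentage into $percent
        similar_text($existing, $entry, $percent);
        if ($percent >= $threshold) {
            return false; // a similar entry already exists, so skip this one
        }
    }
    $list[] = $entry;
    return true;
}

$titles = [];
foreach (['big bang theory', 'the big bang theory', 'scorpion'] as $title) {
    add_if_not_similar($titles, $title);
}
print_r($titles); // contains "big bang theory" and "scorpion"
```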

See this link for an alternate implementation.
