How to remove similar entries from an array in PHP
I have an array like this, and sometimes it contains very similar entries:
Array
(
[0] => greys anatomy
[1] => element 3d
[2] => interstellar
[3] => monster ball
[4] => scorpion
[5] => taken 3
[6] => the flash
[7] => wild card
[8] => big bang theory
[9] => the big bang theory
[10] => fredrik kempe vincero
[11] => fredrik kempe vicero
)
I would like to remove similar entries, dropping the longer one of each pair. In this array, for example, the entries [9] => the big bang theory and [10] => fredrik kempe vincero should be removed, because they are similar to entries [8] and [11] but longer.
EDIT: In case anyone needs it, I put together a working solution from the two answers below:
// Returns true when the two strings are at least 80% similar.
function check_similar($first, $second)
{
    similar_text($first, $second, $percent);
    return $percent >= 80; // needed percent value
}
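$count = count($array);
for ($i = 0; $i < $count; $i++) {
    if ($array[$i] === null) {
        continue; // already removed as similar to an earlier entry
    }
    for ($j = $i + 1; $j < $count; $j++) {
        if ($array[$j] !== null && check_similar($array[$i], $array[$j])) {
            $array[$j] = null;
        }
    }
}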
// filter array to remove null values and reindex
$array = array_values(array_filter($array));
print_r($array);
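Note that the loop above always drops the later entry of a similar pair, so for the sample array it would keep [10] and drop [11]. Here is a minimal sketch of a variant that drops the longer entry instead, assuming the same 80 percent threshold. The name remove_longer_similar is made up for illustration and does not come from either answer.

```php
<?php
// Hypothetical helper: for every pair of entries whose similar_text()
// percentage reaches $threshold, drop the longer of the two.
function remove_longer_similar(array $items, float $threshold = 80.0): array
{
    $items = array_values($items); // make sure keys are 0..n-1
    $count = count($items);
    $drop  = [];                   // indexes marked for removal

    for ($i = 0; $i < $count; $i++) {
        if (isset($drop[$i])) {
            continue;
        }
        for ($j = $i + 1; $j < $count; $j++) {
            if (isset($drop[$j])) {
                continue;
            }
            similar_text($items[$i], $items[$j], $percent);
            if ($percent < $threshold) {
                continue;
            }
            if (strlen($items[$j]) >= strlen($items[$i])) {
                $drop[$j] = true;  // later entry is longer (or equal)
            } else {
                $drop[$i] = true;  // earlier entry is longer
                break;             // nothing left to compare for $i
            }
        }
    }

    // Remove the marked indexes and reindex the result.
    return array_values(array_diff_key($items, $drop));
}
```

For the sample array in the question this should leave big bang theory and fredrik kempe vicero in place and remove their longer counterparts.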
String similarity is a hard problem with no simple general solution. There are several sophisticated approaches, but none is as reliable as human judgment.
Take a look at PHP's soundex and levenshtein functions, which could be an easy solution for your particular case.
In any case, given a custom function that decides whether one string is similar to another, you can make the array unique with something like this:
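As a rough illustration of the levenshtein route, a similarity check could compare the edit distance with the length of the longer string. This is only a sketch: the function name similar and the 0.25 cutoff are assumptions picked for this example, and the cutoff would need tuning against real data.

```php
<?php
// Hypothetical helper: treats two strings as similar when their edit
// distance is at most $maxRatio of the longer string's length.
// The 0.25 default is an assumed value, not a recommended constant.
function similar(string $a, string $b, float $maxRatio = 0.25): bool
{
    $longest = max(strlen($a), strlen($b));
    if ($longest === 0) {
        return true; // two empty strings are identical
    }

    // levenshtein() returns the number of single-character insertions,
    // deletions and replacements needed to turn $a into $b.
    return levenshtein($a, $b) / $longest <= $maxRatio;
}
```

With this cutoff, big bang theory and the big bang theory count as similar (4 edits over 19 characters), while unrelated titles such as scorpion and wild card do not.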
// set to null all subsequent similar strings
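$count = count($array);
for ($i = 0; $i < $count; $i++) {
    if ($array[$i] === null) {
        continue; // already removed, nothing to compare
    }
    for ($j = $i + 1; $j < $count; $j++) {
        if ($array[$j] !== null && similar($array[$i], $array[$j])) {
            $array[$j] = null;
        }
    }
}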
// filter array to remove null values
$array = array_filter($array);
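One thing to watch with the final step: array_filter without a callback removes every falsy value, not just null, and it keeps the original keys. A small sketch of the difference, using made-up sample data in $entries:

```php
<?php
// Sample data (made up for illustration); null marks a removed entry.
$entries = ['taken 3', null, '0', 'scorpion', null];

// Without a callback, array_filter() drops all falsy values, so the
// string '0' disappears too, and the surviving keys are not renumbered.
$loose = array_filter($entries);

// Filtering on null only keeps '0'; array_values() renumbers the keys.
$strict = array_values(array_filter($entries, function ($value) {
    return $value !== null;
}));
```

If the array can never contain values like '0' or an empty string, the plain call is fine; adding array_values only matters when consecutive numeric keys are needed afterwards.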
Take a look at the similar_text function.
similar_text('the big bang theory','big bang theory', $percent);
echo $percent; // about 88.24 (percent)
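For reference, the percentage is the number of matching characters times two, divided by the combined length of both strings, times 100. A short sketch that recomputes it by hand for the same pair (the variable names are only illustrative):

```php
<?php
$first  = 'the big bang theory';
$second = 'big bang theory';

// similar_text() returns the number of matching characters and writes
// the percentage into its third argument.
$matching = similar_text($first, $second, $percent);

// The same percentage, recomputed from the match count and the lengths.
$manual = $matching * 2 * 100 / (strlen($first) + strlen($second));
```

Here all 15 characters of the shorter string match, so the result is 30 / 34, or roughly 88 percent.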
This is obviously more difficult than it seems, but you can do this check while building the array.
See this link for an alternate implementation.