Finding similar strings in array
I need to harness similar_text() for an array of values that look something like this:
$strings = ["lawyer" => 3, "business" => 3, "lawyers" => 1, "a" => 3];
What I'm trying to do is find the words that are practically the same, i.e. lawyer and lawyers in the above array, and add their counts together in a new array.
So lawyer would be 4, as lawyers would be associated with the original string lawyer.
Keep in mind, this array will only ever contain single words, and its length is unspecified; it could range from 1 to >99.
I had no idea where to start with this, so I gave it a crack with a foreach loop as you'll see below, but the output isn't what I expected.
foreach ( $strings as $key_one => $count_one ) {
    foreach ( $strings as $key_two => $count_two ) {
        similar_text($key_two, $key_one, $percent);
        if ($percent > 80) {
            if(!isset($counts[$key_one])) {
                $counts[$key_one] = $count_one;
            } else {
                $counts[$key_one] += $count_two;
            }
        }
    }
}
Note: The percent match threshold is 80 for this example (as the match for lawyer & lawyers is ~92%).
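For reference, here is a minimal sketch of where the ~92% figure comes from. similar_text() returns the number of matching characters and writes the percentage into its optional third argument; the percentage is the matching count times two, divided by the combined length of both strings, times 100. The variable names $matching and $expected are illustrative only.

```php
<?php
// similar_text() returns the count of matching characters; the third
// argument is passed by reference and receives the similarity percentage.
$matching = similar_text("lawyers", "lawyer", $percent);

// Recompute the percentage by hand: matching * 2 / combined length * 100.
$expected = $matching * 2 / (strlen("lawyers") + strlen("lawyer")) * 100;

printf("%d matching characters, %.1f%% similar\n", $matching, $percent);
// prints "6 matching characters, 92.3% similar"
```

Because the shared part "lawyer" is 6 characters long and the combined length is 13, the result is 12 / 13, roughly 92.3%, which clears the threshold of 80.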
Which ends up giving me something similar to the following:
Array
(
[lawyer] => 4
[business] => 3
[a] => 3
[lawyers] => 2
)
Where I require it to be:
Array
(
[lawyer] => 4
[business] => 3
[a] => 3
)
Notice how I require it to effectively remove lawyers and add its count to lawyer.
Your difficulty is that just as lawyer is similar to lawyers, lawyers is also similar to lawyer. So they both get their count bumped up by the other.
Try this:
foreach ( $strings as $key_one => &$count_one ) {
    if ($count_one == 0) continue; // skip it if we've already processed it
    if (!isset($counts[$key_one])) {
        $counts[$key_one] = $count_one;
        $count_one = 0; // mark this string as consumed
    }
    foreach ( $strings as $key_two => &$count_two ) {
        similar_text($key_two, $key_one, $percent);
        if ($percent > 80) {
            $counts[$key_one] += $count_two;
            $count_two = 0; // so it is not counted again later
        }
    }
}
unset($count_one, $count_two); // break the references left behind by foreach
The disadvantage of that is that you change your original $strings array, which may not be ideal. Here's another approach, keeping track of already-processed strings in another hash:
$already = $counts = array(); // not really necessary, but nice to init
foreach ( $strings as $key_one => $count_one ) {
    if (isset($already[$key_one])) continue; // skip if already processed
    $counts[$key_one] = 0; // the inner loop adds this key's own count (it matches itself 100%)
    foreach ( $strings as $key_two => $count_two ) {
        if (isset($already[$key_two])) continue; // never count a string twice
        similar_text($key_two, $key_one, $percent);
        if ($percent > 80) {
            $counts[$key_one] += $count_two;
            $already[$key_two] = true;
        }
    }
}
I would recommend the 2nd solution.
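As a rough sketch, the second approach can also be wrapped in a function so the threshold becomes a parameter and $strings is left untouched. The function name mergeSimilarCounts and its $threshold parameter are hypothetical, not part of the original answer. Each key is marked as processed before the comparison loop runs, so its own count is added only once.

```php
<?php
/**
 * Hypothetical helper (not from the original answer): merges the counts of
 * keys that are more than $threshold percent similar into the first such key.
 */
function mergeSimilarCounts(array $strings, float $threshold = 80.0): array
{
    $counts = [];
    $already = []; // keys that have already been assigned to a group

    foreach ($strings as $keyOne => $countOne) {
        if (isset($already[$keyOne])) {
            continue; // already folded into an earlier key
        }
        $counts[$keyOne] = $countOne;
        $already[$keyOne] = true;

        foreach ($strings as $keyTwo => $countTwo) {
            if (isset($already[$keyTwo])) {
                continue; // skips $keyOne itself and anything counted before
            }
            // Cast to string because PHP turns numeric-looking keys into ints.
            similar_text((string) $keyTwo, (string) $keyOne, $percent);
            if ($percent > $threshold) {
                $counts[$keyOne] += $countTwo;
                $already[$keyTwo] = true;
            }
        }
    }

    return $counts;
}

$strings = ["lawyer" => 3, "business" => 3, "lawyers" => 1, "a" => 3];
print_r(mergeSimilarCounts($strings));
// lawyer => 4, business => 3, a => 3
```

Note that the grouping depends on array order: whichever similar word appears first becomes the key that collects the counts.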
You can always use:
unset( $counts[$key_two] ) ;