Finding similar strings in array

I need to harness similar_text() for an array of values that looks something like this:

$strings = ["lawyer" => 3, "business" => 3, "lawyers" => 1, "a" => 3];

What I'm trying to do is find the words that are practically the same, i.e. lawyer and lawyers in the above array, and add their counts together in a new array.

So lawyer would be 4, as lawyers would be associated with the original string lawyer.

Keep in mind that this array will only ever contain single words, and its length is unspecified: it could range from 1 to more than 99 entries.

I had no idea where to start with this, so I gave it a crack with a foreach loop, as you'll see below, but the output isn't what I expected.

foreach ( $strings as $key_one => $count_one ) {
    foreach ( $strings as $key_two => $count_two ) {
        similar_text($key_two, $key_one, $percent);
        if ($percent > 80) {
            if(!isset($counts[$key_one])) {
                $counts[$key_one] = $count_one;
            } else {
                $counts[$key_one] += $count_two;
            }
        }
    }
}

Note: The similarity threshold is 80% for this example (the match between lawyer and lawyers is ~92%).
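For reference, PHP's manual describes the percentage that similar_text() writes into its third argument as the number of matching characters divided by the average length of the two strings, times 100. A small check of that arithmetic for this pair (only the built-in similar_text(), strlen() and printf() are used; the variable names are just for illustration):

```php
<?php
// similar_text() returns the number of matching characters and writes the
// percentage into its third (by-reference) argument.
$matching = similar_text("lawyer", "lawyers", $percent);

// Percent = matching characters / average length of both strings * 100,
// i.e. 6 / ((6 + 7) / 2) * 100, which is about 92.3.
$expected = $matching / ((strlen("lawyer") + strlen("lawyers")) / 2) * 100;

printf("%d matching chars, %.1f%%\n", $matching, $percent); // 6 matching chars, 92.3%
```

So any threshold below roughly 92% will treat these two words as the same.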

Which ends up giving me something similar to the following:

Array
(
    [lawyer] => 4
    [business] => 3
    [a] => 3
    [lawyers] => 2
)

Whereas I need it to be:

Array
(
    [lawyer] => 4
    [business] => 3
    [a] => 3
)

Notice how I need it to effectively remove lawyers and add its count to lawyer.

Your difficulty is that just as lawyer is similar to lawyers, lawyers is also similar to lawyer. So both of them end up in $counts, each with its count bumped up by the match.
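To illustrate: for this particular pair, the comparison clears the 80% threshold in both directions, so a nested loop that compares every key against every other key matches them twice, once from each side. (The PHP manual notes that swapping the argument order of similar_text() can give a different result for some strings, but for this pair both directions agree.)

```php
<?php
// Compare the pair in both directions; $forward and $backward receive the percentages.
similar_text("lawyer", "lawyers", $forward);
similar_text("lawyers", "lawyer", $backward);

// Both exceed the 80% threshold, so each word "finds" the other.
var_dump($forward > 80, $backward > 80); // prints bool(true) twice
```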

Try this:

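$counts = array();
foreach ( $strings as $key_one => &$count_one ) {
    if ($count_one == 0) continue; // skip it if we've already processed it
    if (!isset($counts[$key_one])) {
        $counts[$key_one] = $count_one;
        $count_one = 0; // mark $key_one as processed
    }
    foreach ( $strings as $key_two => &$count_two ) {
        similar_text($key_two, $key_one, $percent);
        if ($percent > 80) {
            $counts[$key_one] += $count_two; // zeroed entries (including $key_one itself) add nothing
            $count_two = 0;                  // mark $key_two as processed
        }
    }
}
unset($count_one, $count_two); // break the references left over from the by-reference loops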

The disadvantage of that is that it modifies your original $strings array (processed entries are set to 0), which may not be ideal. Here's another approach, keeping track of already-processed strings in a separate array used as a hash:

$already = $counts = array(); // not really necessary, but nice to init
foreach ( $strings as $key_one => $count_one ) {
    if (isset($already[$key_one])) continue; // skip if already processed
    $counts[$key_one] = $count_one; // by definition this should be new
    $already[$key_one] = true;      // mark it so the inner loop doesn't add it to itself again
    foreach ( $strings as $key_two => $count_two ) {
        if (isset($already[$key_two])) continue; // never count a string twice
        similar_text($key_two, $key_one, $percent);
        if ($percent > 80) {
            $counts[$key_one] += $count_two;
            $already[$key_two] = true;
        }
    }
}

I would recommend the second solution.
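If you need this in more than one place, the recommended approach could be wrapped in a function. This is only a sketch: groupSimilar() and its $threshold parameter are hypothetical names, not part of PHP. Because PHP passes arrays to functions by value, the caller's $strings is left untouched.

```php
<?php
/**
 * Hypothetical helper wrapping the second approach: merges the count of every
 * key whose similar_text() percentage exceeds $threshold into the first
 * matching key. $strings is received by value, so the caller's array is unchanged.
 */
function groupSimilar(array $strings, float $threshold = 80.0): array
{
    $already = []; // keys that have been merged or used as a group key
    $counts = [];  // result: group key => summed count
    foreach ($strings as $keyOne => $countOne) {
        if (isset($already[$keyOne])) {
            continue; // already merged into an earlier key
        }
        $counts[$keyOne] = $countOne;
        $already[$keyOne] = true;
        foreach ($strings as $keyTwo => $countTwo) {
            if (isset($already[$keyTwo])) {
                continue; // never count a word twice (this also skips $keyOne itself)
            }
            similar_text((string) $keyTwo, (string) $keyOne, $percent);
            if ($percent > $threshold) {
                $counts[$keyOne] += $countTwo;
                $already[$keyTwo] = true;
            }
        }
    }
    return $counts;
}

$strings = ["lawyer" => 3, "business" => 3, "lawyers" => 1, "a" => 3];
print_r(groupSimilar($strings)); // lawyer => 4, business => 3, a => 3
```

The first spelling encountered in the array becomes the key the others are merged into, so the result depends on the input order.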

You can always use:

unset($counts[$key_two]);
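One way to apply that suggestion, sketched under the assumption that you start from a copy of $strings (so $counts here is a working copy, not the accumulator from the question's loop): merge each similar key into the current one and unset it straight away, so it can never show up as a separate entry.

```php
<?php
$strings = ["lawyer" => 3, "business" => 3, "lawyers" => 1, "a" => 3];

$counts = $strings; // work on a copy so $strings itself is left alone
foreach (array_keys($counts) as $key_one) {
    if (!isset($counts[$key_one])) {
        continue; // this key was already merged into an earlier one
    }
    // array_keys() takes a snapshot, so keys unset below are filtered with isset()
    foreach (array_keys($counts) as $key_two) {
        if ($key_two === $key_one || !isset($counts[$key_two])) {
            continue; // skip the word itself and keys removed earlier
        }
        similar_text((string) $key_two, (string) $key_one, $percent);
        if ($percent > 80) {
            $counts[$key_one] += $counts[$key_two];
            unset($counts[$key_two]); // drop the merged key from the result
        }
    }
}

print_r($counts); // lawyer => 4, business => 3, a => 3
```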
