How to check a partial similarity of two strings in PHP

Is there any function in PHP that checks the percentage similarity of two strings?

For example, I have:

$string1 = "Hello how are you doing";
$string2 = " hi, how are you";

and the function($string1, $string2) will return true because the words "how", "are", and "you" are present in both strings.

Or, even better, return 60% similarity, because "how", "are", and "you" make up 3/5 of $string1.

Does any function exist in PHP that does that?

As it's a nice question, I put some effort into it:

<?php
$string1="Hello how are you doing";
$string2= " hi, how are you";

echo 'Compare result: ' . compareStrings($string1, $string2) . '%';
//60%


function compareStrings($s1, $s2) {
    //one is empty, so no result
    if (strlen($s1)==0 || strlen($s2)==0) {
        return 0;
    }

    //replace non-alphanumeric characters with spaces
    //the hyphen is kept in case it is used to join words
    $s1clean = preg_replace("/[^A-Za-z0-9-]/", ' ', $s1);
    $s2clean = preg_replace("/[^A-Za-z0-9-]/", ' ', $s2);

    //collapse repeated spaces and trim the ends,
    //so explode() does not produce empty words
    $s1clean = trim(preg_replace('/ +/', ' ', $s1clean));
    $s2clean = trim(preg_replace('/ +/', ' ', $s2clean));

    //create arrays
    $ar1 = explode(" ",$s1clean);
    $ar2 = explode(" ",$s2clean);
    $l1 = count($ar1);
    $l2 = count($ar2);

    //flip the arrays if needed so ar1 is always largest.
    if ($l2>$l1) {
        $t = $ar2;
        $ar2 = $ar1;
        $ar1 = $t;
    }

    //flip array 2, to make the words the keys
    $ar2 = array_flip($ar2);


    $maxwords = max($l1, $l2);
    $matches = 0;

    //find matching words
    foreach($ar1 as $word) {
        if (array_key_exists($word, $ar2))
            $matches++;
    }

    return ($matches / $maxwords) * 100;    
}
?>

As other answers have already said, you can use similar_text. Here's the demonstration:

$string1="Hello how are you doing" ;
$string2= " hi, how are you";

echo similar_text($string1, $string2, $perc); //12

echo $perc; //61.538461538462

will return 12 and set $perc to the percentage of similarity you asked for.
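
Note that this percentage is based on matching characters, not words: it is twice the number of matched characters divided by the combined length of both strings, times 100. A minimal sketch that recomputes the value by hand (the variable names $matched and $manual are only for illustration):

```php
<?php
$string1 = "Hello how are you doing";
$string2 = " hi, how are you";

// similar_text() returns the number of matching characters (bytes)
$matched = similar_text($string1, $string2, $perc);

// Recompute the percentage by hand: 2 * matches / (len1 + len2) * 100
$manual = $matched * 2 / (strlen($string1) + strlen($string2)) * 100;

echo $matched, "\n";          // 12
echo round($perc, 2), "\n";   // 61.54
echo round($manual, 2), "\n"; // 61.54
```

Because the score is character-based, strings that share long runs of characters but few whole words can still score high.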

In addition to Alex Siri's answer, and according to the following article:

http://docstore.mik.ua/orelly/webprog/php/ch04_06.htm

PHP provides several functions that let you test whether two strings are approximately equal:

$string1="Hello how are you doing" ;
$string2= " hi, how are you";

SOUNDEX

if (soundex($string1) == soundex($string2)) {
    echo "similar";
} else {
    echo "not similar";
}

METAPHONE

if (metaphone($string1) == metaphone($string2)) {
    echo "similar";
} else {
    echo "not similar";
}
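
Comparing the phonetic key of a whole sentence is an all-or-nothing test (soundex, for instance, always returns a four-character key), so it says little about partial overlap. One possible way to get a percentage is to compare the keys word by word. The sketch below is only an illustration: the helper name phoneticWordOverlap is hypothetical, and splitting on non-letters is an assumption.

```php
<?php
// Hypothetical helper: percentage of words in $a that sound like some word in $b.
function phoneticWordOverlap(string $a, string $b): float
{
    // Split on anything that is not an ASCII letter (an assumption of this sketch)
    $wordsA = preg_split('/[^A-Za-z]+/', $a, -1, PREG_SPLIT_NO_EMPTY);
    $wordsB = preg_split('/[^A-Za-z]+/', $b, -1, PREG_SPLIT_NO_EMPTY);

    if (count($wordsA) === 0) {
        return 0.0; // nothing to compare
    }

    // Phonetic keys of every word in the second string
    $keysB = array_map('metaphone', $wordsB);

    $hits = 0;
    foreach ($wordsA as $word) {
        if (in_array(metaphone($word), $keysB, true)) {
            $hits++;
        }
    }

    return $hits * 100 / count($wordsA);
}

echo phoneticWordOverlap("Hello how are you doing", " hi, how are you"), "\n";
```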

SIMILAR TEXT

$similarity = similar_text($string1, $string2);

LEVENSHTEIN

$distance = levenshtein($string1, $string2); 
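
levenshtein returns an edit distance rather than a percentage, so it has to be normalized before it can be read as a similarity score. One possible sketch divides the distance by the length of the longer string. The helper name levenshteinPercent is hypothetical (it is not a PHP built-in), and this normalization is an assumption, not something the article prescribes.

```php
<?php
// Hypothetical helper: turns an edit distance into a 0-100 similarity score.
function levenshteinPercent(string $a, string $b): float
{
    $longest = max(strlen($a), strlen($b));
    if ($longest === 0) {
        return 100.0; // two empty strings are identical
    }

    // distance 0 gives 100%, a distance equal to the longer length gives 0%
    return 100.0 - levenshtein($a, $b) * 100 / $longest;
}

echo levenshteinPercent("cost", "costs"), "\n"; // 80
echo levenshteinPercent("how", "how"), "\n";    // 100
```

Like similar_text, this works on characters, so it suits comparing single words better than comparing whole sentences.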

OK, here is my function, which makes it much more interesting.

I'm checking the approximate similarity of strings.

Here are the criteria I use for that:

  1. The order of the words is important.
  2. Words count as matching if they are at least 85% similar.

Example:

$string1 = "How much will it cost to me"; // string in vocabulary
$string2 = "How much does costs it ";     // user input ("costs" instead of "cost" is a mistake)

Algorithm:

  1. Check the similarity of the words and create a clean string with the "right" words, in the order they appear in the vocabulary. OUTPUT: "how much it cost"
  2. Create a clean string with the "right" words in the order they appear in the user input. OUTPUT: "how much cost it"
  3. Compare the two outputs: if they differ, return no; if they are the same, return yes.

error_reporting(E_ALL);
ini_set('display_errors', true);

$string1="сколько это стоит ваще" ;
$string2= "сколько будет стоить это будет мне";

if (compareStrings($string1, $string2)) {
    echo "yes";
} else {
    echo "no";
}
//echo compareStrings($string1, $string2);

function compareStrings($s1, $s2) {

    if (strlen($s1)==0 || strlen($s2)==0) {
        return 0;
    }

    while (strpos($s1, "  ")!==false) {
        $s1 = str_replace("  ", " ", $s1);
    }
    while (strpos($s2, "  ")!==false) {
        $s2 = str_replace("  ", " ", $s2);
    }

    $ar1 = explode(" ",$s1);
    $ar2 = explode(" ",$s2);
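    // Step 1: vocabulary words that have a close match in the input,
    // in the order they appear in the vocabulary string
    $meaning = "";
    foreach ($ar1 as $w1) {
        foreach ($ar2 as $w2) {
            similar_text($w1, $w2, $percent);
            if ($percent >= 85) {
                $meaning .= " " . $w1;
                break; // one match per vocabulary word is enough
            }
        }
    }

    // Step 2: the same vocabulary words, but in the order
    // their matches appear in the user input
    $rightorder = "";
    foreach ($ar2 as $w2) {
        foreach ($ar1 as $w1) {
            similar_text($w1, $w2, $percent);
            if ($percent >= 85) {
                $rightorder .= " " . $w1;
                break; // one match per input word is enough
            }
        }
    }

    // Step 3: the strings match only if both orders agree
    return $rightorder === $meaning;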

}

I would love to hear your opinions and suggestions on how to improve it.

You can use the PHP function similar_text.

int similar_text ( string $first , string $second [, float &$percent ] )

Check the PHP doc at: http://php.net/manual/en/function.similar-text.php

Although this question is quite old, I am adding my solution for a few reasons. First, according to his comment, the author wanted to compare similar words rather than whole strings. Second, most of the answers tried to solve it with similar_text, which is not suitable for this problem: it compares the text character by character, so quite different strings can still come out as a match. The first answer, given by @Hugo Delsing, uses array_flip, which swaps keys and values, so a word that is repeated is counted only once. I have posted the following answer, which compares the words. Its only issue is that it does not take the order of the words into account very much.

function compareStrings($s1, $s2)
{
    if (strlen($s1) == 0 || strlen($s2) == 0) {
        return 0;
    }

    // -1 means "no limit"; passing null for the limit is deprecated as of PHP 8.1
    $ar1 = preg_split('/[^\w\-]+/', strtolower($s1), -1, PREG_SPLIT_NO_EMPTY);
    $ar2 = preg_split('/[^\w\-]+/', strtolower($s2), -1, PREG_SPLIT_NO_EMPTY);

    $l1 = count($ar1);
    $l2 = count($ar2);

    $ar2_copy = array_values($ar2);

    $matched_indices = [];
    $word_map = [];
    foreach ($ar1 as $k => $w1) {
        if (isset($word_map[$w1])) {
            if ($word_map[$w1][0] >= $k) {
                $matched_indices[$k] = $word_map[$w1][0];
            }
            array_splice($word_map[$w1], 0, 1);
        } else {
            $indices = array_keys($ar2_copy, $w1);
            $index_count = count($indices);
            if ($index_count) {
                if ($index_count == 1) {
                    $matched_indices[$k] = $indices[0];
                    // remove the word at given index from second array so that it won't repeat again
                    unset($ar2_copy[$indices[0]]);
                } else {
                    $matched_indices[$k] = $indices[0];
                    // remove the word at given indices from second array so that it won't repeat again
                    foreach ($indices as $index) {
                        unset($ar2_copy[$index]);
                    }
                    array_splice($indices, 0, 1);
                    $word_map[$w1] = $indices;
                }
            }
        }
    }
    // guard against division by zero when $s1 contains no words (e.g. only punctuation)
    return $l1 > 0 ? round(count($matched_indices) * 100 / $l1, 2) : 0;
}
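
The remark about array_flip and repeated words can be checked with a short sketch, separate from the function above. It shows the repeats collapsing into one key and one possible way to count them with array_count_values instead. The helper name countSharedWords is hypothetical.

```php
<?php
$words = ["you", "are", "you"];

// array_flip keeps one key per distinct word; the later index wins
print_r(array_flip($words)); // keys: "you" => 2, "are" => 1

// Hypothetical helper: counts shared words, honoring how often each one occurs.
function countSharedWords(array $a, array $b): int
{
    $countsA = array_count_values($a);
    $countsB = array_count_values($b);

    $shared = 0;
    foreach ($countsA as $word => $n) {
        // a word repeated $n times in $a can match at most as many copies in $b
        $shared += min($n, $countsB[$word] ?? 0);
    }

    return $shared;
}

echo countSharedWords(["you", "are", "you"], ["you", "are"]), "\n"; // 2
echo countSharedWords(["you", "are", "you"], ["you", "you"]), "\n"; // 2
```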
