How to compare two strings and find the percentage of similarity

The code below does the job, but it takes a lot of time. I am comparing the contents of two HTML files that I have already saved as strings in MongoDB. Each string has a length of around 30K+, and there are around 250K+ records to compare, so the job takes quite a long time.

Is there an easier way, or a plugin, that is also fast?

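// Returns the Levenshtein (edit) distance between input1 and input2.
// Requires: using System; (for Math.Min)
private int ComputeCost(string input1, string input2)
{
    // If one input is null or empty, the distance is the length of the other.
    if (string.IsNullOrEmpty(input1))
        return string.IsNullOrEmpty(input2) ? 0 : input2.Length;

    if (string.IsNullOrEmpty(input2))
        return input1.Length;

    int input1Length = input1.Length;
    int input2Length = input2.Length;

    // distance[i, j] = edit distance between the first i characters of input1
    // and the first j characters of input2.
    int[,] distance = new int[input1Length + 1, input2Length + 1];

    // Turning a prefix into the empty string costs one deletion per character.
    for (int i = 0; i <= input1Length; i++)
        distance[i, 0] = i;
    for (int j = 0; j <= input2Length; j++)
        distance[0, j] = j;

    for (int i = 1; i <= input1Length; i++)
    {
        for (int j = 1; j <= input2Length; j++)
        {
            // Substitution is free when the two characters already match.
            int cost = (input2[j - 1] == input1[i - 1]) ? 0 : 1;

            distance[i, j] = Math.Min(
                Math.Min(distance[i - 1, j] + 1,   // deletion
                         distance[i, j - 1] + 1),  // insertion
                distance[i - 1, j - 1] + cost);    // substitution
        }
    }

    return distance[input1Length, input2Length];
}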
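
The function above returns an edit distance rather than a percentage, and for two strings of about 30K characters its matrix holds roughly 900 million ints. The following is a minimal sketch, not the asker's code, that keeps only two rows of the matrix and converts the distance into a percentage. The class name `StringSimilarity`, the method names, and the choice to normalize by the longer string's length are all assumptions made for illustration. The running time is still proportional to the product of the two lengths, so this mainly reduces memory use.

```csharp
using System;

// Hypothetical helper: names and normalization are illustrative assumptions.
public static class StringSimilarity
{
    // Levenshtein distance that keeps two rows instead of the full matrix.
    public static int Distance(string a, string b)
    {
        a = a ?? string.Empty;
        b = b ?? string.Empty;
        if (a.Length == 0) return b.Length;
        if (b.Length == 0) return a.Length;

        // Put the shorter string along the row to keep the arrays small.
        if (b.Length > a.Length)
        {
            string tmp = a;
            a = b;
            b = tmp;
        }

        int[] previous = new int[b.Length + 1];
        int[] current = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++)
            previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(previous[j] + 1, current[j - 1] + 1),
                    previous[j - 1] + cost);
            }

            // The row just filled becomes the previous row for the next pass.
            int[] swap = previous;
            previous = current;
            current = swap;
        }

        return previous[b.Length];
    }

    // Assumption: similarity = 1 - distance / length of the longer string.
    public static double SimilarityPercent(string a, string b)
    {
        int maxLength = Math.Max(a?.Length ?? 0, b?.Length ?? 0);
        if (maxLength == 0) return 100.0; // two empty inputs are treated as identical
        return (1.0 - (double)Distance(a, b) / maxLength) * 100.0;
    }
}
```

For example, `StringSimilarity.SimilarityPercent("kitten", "sitting")` has a distance of 3 over a maximum length of 7, which gives about 57.1 under this normalization.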

As suggested by @Kay Lee, I made the function static and used HTML Agility Pack to remove unnecessary data, and saw a good performance improvement.
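
The post does not show how HTML Agility Pack was used, so the following is only a sketch of one possible preprocessing step. It assumes the HtmlAgilityPack NuGet package is referenced, and it assumes that "unnecessary data" means `script` and `style` elements plus markup and extra whitespace. The class name `HtmlTextExtractor` and the method `ExtractText` are hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

// Hypothetical helper; what counts as "unnecessary data" is an assumption.
public static class HtmlTextExtractor
{
    public static string ExtractText(string html)
    {
        if (string.IsNullOrEmpty(html))
            return string.Empty;

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Copy the matches to a list first so that removing nodes
        // does not disturb the enumeration.
        var unwanted = doc.DocumentNode.Descendants()
            .Where(n => n.Name == "script" || n.Name == "style")
            .ToList();
        foreach (var node in unwanted)
            node.Remove();

        // Decode entities such as &amp; and collapse runs of whitespace.
        string text = HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Comparing the extracted text instead of the raw HTML shortens both inputs, which matters because the cost of the edit-distance computation grows with the product of the two lengths.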

