
Find how many words are the same in two strings

I have this function where I would like to compare two strings and then return how many words they have in common, but the following isn't working. I always seem to get 0 for SameWordCount and 1 for MasterAddressWordCount.

Any ideas?

// some more string cleaning
mastermkAddressKey = mastermkAddressKey.Replace(",", " ").Replace(".", " ").Trim();
mastermkAddressKey = Encoding.ASCII.GetString(Encoding.GetEncoding("Cyrillic").GetBytes(mastermkAddressKey));
mastermkAddressKey = mastermkAddressKey.Replace("  ", " |").Replace("| ", "").Replace("|", "");
mastermkAddressKey = QbaseStrings.RemoveDuplicateWords(mastermkAddressKey);

duplicatemkAddressKey = duplicatemkAddressKey.Replace(",", " ").Replace(".", " ").Trim();
duplicatemkAddressKey = Encoding.ASCII.GetString(Encoding.GetEncoding("Cyrillic").GetBytes(duplicatemkAddressKey));
duplicatemkAddressKey = duplicatemkAddressKey.Replace("  ", " |").Replace("| ", "").Replace("|", "");
duplicatemkAddressKey = QbaseStrings.RemoveDuplicateWords(duplicatemkAddressKey);

string[] masterAddressSeparateWords = mastermkAddressKey.Split(new char[' '], StringSplitOptions.RemoveEmptyEntries);
string[] duplicateAddressSeparateWords = duplicatemkAddressKey.Split(new char[' '], StringSplitOptions.RemoveEmptyEntries);

int SameWordCount = 0;
int MasterAddressWordCount = 0;

foreach (string masterWord in masterAddressSeparateWords)
{
    foreach (string duplicateWord in duplicateAddressSeparateWords)
    {
        if (masterWord == duplicateWord) { SameWordCount++; }
    }

    MasterAddressWordCount++;
}

int WordDifference = MasterAddressWordCount - SameWordCount;

if (WordDifference == 0) { return "sure"; }
if (WordDifference > 0 && WordDifference < 3) { return SameWordCount.ToString() + " " + MasterAddressWordCount.ToString(); }
if (WordDifference > 2 && WordDifference < 5) { return "possible"; }

Your issue is caused by new char[' ']; what you meant here was new char[] { ' ' }. The compiler is (very helpfully) converting ' ' to an int here, so the expression becomes new char[int]: an array whose length is the numeric value of the space character. This means that this:

new char[' ']

is really the same as:

new char[32]

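which ends up being a useless array of 32 '\0' characters rather than the single space you were after. Split therefore only splits on the null character, which does not appear in your addresses, so each string comes back as a single "word". That is why MasterAddressWordCount is always 1, and SameWordCount stays 0 unless the two whole strings happen to be identical.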
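The effect on Split can be reproduced with a small, self-contained sketch. The class and method names below are hypothetical (they are not from the original post); the sketch builds the separator array exactly as the question does, next to the one-element array that was intended, and counts words with each.

```csharp
using System;

public static class SeparatorDemo
{
    // What the question wrote: ' ' is implicitly converted to its numeric
    // value (32), so this allocates 32 chars, all defaulting to '\0'.
    public static char[] BuggySeparators() => new char[' '];

    // What was intended: an array containing exactly one space character.
    public static char[] IntendedSeparators() => new char[] { ' ' };

    // Counts the pieces produced by Split with the given separators,
    // dropping empty entries just like the original code does.
    public static int CountWords(string text, char[] separators) =>
        text.Split(separators, StringSplitOptions.RemoveEmptyEntries).Length;
}
```

With the buggy array, CountWords("12 HIGH STREET LONDON", SeparatorDemo.BuggySeparators()) returns 1, matching the MasterAddressWordCount of 1 reported in the question; with SeparatorDemo.IntendedSeparators() it returns 4.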


You can see this clearly by looking at the IL generated for:

var a = new char[' '];

which is:

IL_0001:  ldc.i4.s    20
IL_0003:  newarr      System.Char
IL_0008:  stloc.0     // a

where 20 is the hexadecimal representation of 32 (0x20).
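The conversion that the IL exposes can also be checked directly in C#. This is a minimal sketch with hypothetical names: it relies only on the language's implicit char-to-int conversion and on formatting an int as hexadecimal.

```csharp
using System;

public static class SpaceAsSizeDemo
{
    // char converts implicitly to int, which is why ' ' is accepted as an
    // array size; this assignment compiles without a cast.
    public static int SpaceCode()
    {
        int code = ' ';
        return code;
    }

    // Formats the space's code in lowercase hexadecimal, the same notation
    // the IL listing uses for the ldc.i4.s operand.
    public static string SpaceCodeAsHex() => SpaceCode().ToString("x");
}
```

SpaceAsSizeDemo.SpaceCode() is 32 and SpaceAsSizeDemo.SpaceCodeAsHex() is "20", the same operand that appears in the IL.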

I've solved the problem by changing the following lines:

string[] masterAddressSeparateWords = mastermkAddressKey.Split(new char[' '], StringSplitOptions.RemoveEmptyEntries);
string[] duplicateAddressSeparateWords = duplicatemkAddressKey.Split(new char[' '], StringSplitOptions.RemoveEmptyEntries);

To:

string[] masterAddressSeparateWords = mastermkAddressKey.Split(' ');
string[] duplicateAddressSeparateWords = duplicatemkAddressKey.Split(' ');
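For comparison, here is a sketch of the whole count with the separator fixed in the other way: it keeps StringSplitOptions.RemoveEmptyEntries by passing a real one-element array, and it swaps the nested loops for a HashSet<string> lookup. The class and method names are hypothetical, and it assumes the strings have already been cleaned as in the question; QbaseStrings.RemoveDuplicateWords is not reproduced, since the sets deduplicate the words instead.

```csharp
using System;
using System.Collections.Generic;

public static class AddressWordComparer
{
    // A genuine one-element separator array. Unlike Split(' '), combining it
    // with RemoveEmptyEntries discards empty strings from repeated spaces.
    private static readonly char[] Separators = { ' ' };

    // Returns how many distinct words of `master` also appear in `duplicate`,
    // together with the number of distinct words in `master`.
    public static (int sameWordCount, int masterWordCount) CountCommonWords(string master, string duplicate)
    {
        var masterWords = new HashSet<string>(
            master.Split(Separators, StringSplitOptions.RemoveEmptyEntries));
        var duplicateWords = new HashSet<string>(
            duplicate.Split(Separators, StringSplitOptions.RemoveEmptyEntries));

        int same = 0;
        foreach (string word in masterWords)
        {
            if (duplicateWords.Contains(word)) same++;
        }
        return (same, masterWords.Count);
    }
}
```

For example, CountCommonWords("12 HIGH STREET LONDON", "12  HIGH ROAD LONDON") gives (3, 4). Like the original ==, the default HashSet<string> comparison is ordinal and case-sensitive; passing StringComparer.OrdinalIgnoreCase as the second constructor argument would make "High" and "HIGH" count as the same word.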

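Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.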

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM