Find how many words are the same in two strings
I have this function where I would like to compare two strings and then return how many words they have in common, but the following isn't working. I always seem to get 0 for SameWordCount and 1 for MasterAddressWordCount.
Any ideas?
// some more string cleaning
mastermkAddressKey = mastermkAddressKey.Replace(",", " ").Replace(".", " ").Trim();
mastermkAddressKey = Encoding.ASCII.GetString(Encoding.GetEncoding("Cyrillic").GetBytes(mastermkAddressKey));
mastermkAddressKey = mastermkAddressKey.Replace(" ", " |").Replace("| ", "").Replace("|", "");
mastermkAddressKey = QbaseStrings.RemoveDuplicateWords(mastermkAddressKey);
duplicatemkAddressKey = duplicatemkAddressKey.Replace(",", " ").Replace(".", " ").Trim();
duplicatemkAddressKey = Encoding.ASCII.GetString(Encoding.GetEncoding("Cyrillic").GetBytes(duplicatemkAddressKey));
duplicatemkAddressKey = duplicatemkAddressKey.Replace(" ", " |").Replace("| ", "").Replace("|", "");
duplicatemkAddressKey = QbaseStrings.RemoveDuplicateWords(duplicatemkAddressKey);
string[] masterAddressSeparateWords = mastermkAddressKey.Split(new char[' '], StringSplitOptions.RemoveEmptyEntries);
string[] duplicateAddressSeparateWords = duplicatemkAddressKey.Split(new char[' '], StringSplitOptions.RemoveEmptyEntries);
int SameWordCount = 0;
int MasterAddressWordCount = 0;
foreach (string masterWord in masterAddressSeparateWords)
{
foreach (string duplicateWord in duplicateAddressSeparateWords)
{
if (masterWord == duplicateWord) {SameWordCount++;}
}
MasterAddressWordCount++;
}
int WordDifference = MasterAddressWordCount - SameWordCount;
if (WordDifference == 0) { return "sure"; }
if (WordDifference > 0 && WordDifference < 3) { return SameWordCount.ToString() + " " + MasterAddressWordCount.ToString(); }
if (WordDifference > 2 && WordDifference < 5) { return "possible"; }
Your issue is because of new char[' ']; what you meant here was new char[] {' '}. The compiler is (very helpfully) converting ' ' to an int here, making it char[int]. This means that this:
new char[' ']
Is really the same as:
new char[32]
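Which ends up being a big, useless char[] array of 32 '\0' characters, rather than the single space you were after. Because none of those '\0' characters occur in the addresses, Split never actually splits anything and returns each address as a single element. That is why MasterAddressWordCount comes out as 1 and SameWordCount as 0 unless the two whole strings happen to be identical.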
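To see the difference directly, here is a small sketch that compares the two separator arrays and how many pieces each produces when splitting a sample address. The class and member names (SplitDemo, Wrong, Right, CountParts) and the sample address are hypothetical, made up for illustration.

```csharp
using System;

public static class SplitDemo
{
    // ' ' is implicitly converted to its char code (32), so this is a
    // 32-element array whose entries are all '\0', not a space.
    public static char[] Wrong() => new char[' '];

    // What was intended: a one-element array containing a single space.
    public static char[] Right() => new char[] { ' ' };

    // Number of non-empty pieces produced by splitting s on the given separators.
    public static int CountParts(string s, char[] separators) =>
        s.Split(separators, StringSplitOptions.RemoveEmptyEntries).Length;
}
```

With this, SplitDemo.CountParts("12 High Street London", SplitDemo.Wrong()) gives 1 (the whole string, unsplit), while the same call with SplitDemo.Right() gives 4.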
You can see this clearly by looking at the IL generated for:
var a = new char[' '];
Which is:
IL_0001: ldc.i4.s 20
IL_0003: newarr System.Char
IL_0008: stloc.0 // a
20 being the hexadecimal representation of 32.
I've solved the problem by changing the following lines:
string[] masterAddressSeparateWords = mastermkAddressKey.Split(new char[' '], StringSplitOptions.RemoveEmptyEntries);
string[] duplicateAddressSeparateWords = duplicatemkAddressKey.Split(new char[' '], StringSplitOptions.RemoveEmptyEntries);
To:
string[] masterAddressSeparateWords = mastermkAddressKey.Split(' ');
string[] duplicateAddressSeparateWords = duplicatemkAddressKey.Split(' ');
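For reference, a minimal sketch of the word comparison with the separator written correctly. It assumes, as the question's QbaseStrings.RemoveDuplicateWords call suggests, that a repeated word should only count once, so it intersects sets of words instead of using the nested loop. AddressWords, SplitWords and CountSharedWords are hypothetical names. Keeping StringSplitOptions.RemoveEmptyEntries with new char[] { ' ' } also drops the empty entries that doubled spaces produce, which Split(' ') on its own would keep ("a  b".Split(' ') yields "a", "" and "b").

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AddressWords
{
    // A real one-element separator array containing a single space.
    private static readonly char[] Separators = new char[] { ' ' };

    // Splits on spaces and discards empty entries caused by repeated spaces.
    public static string[] SplitWords(string s) =>
        s.Split(Separators, StringSplitOptions.RemoveEmptyEntries);

    // Counts distinct words of the master address that also appear in the
    // duplicate address (case-sensitive, like the original == comparison).
    public static int CountSharedWords(string master, string duplicate)
    {
        var duplicateWords = new HashSet<string>(SplitWords(duplicate));
        return SplitWords(master).Distinct().Count(w => duplicateWords.Contains(w));
    }
}
```

For example, CountSharedWords("12 High Street London", "12 High St London") returns 3 (12, High and London), and SplitWords on the master address has 4 entries, so the question's WordDifference would be 1.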