C# Dictionary - ContainsKey Function Returns Wrong Value

I'm trying to use a Dictionary<string, int> to map some words (the int value isn't really relevant). After inserting the words into the dictionary (I checked that they are there), I go over the whole document and look for a specific word.

When I do that, ContainsKey returns false even if the word exists in the dictionary.

What could be the problem, and how can I fix it?

public string RemoveStopWords(string originalDoc){
    string updatedDoc = "";
    string[] originalDocSeperated = originalDoc.Split(' ');
    foreach (string word in originalDocSeperated)
    {
        if (!stopWordsDic.ContainsKey(word))
        {
            updatedDoc += word;
            updatedDoc += " ";
        }
    }
    return updatedDoc.Substring(0, updatedDoc.Length - 1); //Remove Last Space
}

For example: the dictionary contains stop words such as "the". When I get the word "the" from originalDoc and check whether it is missing from the dictionary, the code still enters the if block, even though both strings look exactly the same. It is not a case-sensitivity issue.

Dictionary<string, int> stopWordsDic = new Dictionary<string, int>();

string stopWordsContent = System.IO.File.ReadAllText(stopWordsPath);
string[] stopWordsSeperated = stopWordsContent.Split('\n');
foreach (string stopWord in stopWordsSeperated)
{
    stopWordsDic.Add(stopWord, 1);
}

The stop words file contains one word per line.

Snapshot: [image not available]

Thank you.

This is just a guess (just too long for a comment), but when you insert into your Dictionary, you are splitting by \n.

So if the actual line separator in the text file you are using is \r\n, you'd be left with \r characters at the end of your inserted keys, and ContainsKey would not find them.

So I'd start with:

string[] stopWordsSeperated = stopWordsContent.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);

and then trim each entry.
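To make the guess concrete, here is a minimal sketch of the failure and the fix. The sample text and the names StopWordsDemo and LoadStopWords are made up for illustration; real code would read the content from stopWordsPath instead.

```csharp
using System;
using System.Collections.Generic;

public static class StopWordsDemo
{
    // Hypothetical helper: splits on both Windows and Unix line endings,
    // trims each entry, and skips blank lines.
    public static Dictionary<string, int> LoadStopWords(string content)
    {
        var dic = new Dictionary<string, int>();
        string[] lines = content.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
        foreach (string line in lines)
        {
            string word = line.Trim();
            if (word.Length > 0)
            {
                dic[word] = 1; // the indexer tolerates duplicate lines, unlike Add
            }
        }
        return dic;
    }

    public static void Main()
    {
        // Made-up file content with Windows line endings.
        string content = "the\r\nand\r\nof\r\n";

        // Splitting on '\n' alone keeps the '\r' at the end of each key.
        string[] naive = content.Split('\n');
        Console.WriteLine(naive[0] == "the"); // False
        Console.WriteLine(naive[0].Length);   // 4

        Dictionary<string, int> dic = LoadStopWords(content);
        Console.WriteLine(dic.ContainsKey("the")); // True
    }
}
```

Printing the Length of a key that "looks right" in the debugger is a quick way to spot an invisible trailing character.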


As a side note, if you are not using the dictionary's int values for anything, you'd be better off using a HashSet<string> and Contains instead of ContainsKey.
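A HashSet-based version might look roughly like the sketch below. StopWordFilter and BuildSet are hypothetical names, and the sample lines stand in for whatever the stop words file contains.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordFilter
{
    // Hypothetical helper: builds a set from raw lines, trimming stray
    // whitespace such as a trailing '\r' and dropping empty lines.
    public static HashSet<string> BuildSet(IEnumerable<string> lines)
    {
        return new HashSet<string>(
            lines.Select(l => l.Trim()).Where(l => l.Length > 0));
    }

    public static string RemoveStopWords(string originalDoc, HashSet<string> stopWords)
    {
        IEnumerable<string> kept = originalDoc
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !stopWords.Contains(word));
        // Join adds separators only between words, so there is no trailing space to cut.
        return string.Join(" ", kept);
    }

    public static void Main()
    {
        // In real code the lines could come from System.IO.File.ReadLines(stopWordsPath).
        HashSet<string> stopWords = BuildSet(new[] { "the\r", "a\r", "of" });
        Console.WriteLine(RemoveStopWords("the cat sat on a mat", stopWords)); // cat sat on mat
    }
}
```

A set also avoids the ArgumentException that Dictionary.Add throws when the file lists the same word twice.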

You have a ! (not) operator in your if statement. You're checking whether the dictionary does not contain the key. Remove the exclamation mark from the start of your condition.

When you create the dictionary, you need to do the following:

var stopWords = new Dictionary<string, int>(
    StringComparer.InvariantCultureIgnoreCase);

The most important part is the InvariantCultureIgnoreCase comparer, which makes key lookups ignore case.
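As a rough illustration of what the comparer changes, the sketch below compares a default dictionary with one built using the case-insensitive comparer. The keys and the ComparerDemo / ContainsIgnoringCase names are made up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class ComparerDemo
{
    // Hypothetical helper: loads the keys into a case-insensitive dictionary
    // and reports whether the probe is found.
    public static bool ContainsIgnoringCase(IEnumerable<string> keys, string probe)
    {
        var dic = new Dictionary<string, int>(StringComparer.InvariantCultureIgnoreCase);
        foreach (string key in keys)
        {
            dic[key] = 1;
        }
        return dic.ContainsKey(probe);
    }

    public static void Main()
    {
        // The default comparer is case-sensitive.
        var caseSensitive = new Dictionary<string, int> { { "the", 1 } };
        Console.WriteLine(caseSensitive.ContainsKey("The")); // False

        Console.WriteLine(ContainsIgnoringCase(new[] { "the" }, "The")); // True
    }
}
```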

public string RemoveStopWords(string originalDoc){
    return String.Join(" ", 
           originalDoc.Split(' ')
              .Where(x => !stopWordsDic.ContainsKey(x))
    );
}

Furthermore, you should change how you fill the dictionary (this strips all non-word symbols from the entries as they are added):

        // Regex to find the first word inside a string regardless of any
        // leading symbols. Cuts away all non-word symbols after it.
        // Requires: using System.Text.RegularExpressions;
        Regex validWords = new Regex(@"\b([0-9a-zA-Z]+?)\b");

        string stopWordsContent = System.IO.File.ReadAllText(stopWordsPath);
        string[] stopWordsSeperated = stopWordsContent.Split('\n');

        foreach (string stopWord in stopWordsSeperated)
        {
            stopWordsDic.Add(validWords.Match(stopWord).Value, 1);
        }
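To show what that pattern actually extracts, here is a small sketch with made-up inputs; RegexDemo and FirstWord are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexDemo
{
    private static readonly Regex ValidWords = new Regex(@"\b([0-9a-zA-Z]+?)\b");

    // Returns the first run of ASCII letters/digits in the line,
    // or an empty string if the line has none.
    public static string FirstWord(string line)
    {
        return ValidWords.Match(line).Value;
    }

    public static void Main()
    {
        Console.WriteLine(FirstWord("the\r") == "the"); // True
        Console.WriteLine(FirstWord("don't"));          // don
        Console.WriteLine(FirstWord("").Length);        // 0
    }
}
```

Two caveats follow from this: words containing an apostrophe are cut at the apostrophe, and a blank line yields an empty key, so a file with more than one blank line would make Dictionary.Add throw on the duplicate.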

I see that you're setting 1 as the value for all entries. Maybe a List would better fit your needs:

List<string> stopWordsDic = new List<string>();

string stopWordsContent = System.IO.File.ReadAllText(stopWordsPath);
string[] stopWordsSeperated = stopWordsContent.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
foreach (string stopWord in stopWordsSeperated)
{
    stopWordsDic.Add(stopWord);
}

Then check for the element with Contains():

public string RemoveStopWords(string originalDoc){
    string updatedDoc = "";
    string[] originalDocSeperated = originalDoc.Split(' ');
    foreach (string word in originalDocSeperated)
    {
        if (!stopWordsDic.Contains(word))
        {
            // Append the kept word followed by a separating space.
            updatedDoc += string.Format("{0} ", word);
        }
    }
    return updatedDoc.TrimEnd(' '); // Remove the trailing space (safe even if nothing was kept)
}
