C# Dictionary: ContainsKey returns the wrong value

Question

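I'm trying to use a Dictionary<string, int> to map some words (the int isn't really relevant). After inserting the words into the dictionary (I checked that they are there), I go over the whole document and look for specific words.

When I do that, ContainsKey returns false even though the word exists in the dictionary.

What could be the problem, and how can I fix it?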

public string RemoveStopWords(string originalDoc){
    string updatedDoc = "";
    string[] originalDocSeperated = originalDoc.Split(' ');
    foreach (string word in originalDocSeperated)
    {
        if (!stopWordsDic.ContainsKey(word))
        {
            updatedDoc += word;
            updatedDoc += " ";
        }
    }
    return updatedDoc.Substring(0, updatedDoc.Length - 1); //Remove Last Space
}

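For example: the dictionary contains stop words such as the word "the". When I get the word "the" from originalDoc and check whether it is missing from the dictionary, the code still enters the if statement, even though both words are written exactly the same. There is no difference in case.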

Dictionary<string, int> stopWordsDic = new Dictionary<string, int>();

string stopWordsContent = System.IO.File.ReadAllText(stopWordsPath);
string[] stopWordsSeperated = stopWordsContent.Split('\n');
foreach (string stopWord in stopWordsSeperated)
{
    stopWordsDic.Add(stopWord, 1);
}

The stopWords file has one word on each line.

Thanks

Answer 1

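This is just a guess (too long for a comment), but when you insert into the Dictionary you split the file contents on \n.

So if the actual line separator in the text file you are using is \r\n, a \r is left at the end of each inserted key, and ContainsKey never finds the words.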

So I would start with string[] stopWordsSeperated = stopWordsContent.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None); and then trim each entry.
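A minimal sketch of that guess, assuming the file uses Windows line endings. The class name StopWordKeys, the method Load, and the sample string are hypothetical stand-ins for the asker's file and code; only the Split call comes from the answer.

```csharp
using System;
using System.Collections.Generic;

public static class StopWordKeys
{
    // Splits on both Windows ("\r\n") and Unix ("\n") line endings, trims
    // each entry, and skips blanks so no key carries a stray '\r'.
    public static Dictionary<string, int> Load(string content)
    {
        var dic = new Dictionary<string, int>();
        string[] lines = content.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
        foreach (string line in lines)
        {
            string key = line.Trim();
            if (key.Length > 0)
            {
                dic[key] = 1; // the indexer tolerates duplicate lines, unlike Add
            }
        }
        return dic;
    }

    public static void Main()
    {
        string content = "the\r\na\r\nof\r\n"; // stand-in for the file contents

        // Splitting on '\n' alone stores "the\r" as the key.
        var naive = new Dictionary<string, int>();
        foreach (string line in content.Split('\n'))
        {
            naive[line] = 1;
        }

        Console.WriteLine(naive.ContainsKey("the"));         // False
        Console.WriteLine(Load(content).ContainsKey("the")); // True
    }
}
```

If the first lookup prints False on your real file as well, the trailing \r is the likely culprit.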

As a side note, if you are not using the dictionary's int value for anything, you would be better off with a HashSet<string> and Contains instead of ContainsKey.
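The HashSet<string> variant could look roughly like this. It is a sketch under assumptions, not the asker's code: StopWordFilter, LoadStopWords, and the two-argument RemoveStopWords are hypothetical names, and the stop-word text is passed in as a string rather than read from stopWordsPath.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordFilter
{
    // Builds the set from file contents, handling "\r\n" and "\n" endings.
    public static HashSet<string> LoadStopWords(string content)
    {
        IEnumerable<string> words = content
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => w.Trim())
            .Where(w => w.Length > 0);
        return new HashSet<string>(words);
    }

    // Keeps only the words that are not in the stop-word set.
    public static string RemoveStopWords(string originalDoc, HashSet<string> stopWords)
    {
        return string.Join(" ", originalDoc.Split(' ').Where(w => !stopWords.Contains(w)));
    }

    public static void Main()
    {
        HashSet<string> stopWords = LoadStopWords("the\r\non\r\n");
        Console.WriteLine(RemoveStopWords("the cat sat on the mat", stopWords)); // cat sat mat
    }
}
```

A set also accepts duplicate lines silently, whereas Dictionary.Add throws on a repeated key.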

Answer 2

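You have a ! (not) operator in your if statement, so you are checking whether the dictionary does not contain the key. Remove the exclamation mark from the start of the condition.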

Answer 3

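When you create the dictionary, you need to do the following:

var stopWordsDic = new Dictionary<string, int>(
    StringComparer.InvariantCultureIgnoreCase);

The most important part is InvariantCultureIgnoreCase, which makes key lookups ignore case.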

public string RemoveStopWords(string originalDoc){
    return String.Join(" ", 
           originalDoc.Split(' ')
              .Where(x => !stopWordsDic.ContainsKey(x))
    );
}
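To illustrate what the comparer changes, here is a small sketch comparing a default dictionary with one built using StringComparer.InvariantCultureIgnoreCase. The class name CaseInsensitiveLookup and the sample keys are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

public static class CaseInsensitiveLookup
{
    // Builds a one-entry dictionary whose key lookups ignore case.
    public static Dictionary<string, int> Build()
    {
        return new Dictionary<string, int>(StringComparer.InvariantCultureIgnoreCase)
        {
            { "the", 1 }
        };
    }

    public static void Main()
    {
        // The default comparer is ordinal and case-sensitive.
        var caseSensitive = new Dictionary<string, int> { { "the", 1 } };

        Console.WriteLine(caseSensitive.ContainsKey("The")); // False
        Console.WriteLine(Build().ContainsKey("The"));       // True
        Console.WriteLine(Build().ContainsKey("THE"));       // True
    }
}
```

The comparer only addresses differences in case. Whether it also hides stray characters in the keys is a separate question, so cleaning the keys when the dictionary is populated is still worthwhile.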

In addition, you should change how the dictionary is populated (this strips all non-word symbols from the entries as the dictionary is built):

        // Regex to find the first word inside a string regardless of any
        // leading symbols. Cuts away all non-word symbols after it.
        Regex validWords = new Regex(@"\b([0-9a-zA-Z]+?)\b");

        string stopWordsContent = System.IO.File.ReadAllText(stopWordsPath);
        string[] stopWordsSeperated = stopWordsContent.Split('\n');

        foreach (string stopWord in stopWordsSeperated)
        {
            // The indexer is used instead of Add so that duplicate or empty
            // matches (for example from blank lines) do not throw
            stopWordsDic[validWords.Match(stopWord).Value] = 1;
        }
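A short sketch of what that pattern extracts from a few sample lines. The class name StopWordCleaner, the helper FirstWord, and the inputs are made up for illustration; the regular expression is the one from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StopWordCleaner
{
    private static readonly Regex ValidWords = new Regex(@"\b([0-9a-zA-Z]+?)\b");

    // Returns the first run of letters or digits in the line, or an empty
    // string when the line contains none.
    public static string FirstWord(string line)
    {
        return ValidWords.Match(line).Value;
    }

    public static void Main()
    {
        Console.WriteLine("[" + FirstWord("  the\r") + "]"); // [the]
        Console.WriteLine("[" + FirstWord("--stop!") + "]"); // [stop]
        Console.WriteLine("[" + FirstWord("don't") + "]");   // [don]
        Console.WriteLine("[" + FirstWord("") + "]");        // []
    }
}
```

Note the third case: a stop word containing an apostrophe is cut at the apostrophe, so this approach may not suit lists that include contractions.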

Answer 4

I see that you are setting 1 as the value of every entry. A List might suit your needs better:

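List<string> stopWordsDic = new List<string>();

string stopWordsContent = System.IO.File.ReadAllText(stopWordsPath);
// Split on the platform's line separator; the string[] overload is used
// because it is available on every .NET version
string[] stopWordsSeperated = stopWordsContent.Split(new string[] { Environment.NewLine }, StringSplitOptions.None);
foreach (string stopWord in stopWordsSeperated)
{
    stopWordsDic.Add(stopWord);
}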

Then use Contains() to check for an element:

public string RemoveStopWords(string originalDoc){
    string updatedDoc = "";
    string[] originalDocSeperated = originalDoc.Split(' ');
    foreach (string word in originalDocSeperated)
    {
        if (!stopWordsDic.Contains(word))
        {
            // Append the kept word followed by a separating space
            updatedDoc += string.Format("{0}{1}", word, " ");
        }
    }
    return updatedDoc.TrimEnd(' '); // Remove the trailing space; safe even when nothing was kept
}

4 answers

Answer 1: 3 votes, accepted, 2015-11-13 09:02:37
Answer 2: 1 vote, 2015-11-13 08:29:57
Answer 3: 0 votes, 2015-11-13 08:48:03
Answer 4: 0 votes, 2015-11-13 09:07:50
