C# using dictionaries

I'm sorry in advance if it's bad to ask for this sort of help... but I don't know who else to ask.

I have an assignment to read two text files and find the 10 longest words in the first file (and the number of times they're repeated) which don't exist in the second file.

I currently read both of the files with File.ReadAllText, then split them into arrays where every element is a single word (punctuation marks removed as well), and remove the empty entries.

My idea for picking out the words that fit the requirements was to make a dictionary containing a string Word and an int Count, then loop over every word of the first file. Each word is first compared against the dictionary: if it is already there, its Count is increased by 1. If it is not in the dictionary, it is compared with every word of the second file in another loop: if a match is found, I just move on to the next word of the first file; if no match is found, the word is added to the dictionary with Count set to 1.

So my first question is: Is this actually the most efficient way to do this? (Don't forget I've only recently started studying C# and am not allowed to use LINQ.)

Second question: How do I work with the dictionary? Most of the results I could find were very confusing, and we have not covered dictionaries at university yet.

My code so far:

    // Reading and making all the words lowercase for comparisons
    string punctuation = " ,.?!;:\"\r\n";
    string Read1 = File.ReadAllText("@\\..\\Book1.txt");
    Read1 = Read1.ToLower();
    string Read2 = File.ReadAllText("@\\..\\Book2.txt");
    Read2 = Read2.ToLower();

    //Working with the 1st file
    string[] FirstFileWords = Read1.Split(punctuation.ToCharArray());

    var temp1 = new List<string>();
    foreach (var word in FirstFileWords)
    {
        if (!string.IsNullOrEmpty(word))
            temp1.Add(word);
    }
    FirstFileWords = temp1.ToArray();

    Array.Sort(FirstFileWords, (x, y) => y.Length.CompareTo(x.Length));

    //Working with the 2nd file
    string[] SecondFileWords = Read2.Split(punctuation.ToCharArray());

    var temp2 = new List<string>();
    foreach (var word in SecondFileWords)
    {
        if (!string.IsNullOrEmpty(word))
            temp2.Add(word);
    }
    SecondFileWords = temp2.ToArray();

Well, I think you've done very well so far. Not being able to use Linq here is torture ;)

As for performance, you should consider making your SecondFileWords a HashSet<string>, as this makes checking whether a word exists in the 2nd file tremendously faster without much effort. I wouldn't go much further in terms of performance optimization for an exercise like this unless performance is a key requirement.

A HashSet never stores the same word twice, so duplicates from the 2nd file disappear automatically; the Contains check below is optional, because Add simply returns false for a word that is already present. Change your current implementation to something like:

    HashSet<string> temp2 = new HashSet<string>();

    foreach (var word in SecondFileWords)
    {
        if (!string.IsNullOrEmpty(word) && !temp2.Contains(word))
        {
            temp2.Add(word);
        }
    }

Don't convert this back to an array; that is not necessary.

This brings me back to your FirstFileWords, which would contain duplicates too. This will cause issues later on, because the top words might contain the same word multiple times. So let's get rid of them. Here it's more complicated, as you need to keep track of how often each word appeared in your first list.

So let's bring a Dictionary<string, int> into play. Like the HashSet, a Dictionary stores a lookup key, but in addition it also stores a value for each key. We will use the word as the key, and the number of times the word appeared in the first list as the value.

    Dictionary<string, int> temp1 = new Dictionary<string, int>();

    foreach (var word in FirstFileWords)
    {
        if (string.IsNullOrEmpty(word))
        {
            continue;
        }

        if (temp1.ContainsKey(word))
        {
            temp1[word]++;
        }
        else
        {
            temp1.Add(word, 1);
        }
    }
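
The same counting loop can also be written with TryGetValue, which checks for the key and reads its value in a single lookup. This is only a small sketch; the names CountSketch and CountWords are made up here for illustration and do not appear in the question:

```csharp
using System.Collections.Generic;

public static class CountSketch
{
    // Hypothetical helper: counts how often each non-empty word occurs.
    public static Dictionary<string, int> CountWords(string[] words)
    {
        var counts = new Dictionary<string, int>();
        foreach (string word in words)
        {
            if (string.IsNullOrEmpty(word))
            {
                continue;
            }

            // TryGetValue sets 'current' to 0 when the word is not in the dictionary yet.
            int current;
            counts.TryGetValue(word, out current);

            // The indexer setter adds a missing key or overwrites the old count.
            counts[word] = current + 1;
        }
        return counts;
    }
}
```

Whether this reads better than the ContainsKey version is a matter of taste; both produce the same counts.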

Now, a dictionary cannot be sorted, which complicates things at this point, as you still need to sort by word length. So let's get back to your Array.Sort method, which I think is a good choice when you are not allowed to use Linq:

    KeyValuePair<string, int>[] firstFileWordsWithCount = temp1.ToArray();
    Array.Sort(firstFileWordsWithCount, (x, y) => y.Key.Length.CompareTo(x.Key.Length));

Note: You are using .ToArray() in your example, so I think it's OK to use it. But strictly speaking, this would also fall under using Linq, IMHO: ToArray() on your List<string> is a method of List itself, whereas on a Dictionary it is a Linq extension method.
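
If .ToArray() on the dictionary is ruled out, one Linq-free alternative is to copy the entries into a List<KeyValuePair<string, int>> and call the list's own Sort method. A minimal sketch, assuming that is acceptable for the assignment; SortSketch and SortByWordLength are names invented for this example:

```csharp
using System.Collections.Generic;

public static class SortSketch
{
    // Hypothetical helper: returns the dictionary's entries ordered by key length, longest first.
    public static List<KeyValuePair<string, int>> SortByWordLength(Dictionary<string, int> counts)
    {
        // List<T> has a constructor that copies any IEnumerable<T>; no Linq is involved.
        var entries = new List<KeyValuePair<string, int>>(counts);

        // List<T>.Sort accepts a comparison delegate, just like Array.Sort.
        entries.Sort((x, y) => y.Key.Length.CompareTo(x.Key.Length));
        return entries;
    }
}
```

Note that words of equal length may come out in any order here, because the comparison only looks at the length.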

Now all that's left is working through your firstFileWordsWithCount array until you have found 10 words that do not exist in the HashSet temp2. Something like:

    int foundWords = 0;

    foreach (KeyValuePair<string, int> candidate in firstFileWordsWithCount)
    {
        if (!temp2.Contains(candidate.Key))
        {
            Console.WriteLine($"{candidate.Key}: {candidate.Value}");
            foundWords++;
        }

        if (foundWords >= 10)
        {
            break;
        }
    }
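
Put together, the steps above could look roughly like the sketch below. It is not the only way to do it, and several things are assumptions of mine: the names LongestWords, SplitWords and Find are invented, the second file's words are filtered out while counting rather than at the end, and words of equal length are ordered alphabetically so that the output is deterministic.

```csharp
using System;
using System.Collections.Generic;

public static class LongestWords
{
    // Hypothetical helper: lower-cases the text and splits it on the separators used in the question.
    private static string[] SplitWords(string text)
    {
        char[] separators = " ,.?!;:\"\r\n".ToCharArray();
        return text.ToLower().Split(separators, StringSplitOptions.RemoveEmptyEntries);
    }

    // Returns up to 'limit' of the longest words of firstText that do not occur in secondText,
    // each paired with the number of times it occurs in firstText.
    public static List<KeyValuePair<string, int>> Find(string firstText, string secondText, int limit)
    {
        // Words of the second text, stored in a HashSet for fast lookups.
        var excluded = new HashSet<string>(SplitWords(secondText));

        // Count every word of the first text that is not excluded.
        var counts = new Dictionary<string, int>();
        foreach (string word in SplitWords(firstText))
        {
            if (excluded.Contains(word))
            {
                continue;
            }
            if (counts.ContainsKey(word))
            {
                counts[word]++;
            }
            else
            {
                counts.Add(word, 1);
            }
        }

        // Copy the entries into an array by hand (no Linq) and sort them.
        var entries = new KeyValuePair<string, int>[counts.Count];
        int index = 0;
        foreach (KeyValuePair<string, int> entry in counts)
        {
            entries[index++] = entry;
        }
        Array.Sort(entries, (x, y) =>
        {
            int byLength = y.Key.Length.CompareTo(x.Key.Length);
            // Assumption: equal lengths are ordered alphabetically.
            return byLength != 0 ? byLength : string.CompareOrdinal(x.Key, y.Key);
        });

        // Keep only the first 'limit' entries.
        var result = new List<KeyValuePair<string, int>>();
        for (int i = 0; i < entries.Length && i < limit; i++)
        {
            result.Add(entries[i]);
        }
        return result;
    }
}
```

With the two file contents read via File.ReadAllText, a call such as LongestWords.Find(Read1, Read2, 10) would return the entries to print.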

If anything is unclear, just ask.

This is what you'll get when using dictionaries:

    string File1 = "AMD Intel Skylake Processors Graphics Cards Nvidia Architecture Microprocessor Skylake SandyBridge KabyLake";
    string File2 = "Graphics Nvidia";

    // Count how often each word occurs in the first text.
    Dictionary<string, int> Dic = new Dictionary<string, int>();
    string[] File1Array = File1.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
    foreach (string s in File1Array)
    {
        if (Dic.ContainsKey(s))
        {
            Dic[s]++;
        }
        else
        {
            Dic.Add(s, 1);
        }
    }

    // Drop every word that also occurs in the second text.
    string[] File2Array = File2.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
    foreach (string s in File2Array)
    {
        Dic.Remove(s); // Remove does nothing if the key is not present
    }

    // A Dictionary does not guarantee any enumeration order, so sort the remaining entries explicitly.
    List<KeyValuePair<string, int>> Sorted = new List<KeyValuePair<string, int>>(Dic);
    Sorted.Sort((a, b) => b.Key.Length.CompareTo(a.Key.Length));

    // Print at most the 10 longest words with their counts.
    for (int i = 0; i < Sorted.Count && i < 10; i++)
    {
        Console.WriteLine(Sorted[i].Key + " " + Sorted[i].Value);
    }

My earlier attempt used LINQ, which is apparently not allowed; I missed that.

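    // Requires: using System.Linq; using System.Text.RegularExpressions;
    string[] Results = File1.Split(" ".ToCharArray())
        .Except(File2.Split(" ".ToCharArray()))
        .OrderByDescending(s => s.Length).Take(10).ToArray();

    for (int i = 0; i < Results.Length; i++)
    {
        // Escape the word and anchor it at word boundaries so only whole-word occurrences are counted.
        string pattern = @"\b" + Regex.Escape(Results[i]) + @"\b";
        Console.WriteLine(Results[i] + " " + Regex.Matches(File1, pattern).Count);
    }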
