Iterating multiple txt files in folder to read them in C#

Problem: I need to iterate through multiple .txt files in a folder and read them. While reading, I need to note which words occurred in each file.

For example:

File 1 text: "John is my friend friend" -> words: John, is, my, friend

File 2 text: "John is Mark" -> words: John, is, Mark

Currently I read the files and merge them into one big file, but that does not work for this purpose, so I have to read them separately. Old idea:

string[] filesZ = { "1.txt", "2.txt" };

var allLinesZ = filesZ.SelectMany(i => System.IO.File.ReadAllLines(i));
System.IO.File.WriteAllLines("n.txt", allLinesZ.ToArray());

var logFileZ = File.ReadAllLines("n.txt");

So this is the first question: how do I iterate through the files and read all of them without making one big file?

The second question is how to count the words for separate files. Currently, for one big file, I am using:

var logFileZ = File.ReadAllLines("n.txt");

List<string> LogListZ = new List<string>(logFileZ);

var fi = new Dictionary<string, int>();
LogListZ.ForEach(str => AddToDictionary(fi, str));

foreach (var entry in fi)
{
    Console.WriteLine(entry.Key + ": " + entry.Value);
}

This is AddToDictionary:

static void AddToDictionary(Dictionary<string, int> dictionary, string input)
{
    input.Split(new[] { ' ', ',', '.', '?', '!', '.' }, StringSplitOptions.RemoveEmptyEntries).ToList().ForEach(n =>
    {
        if (dictionary.ContainsKey(n))
            dictionary[n]++;
        else
            dictionary.Add(n, 1);
    });
}

I was thinking about looping through all the files (is that possible?) and, inside the loop, keeping a counter that records how many files a word such as John appeared in. I don't need a specific file number, just the number of files a word occurs in, without counting a word twice when it repeats within one file (like friend in file 1).

You don't have to do much for part one of your question: remove WriteAllLines, remove the ReadAllLines for "n.txt", rename the allLinesZ variable to logFileZ, and add a ToList or ToArray call:

var logFileZ = filesZ
    .SelectMany(i => System.IO.File.ReadAllLines(i))
    .ToList();
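
The file names above are hard-coded. If the files should instead be picked up from a folder, as the question describes, one possible sketch uses Directory.EnumerateFiles with a "*.txt" search pattern. The FolderReader class, the ReadAllTxtLines method, and the temp-folder demo data are illustrative assumptions and not part of the original code:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class FolderReader
{
    // Returns the lines of every .txt file directly inside the folder.
    // No intermediate merged file is written.
    public static List<string> ReadAllTxtLines(string folder)
    {
        return Directory.EnumerateFiles(folder, "*.txt")
            .SelectMany(path => File.ReadLines(path))
            .ToList();
    }

    static void Main()
    {
        // Hypothetical demo data in a throwaway temp folder.
        string folder = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(folder);
        File.WriteAllText(Path.Combine(folder, "1.txt"), "John is my friend friend");
        File.WriteAllText(Path.Combine(folder, "2.txt"), "John is Mark");

        List<string> lines = ReadAllTxtLines(folder);
        Console.WriteLine(lines.Count); // 2: one line per demo file

        Directory.Delete(folder, recursive: true);
    }
}
```

Note that this flattens all lines into one list, so it no longer tells you which file a line came from. That is fine for part one, but the per-file counting in part two needs the file boundaries.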

You can make a counter in one go as well: split each string as you go, feed it to SelectMany, use GroupBy, and convert to a dictionary using Count() as the value:

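var counts = filesZ
    .SelectMany(i => System.IO.File.ReadAllLines(i)
        // Split every line into words; skip empty entries so "" is never counted as a word.
        .SelectMany(line => line.Split(new[] { ' ', ',', '.', '?', '!' }, StringSplitOptions.RemoveEmptyEntries))
        // Keep each word once per file.
        .Distinct())
    .GroupBy(word => word)
    .ToDictionary(g => g.Key, g => g.Count());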

The call to Distinct() ensures that a word is not counted twice if it appears more than once in a single file.
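
The explicit loop the question asks about is also possible. As a rough sketch, a HashSet per file plays the role of Distinct(), and a shared dictionary holds the number of files each word appeared in. The WordFileCounter class and the CountFilesContaining method are hypothetical names, and the method takes file contents as strings (for real files you might pass File.ReadAllText(path) for each path), which is why line breaks are added to the separators here:

```csharp
using System;
using System.Collections.Generic;

static class WordFileCounter
{
    // Separators from the question, plus line breaks (an assumption,
    // so that whole-file text can be passed in).
    static readonly char[] Separators = { ' ', ',', '.', '?', '!', '\r', '\n' };

    // For each word, counts how many of the given texts contain it at least once.
    public static Dictionary<string, int> CountFilesContaining(IEnumerable<string> fileTexts)
    {
        var counts = new Dictionary<string, int>();
        foreach (string text in fileTexts)
        {
            // A set keeps each word once per file, so repeats are ignored.
            var wordsInFile = new HashSet<string>(
                text.Split(Separators, StringSplitOptions.RemoveEmptyEntries));
            foreach (string word in wordsInFile)
            {
                // current stays 0 when the word has not been seen before.
                counts.TryGetValue(word, out int current);
                counts[word] = current + 1;
            }
        }
        return counts;
    }

    static void Main()
    {
        var counts = CountFilesContaining(new[]
        {
            "John is my friend friend", // file 1 from the question
            "John is Mark"              // file 2 from the question
        });
        Console.WriteLine(counts["John"]);   // 2
        Console.WriteLine(counts["friend"]); // 1
    }
}
```

Like the original code, this comparison is case-sensitive, so "John" and "john" would be counted as different words.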
