Iterating multiple txt files in folder to read them in C#
Problem: I need to iterate through multiple files in a folder and read them. They are .txt files. While reading, I need to note which words occurred in each file.
For example:
File 1 text: "John is my friend friend" -> words: John, is, my, friend
File 2 text: "John is Mark" -> words: John, is, Mark
Currently I read the files and merge them into one big file, but that does not work for this, so I have to read them separately. Old idea:
string[] filesZ = { "1.txt", "2.txt" };
var allLinesZ = filesZ.SelectMany(i => System.IO.File.ReadAllLines(i));
System.IO.File.WriteAllLines("n.txt", allLinesZ.ToArray());
var logFileZ = File.ReadAllLines("n.txt");
So this is the first question: how do I iterate through the files and read all of them without making one big file?
The second one is how to count the words for separate files. Currently, for the one big file, I am using:
var logFileZ = File.ReadAllLines("n.txt");
List<string> LogListZ = new List<string>(logFileZ);
var fi = new Dictionary<string, int>();
LogListZ.ForEach(str => AddToDictionary(fi, str));
foreach (var entry in fi)
{
    Console.WriteLine(entry.Key + ": " + entry.Value);
}
This is AddToDictionary:
static void AddToDictionary(Dictionary<string, int> dictionary, string input)
{
    input.Split(new[] { ' ', ',', '.', '?', '!' }, StringSplitOptions.RemoveEmptyEntries).ToList().ForEach(n =>
    {
        if (dictionary.ContainsKey(n))
            dictionary[n]++;
        else
            dictionary.Add(n, 1);
    });
}
I was thinking about looping through all the files (is it possible?) and, inside the loop, keeping a counter of how many files each word (for example John) appeared in. I don't need a specific file number, just the number of files a word occurs in, without counting a word twice when it repeats within one file (like friend in example file 1).
You don't have to do much for part one of your question: remove WriteAllLines, remove the ReadAllLines for "n.txt", rename the allLinesZ variable to logFileZ, and add a ToList or ToArray call:
var logFileZ = filesZ
.SelectMany(i => System.IO.File.ReadAllLines(i))
.ToList();
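If the file names are not known up front, the same query can probably start from the folder itself. Here is a minimal sketch, assuming every .txt file sits directly inside one folder; the FolderReader class and ReadAllTxtLines method are hypothetical names, not part of the question's code:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class FolderReader
{
    // Hypothetical helper: returns every line of every .txt file
    // found directly inside 'folder' (subfolders are not searched).
    public static List<string> ReadAllTxtLines(string folder)
    {
        return Directory.EnumerateFiles(folder, "*.txt")
            .SelectMany(path => File.ReadAllLines(path))
            .ToList();
    }
}
```

Note that Directory.EnumerateFiles does not guarantee any particular file order, so sort the paths first if the order of lines matters.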
You can make a counter in one go as well: split each string as you go, feed it to SelectMany, use GroupBy, and convert to a dictionary using Count() as the value:
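var counts = filesZ
    // For each file: split every line into words, then keep each word once per file.
    .SelectMany(i => System.IO.File.ReadAllLines(i)
        .SelectMany(line => line.Split(new[] { ' ', ',', '.', '?', '!' }, StringSplitOptions.RemoveEmptyEntries))
        .Distinct())
    // Each word now appears once per file, so the group size is the number of files.
    .GroupBy(word => word)
    .ToDictionary(g => g.Key, g => g.Count());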
The call of Distinct() ensures that the same word will not be counted twice if it's in a single file.
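If you prefer an explicit loop over the files, as the question suggests, something like the following should give the same per-file counts. It is only a sketch: WordFileCounter, CountFilesPerWord, and the Separators list are hypothetical names, and the comparison is case-sensitive, so "John" and "john" would be counted as different words.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class WordFileCounter
{
    // Assumed separator set, taken from the question's AddToDictionary.
    static readonly char[] Separators = { ' ', ',', '.', '?', '!' };

    // Hypothetical helper: maps each word to the number of files it appears in.
    public static Dictionary<string, int> CountFilesPerWord(IEnumerable<string> paths)
    {
        var counts = new Dictionary<string, int>();
        foreach (var path in paths)
        {
            // A set keeps each word once per file, however often it repeats.
            var wordsInFile = new HashSet<string>(
                File.ReadLines(path)
                    .SelectMany(line => line.Split(Separators, StringSplitOptions.RemoveEmptyEntries)));

            foreach (var word in wordsInFile)
            {
                // 'seen' is 0 when the word has not been recorded yet.
                counts.TryGetValue(word, out var seen);
                counts[word] = seen + 1;
            }
        }
        return counts;
    }
}
```

For the two example files from the question this yields John: 2, is: 2, my: 1, friend: 1, Mark: 1.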