Best way to read a text file by chunks in C#
I need to read a big text file and search each line for a string. Lines are separated by line breaks, and I need to minimize I/O and RAM usage.
My idea is to separate the file into chunks, so I have two approaches:
1) Split the FileStream with something like this, but then I risk text lines being cut in half, which can make things complex:
using (FileStream fsSource = new FileStream("InputFiles\\1.txt", FileMode.Open, FileAccess.Read))
{
// Read the source file one fixed-size buffer at a time.
const int bufferSize = 1024; // Number of bytes to request per read
byte[] bytes = new byte[bufferSize];
int n;
// Read may return anything from 0 to bufferSize; 0 means the end of the file was reached.
while ((n = fsSource.Read(bytes, 0, bytes.Length)) > 0)
{
// Do something with the first n bytes here; a line may be split across two reads.
}
}
2) Create an extension method that splits the list of lines into smaller lists of lines and then search for the word in each line, but I am unsure how this method affects I/O and RAM.
public static IEnumerable<IEnumerable<TValue>> Chunk<TValue>(this IEnumerable<TValue> values, int chunkSize)
{
using (var enumerator = values.GetEnumerator())
{
while (enumerator.MoveNext())
{
yield return GetChunk(enumerator, chunkSize).ToList();
}
}
}
private static IEnumerable<T> GetChunk<T>(IEnumerator<T> enumerator, int chunkSize)
{
do
{
yield return enumerator.Current;
} while (--chunkSize > 0 && enumerator.MoveNext());
}
Any thoughts, or other methods I could use?
Thanks in advance.
I think you are overcomplicating things. The .NET Framework has a lot of methods to choose from when you want to read a text file.
If you need to process a big text file, there is nothing better than the File.ReadLines method, because it doesn't load the whole file into memory but lets you work line by line.
As the MSDN docs put it:
When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned;
foreach(string line in File.ReadLines(@"InputFiles\1.txt"))
{
// Process your line here....
}
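As a rough, self-contained sketch of the same idea, the loop can be wrapped in an iterator that reports the line number of each match. The names LineSearch and FindMatches, the temporary file, and the search term "needle" are hypothetical and only serve the illustration; the value tuples assume C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineSearch
{
    // Hypothetical helper: lazily yields (line number, line) for each line containing term.
    public static IEnumerable<(int LineNumber, string Line)> FindMatches(string path, string term)
    {
        int lineNumber = 0;
        // File.ReadLines streams the file, so this loop only holds the current line.
        foreach (string line in File.ReadLines(path))
        {
            lineNumber++;
            if (line.Contains(term))
            {
                yield return (lineNumber, line);
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Illustration only: build a small temporary file to search.
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllLines(path, new[] { "alpha", "needle here", "beta", "another needle" });
            foreach (var match in LineSearch.FindMatches(path, "needle"))
            {
                Console.WriteLine($"{match.LineNumber}: {match.Line}");
            }
            // Prints:
            // 2: needle here
            // 4: another needle
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Because FindMatches is itself an iterator, nothing is read until the caller starts enumerating, and only the matches the caller chooses to keep stay in memory.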
Use the File.ReadLines method: it reads one line at a time into memory, and you can perform some logic on that single line.
foreach(var thisLine in File.ReadLines("path"))
{
if(thisLine.Contains("Something"))
{
// Do something
}
}
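If only the first match matters, the lazy enumeration can also stop early, so the rest of the file is never read. The following is a minimal sketch under that assumption; FirstMatch, FindFirst, and the sample data are hypothetical names, and the case-insensitive comparison is an illustrative choice rather than something the question asked for.

```csharp
using System;
using System.IO;
using System.Linq;

public static class FirstMatch
{
    // Hypothetical helper: returns the first line containing term (ignoring case),
    // or null when no line matches.
    public static string FindFirst(string path, string term)
    {
        // FirstOrDefault stops enumerating at the first hit and disposes the
        // underlying reader, so later lines are not read.
        return File.ReadLines(path)
            .FirstOrDefault(line => line.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}

public static class FirstMatchDemo
{
    public static void Main()
    {
        // Illustration only: build a small temporary file to search.
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllLines(path, new[] { "first", "Something useful", "something else" });
            Console.WriteLine(FirstMatch.FindFirst(path, "something"));
            // Prints: Something useful
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```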