How to split a large text file (32 GB) using C#
I tried to split a file of about 32 GB using the code below, but I got a memory exception.
Please suggest how to split the file using C#.
string[] splitFile = File.ReadAllLines(@"E:\\JKS\\ImportGenius\\0.txt");
int cycle = 1;
int splitSize = Convert.ToInt32(txtNoOfLines.Text);
var chunk = splitFile.Take(splitSize);
var rem = splitFile.Skip(splitSize);
while (chunk.Take(1).Count() > 0)
{
    string filename = "file" + cycle.ToString() + ".txt";
    using (StreamWriter sw = new StreamWriter(filename))
    {
        foreach (string line in chunk)
        {
            sw.WriteLine(line);
        }
    }
    chunk = rem.Take(splitSize);
    rem = rem.Skip(splitSize);
    cycle++;
}
Well, to start with, you need to use File.ReadLines (assuming you're using .NET 4 or later) so that it doesn't try to read the whole thing into memory. Then I'd just keep calling a method to spit out the "next" however many lines to a new file:
int splitSize = Convert.ToInt32(txtNoOfLines.Text);
using (var lineIterator = File.ReadLines(...).GetEnumerator())
{
    bool stillGoing = true;
    for (int chunk = 0; stillGoing; chunk++)
    {
        stillGoing = WriteChunk(lineIterator, splitSize, chunk);
    }
}
...
// Writes up to splitSize lines to a new file.
// Returns false once the input is exhausted, true if there may be more lines.
// Note: if the input ends exactly on a chunk boundary, the last file created is empty.
private static bool WriteChunk(IEnumerator<string> lineIterator,
                               int splitSize, int chunk)
{
    using (var writer = File.CreateText("file " + chunk + ".txt"))
    {
        for (int i = 0; i < splitSize; i++)
        {
            if (!lineIterator.MoveNext())
            {
                return false;
            }
            writer.WriteLine(lineIterator.Current);
        }
    }
    return true;
}
Do not read all the lines into an array up front; use the StreamReader.ReadLine method instead, like this:
using (StreamReader sr = new StreamReader(@"E:\\JKS\\ImportGenius\\0.txt"))
{
    while (sr.Peek() >= 0)
    {
        var fileLine = sr.ReadLine();
        //do something with line
    }
}
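The loop above only reads the file; to split it, each line also has to be written to the current output file, and a new output file opened every N lines. Here is a minimal sketch of that idea, not a definitive implementation. The names LineSplitter, Split and outputPrefix, and the "prefix + number + .txt" naming scheme, are assumptions made for illustration.

```csharp
using System;
using System.IO;

static class LineSplitter
{
    // Splits inputPath into files of at most linesPerFile lines each.
    // Returns the number of output files written.
    // outputPrefix and the "<prefix><N>.txt" naming are illustrative assumptions.
    public static int Split(string inputPath, string outputPrefix, int linesPerFile)
    {
        if (linesPerFile <= 0)
            throw new ArgumentOutOfRangeException(nameof(linesPerFile));

        int fileCount = 0;
        int linesInCurrent = 0;
        StreamWriter writer = null;
        try
        {
            using (var reader = new StreamReader(inputPath))
            {
                string line;
                // ReadLine returns null at end of file; only one line is held at a time.
                while ((line = reader.ReadLine()) != null)
                {
                    // Open the next output file only when there is a line to put in it,
                    // so no empty trailing file is created.
                    if (writer == null)
                    {
                        fileCount++;
                        writer = new StreamWriter(outputPrefix + fileCount + ".txt");
                        linesInCurrent = 0;
                    }

                    writer.WriteLine(line);
                    linesInCurrent++;

                    // Close the current file once it is full.
                    if (linesInCurrent == linesPerFile)
                    {
                        writer.Dispose();
                        writer = null;
                    }
                }
            }
        }
        finally
        {
            if (writer != null) writer.Dispose();
        }
        return fileCount;
    }
}
```

It could be called as LineSplitter.Split(inputPath, "file", splitSize), with splitSize parsed from the text box as in the question. Memory use stays roughly constant regardless of the input size, because only the current line and the stream buffers are in memory.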
Instead of reading the whole file at once with File.ReadAllLines, use File.ReadLines in a foreach loop to read the lines as needed.
foreach (var line in File.ReadLines(@"E:\\JKS\\ImportGenius\\0.txt"))
{
    // Do something
}
Edit: On an unrelated note, you don't have to escape backslashes when the string is prefixed with '@'. So write either "E:\\JKS\\ImportGenius\\0.txt" or @"E:\JKS\ImportGenius\0.txt"; @"E:\\JKS\\ImportGenius\\0.txt" is redundant.
File.ReadAllLines will read the whole file into memory.
To work with large files, you need to read into memory only what you need right now, and then throw it away as soon as you have finished with it.
A better option would be File.ReadLines, which returns a lazy enumerator: data is only read into memory as you get the next line from the enumerator. Provided you avoid multiple enumerations (e.g. don't use Count()), only part of the file is held in memory at any one time.
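To make the point about multiple enumerations concrete, here is a small sketch that counts how many lines are pulled from a lazy sequence. FakeReadLines is a hypothetical stand-in for File.ReadLines (it generates lines instead of reading a file), and the class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class EnumerationDemo
{
    // Total number of lines produced by the source so far.
    public static int LinesPulled;

    // Hypothetical stand-in for File.ReadLines: yields lines lazily and counts each one.
    public static IEnumerable<string> FakeReadLines(int lineCount)
    {
        for (int i = 0; i < lineCount; i++)
        {
            LinesPulled++;
            yield return "line " + i;
        }
    }

    // One foreach over the sequence: every line is produced exactly once.
    public static int SinglePass(int lineCount)
    {
        LinesPulled = 0;
        foreach (var line in FakeReadLines(lineCount)) { }
        return LinesPulled;
    }

    // Count() followed by foreach: the source is walked from the start twice.
    public static int CountThenIterate(int lineCount)
    {
        LinesPulled = 0;
        var lines = FakeReadLines(lineCount);
        int total = lines.Count();
        foreach (var line in lines) { }
        return LinesPulled;
    }
}
```

SinglePass(1000) returns 1000, while CountThenIterate(1000) returns 2000. With a real 32 GB file, that second walk means reading the whole file from disk again, so it is worth structuring the split as a single pass.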
The problem here is that you are reading the entire file's content into memory at once with File.ReadAllLines(). What you need to do is open a FileStream with File.OpenRead() and read/write smaller chunks.
Edit: Actually, for your case ReadLine is obviously better. See the other answers. :)
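For completeness, the FileStream approach could look roughly like the sketch below, which copies the input into pieces of a fixed byte size through a small buffer. ByteSplitter, Split, outputPrefix and the ".partN" naming are assumptions made for illustration. Note that it cuts at byte boundaries, so a line (or even a multi-byte character) can end up split across two pieces, which is why the line-based approach fits this question better.

```csharp
using System;
using System.IO;

static class ByteSplitter
{
    // Splits inputPath into pieces of at most bytesPerFile bytes each,
    // copying through a small fixed-size buffer. Returns the number of pieces.
    // The "<prefix>.partN" naming scheme is an illustrative assumption.
    public static int Split(string inputPath, string outputPrefix, long bytesPerFile)
    {
        if (bytesPerFile <= 0)
            throw new ArgumentOutOfRangeException(nameof(bytesPerFile));

        var buffer = new byte[81920]; // only this much file data is in memory at once
        int pieces = 0;

        using (FileStream input = File.OpenRead(inputPath))
        {
            while (input.Position < input.Length)
            {
                pieces++;
                long remaining = bytesPerFile;
                using (FileStream output = File.Create(outputPrefix + ".part" + pieces))
                {
                    while (remaining > 0)
                    {
                        int toRead = (int)Math.Min(buffer.Length, remaining);
                        int read = input.Read(buffer, 0, toRead);
                        if (read == 0) break; // end of input
                        output.Write(buffer, 0, read);
                        remaining -= read;
                    }
                }
            }
        }
        return pieces;
    }
}
```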
Use StreamReader to read the file and StreamWriter to write it.