
How to split a large text file (32 GB) using C#

I tried to split a file of about 32 GB using the code below, but I got an out-of-memory exception.

Please suggest how I can split the file using C#.

string[] splitFile = File.ReadAllLines(@"E:\\JKS\\ImportGenius\\0.txt");

int cycle = 1;
int splitSize = Convert.ToInt32(txtNoOfLines.Text);
var chunk = splitFile.Take(splitSize);
var rem = splitFile.Skip(splitSize);

while (chunk.Take(1).Count() > 0)
{
    string filename = "file" + cycle.ToString() + ".txt";
    using (StreamWriter sw = new StreamWriter(filename))
    {
        foreach (string line in chunk)
        {
            sw.WriteLine(line);
        }
    }
    chunk = rem.Take(splitSize);
    rem = rem.Skip(splitSize);
    cycle++;
}

Well, to start with you need to use File.ReadLines (assuming you're using .NET 4 or later) so that it doesn't try to read the whole thing into memory. Then I'd just keep calling a method to write out the "next" however-many lines to a new file:

int splitSize = Convert.ToInt32(txtNoOfLines.Text);
using (var lineIterator = File.ReadLines(...).GetEnumerator())
{
    bool stillGoing = true;
    for (int chunk = 0; stillGoing; chunk++)
    {
        stillGoing = WriteChunk(lineIterator, splitSize, chunk);
    }
}

...

private static bool WriteChunk(IEnumerator<string> lineIterator,
                               int splitSize, int chunk)
{
    using (var writer = File.CreateText("file " + chunk + ".txt"))
    {
        for (int i = 0; i < splitSize; i++)
        {
            if (!lineIterator.MoveNext())
            {
                return false;
            }
            writer.WriteLine(lineIterator.Current);
        }
    }
    return true;
}
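For reference, here is a self-contained sketch of the same enumerator approach. It differs in one respect, which is my own assumption rather than part of the answer above: it advances the enumerator before creating each file, so an input whose line count is an exact multiple of splitSize does not leave an empty file at the end (WriteChunk creates the file before it discovers there are no more lines). The class name LineSplitter and the outputPrefix parameter are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LineSplitter
{
    // Hypothetical helper: splits inputPath into files of at most splitSize lines,
    // named outputPrefix + chunkNumber + ".txt". Returns the number of files written.
    public static int SplitByLines(string inputPath, int splitSize, string outputPrefix)
    {
        if (splitSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(splitSize));

        int chunk = 0;
        // File.ReadLines is lazy: only the current line is held in memory.
        using (IEnumerator<string> lines = File.ReadLines(inputPath).GetEnumerator())
        {
            // Advance first, so a file is only created when there is a line to put in it.
            bool hasLine = lines.MoveNext();
            while (hasLine)
            {
                string path = outputPrefix + chunk + ".txt";
                using (StreamWriter writer = File.CreateText(path))
                {
                    int written = 0;
                    while (hasLine && written < splitSize)
                    {
                        writer.WriteLine(lines.Current);
                        written++;
                        hasLine = lines.MoveNext();
                    }
                }
                chunk++;
            }
        }
        return chunk;
    }
}
```

Called from the question's form, this would be something like LineSplitter.SplitByLines(@"E:\JKS\ImportGenius\0.txt", Convert.ToInt32(txtNoOfLines.Text), "file").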

Don't read all the lines into an array at once; use the StreamReader.ReadLine method instead, like this:

using (StreamReader sr = new StreamReader(@"E:\\JKS\\ImportGenius\\0.txt")) 
{
    while (sr.Peek() >= 0) 
    {
        var fileLine = sr.ReadLine();
        //do something with line
    }
}
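Filling in the "do something with line" placeholder, a minimal sketch of the split itself might look like this. It uses the common while ((line = reader.ReadLine()) != null) idiom in place of Peek(); for an ordinary file both stop at end of file. StreamSplitter, linesPerFile and outputPrefix are hypothetical names.

```csharp
using System;
using System.IO;

static class StreamSplitter
{
    // Hypothetical helper: copies lines from inputPath into numbered files
    // of at most linesPerFile lines each. Returns the number of files written.
    public static int Split(string inputPath, int linesPerFile, string outputPrefix)
    {
        if (linesPerFile <= 0)
            throw new ArgumentOutOfRangeException(nameof(linesPerFile));

        int fileCount = 0;
        int linesInCurrent = 0;
        StreamWriter writer = null;
        try
        {
            using (var reader = new StreamReader(inputPath))
            {
                string line;
                // ReadLine returns null at end of file; only one line is in memory at a time.
                while ((line = reader.ReadLine()) != null)
                {
                    // Open the next output file when none is open yet or the current one is full.
                    if (writer == null || linesInCurrent == linesPerFile)
                    {
                        writer?.Dispose();
                        writer = new StreamWriter(outputPrefix + fileCount + ".txt");
                        fileCount++;
                        linesInCurrent = 0;
                    }
                    writer.WriteLine(line);
                    linesInCurrent++;
                }
            }
        }
        finally
        {
            // Flush and close the last output file even if an exception occurred.
            writer?.Dispose();
        }
        return fileCount;
    }
}
```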

Instead of reading the whole file at once with File.ReadAllLines, use File.ReadLines in a foreach loop to read the lines as they are needed.

foreach (var line in File.ReadLines(@"E:\\JKS\\ImportGenius\\0.txt"))
{
    // Do something
}
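Applied to the splitting problem, the foreach version only needs a line counter to decide when to start the next output file. A sketch with hypothetical names (ForeachSplitter, outputPrefix):

```csharp
using System;
using System.IO;

static class ForeachSplitter
{
    // Hypothetical helper: splits a file with a plain foreach over File.ReadLines.
    // Returns the number of files written.
    public static int Split(string inputPath, int splitSize, string outputPrefix)
    {
        if (splitSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(splitSize));

        long lineNumber = 0;
        int chunk = -1;
        StreamWriter writer = null;
        try
        {
            foreach (string line in File.ReadLines(inputPath))
            {
                // Every splitSize lines, close the current file and open the next one.
                if (lineNumber % splitSize == 0)
                {
                    writer?.Dispose();
                    chunk++;
                    writer = File.CreateText(outputPrefix + chunk + ".txt");
                }
                writer.WriteLine(line);
                lineNumber++;
            }
        }
        finally
        {
            writer?.Dispose();
        }
        return chunk + 1;
    }
}
```

Because the loop enumerates File.ReadLines exactly once, the file is read in a single pass regardless of its size.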

Edit: On an unrelated note, you don't have to escape your backslashes when prefixing the string with '@'. So either write "E:\\JKS\\ImportGenius\\0.txt" or @"E:\JKS\ImportGenius\0.txt", but @"E:\\JKS\\ImportGenius\\0.txt" is redundant (it puts doubled backslashes into the path).
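A small check of that point; the class name PathLiterals is hypothetical and the path is the one from the question:

```csharp
static class PathLiterals
{
    // Regular string literal: each backslash must be escaped as \\.
    public const string Escaped = "E:\\JKS\\ImportGenius\\0.txt";

    // Verbatim (@) string literal: backslashes are taken literally.
    public const string Verbatim = @"E:\JKS\ImportGenius\0.txt";

    // Escaping inside a verbatim literal keeps both backslashes of each pair.
    public const string Redundant = @"E:\\JKS\\ImportGenius\\0.txt";
}
```

Escaped and Verbatim are the same string; Redundant contains three extra backslashes.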

File.ReadAllLines will read the whole file into memory.

To work with large files, you need to read into memory only what you need right now, and discard it as soon as you have finished with it.

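A better option would be File.ReadLines, which returns a lazy enumerator: data is only read into memory as you get the next line from the enumerator. Provided you avoid multiple enumerations (e.g. don't call Count() and then iterate again, and don't chain Take/Skip as in the question, since each new pass starts reading the file from the beginning), only a small part of the file is held in memory at any time.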
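One way to get chunks from File.ReadLines without the Take/Skip re-enumeration is a batching helper that walks the sequence once. This is a sketch with a hypothetical name (Batching.Batch); on .NET 6 and later, the built-in Enumerable.Chunk serves the same purpose. Each batch is held in memory, so memory use grows with the batch size (the number of lines per output file).

```csharp
using System;
using System.Collections.Generic;

static class Batching
{
    // Groups a sequence into lists of at most `size` items, enumerating the
    // source exactly once, so a lazy File.ReadLines sequence is read in one pass.
    public static IEnumerable<List<T>> Batch<T>(IEnumerable<T> source, int size)
    {
        // Validate eagerly; the iterator below only runs when enumerated.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        return BatchIterator(source, size);
    }

    private static IEnumerable<List<T>> BatchIterator<T>(IEnumerable<T> source, int size)
    {
        var batch = new List<T>();
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<T>();
            }
        }
        // Emit the final, possibly shorter, batch.
        if (batch.Count > 0)
            yield return batch;
    }
}
```

Each batch could then be written with File.WriteAllLines(path, batch), using a fresh file name per batch.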

The problem here is that you are reading the entire file's content into memory at once with File.ReadAllLines(). What you need to do instead is open a FileStream with File.OpenRead() and read/write smaller chunks.
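For completeness, a sketch of the FileStream route (ByteSplitter and the .part extension are hypothetical). It copies fixed-size byte ranges through a buffer, so memory use stays constant, but it ignores line boundaries: a line, or a multi-byte UTF-8 character, can be cut in two at a chunk edge, which is why the ReadLine-based approaches fit a line-count split better.

```csharp
using System;
using System.IO;

static class ByteSplitter
{
    // Hypothetical helper: splits a file into pieces of at most chunkBytes bytes.
    // Line boundaries are NOT respected. Returns the number of pieces written.
    public static int Split(string inputPath, long chunkBytes, string outputPrefix)
    {
        if (chunkBytes <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkBytes));

        var buffer = new byte[81920];
        int part = 0;
        using (FileStream input = File.OpenRead(inputPath))
        {
            while (input.Position < input.Length)
            {
                using (FileStream output = File.Create(outputPrefix + part + ".part"))
                {
                    long remaining = chunkBytes;
                    while (remaining > 0)
                    {
                        // Never read past the current piece's byte budget.
                        int toRead = (int)Math.Min(buffer.Length, remaining);
                        int read = input.Read(buffer, 0, toRead);
                        if (read == 0) break; // end of input
                        output.Write(buffer, 0, read);
                        remaining -= read;
                    }
                }
                part++;
            }
        }
        return part;
    }
}
```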

Edit: Actually, for your case ReadLine is clearly better. See the other answers. :)

Use StreamReader to read the file and StreamWriter to write it.
