
How to split a large text file (32 GB) using C#

I tried to split a file of about 32 GB using the code below, but I got an out-of-memory exception.

Please suggest how to split the file using C#.

string[] splitFile = File.ReadAllLines(@"E:\\JKS\\ImportGenius\\0.txt");

int cycle = 1;
int splitSize = Convert.ToInt32(txtNoOfLines.Text);
var chunk = splitFile.Take(splitSize);
var rem = splitFile.Skip(splitSize);

while (chunk.Take(1).Count() > 0)
{
    string filename = "file" + cycle.ToString() + ".txt";
    using (StreamWriter sw = new StreamWriter(filename))
    {
        foreach (string line in chunk)
        {
            sw.WriteLine(line);
        }
    }
    chunk = rem.Take(splitSize);
    rem = rem.Skip(splitSize);
    cycle++;
}

Well, to start with, you need to use File.ReadLines (assuming you're using .NET 4) so that it doesn't try to read the whole thing into memory. Then I'd just keep calling a method that writes the next splitSize lines to a new file:

int splitSize = Convert.ToInt32(txtNoOfLines.Text);
using (var lineIterator = File.ReadLines(...).GetEnumerator())
{
    bool stillGoing = true;
    for (int chunk = 0; stillGoing; chunk++)
    {
        stillGoing = WriteChunk(lineIterator, splitSize, chunk);
    }
}

...

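private static bool WriteChunk(IEnumerator<string> lineIterator,
                               int splitSize, int chunk)
{
    // Check for another line before creating the file, so that no empty
    // trailing file is written when the line count divides evenly.
    if (!lineIterator.MoveNext())
    {
        return false;
    }
    using (var writer = File.CreateText("file " + chunk + ".txt"))
    {
        writer.WriteLine(lineIterator.Current);
        // Write up to splitSize - 1 further lines into this chunk.
        for (int i = 1; i < splitSize; i++)
        {
            if (!lineIterator.MoveNext())
            {
                return false; // input ran out part-way through this chunk
            }
            writer.WriteLine(lineIterator.Current);
        }
    }
    return true; // chunk is full; there may be more lines
}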

Don't read all the lines into an array at once; use the StreamReader.ReadLine method instead, like this:

using (StreamReader sr = new StreamReader(@"E:\\JKS\\ImportGenius\\0.txt"))
{
    while (sr.Peek() >= 0)
    {
        var fileLine = sr.ReadLine();
        // do something with fileLine
    }
}

Instead of reading the whole file at once with File.ReadAllLines, use File.ReadLines in a foreach loop to read the lines as they are needed.

foreach (var line in File.ReadLines(@"E:\\JKS\\ImportGenius\\0.txt"))
{
    // Do something
}
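Building on the foreach over File.ReadLines above, here is a sketch that groups the lazily read lines into batches with `Enumerable.Chunk`. It assumes .NET 6 or later (`Chunk` does not exist in .NET 4), and `ChunkSplitter`, `SplitWithChunk` and the `outputPrefix` + number + `.txt` naming are hypothetical. Each batch is materialised as an array, so peak memory is roughly one output file's worth of lines rather than the whole 32 GB.

```csharp
using System;
using System.IO;
using System.Linq;

public static class ChunkSplitter
{
    // Hypothetical helper; requires .NET 6+ for Enumerable.Chunk.
    // Returns the number of output files written.
    public static int SplitWithChunk(string inputPath, int linesPerFile, string outputPrefix)
    {
        int fileIndex = 0;
        // File.ReadLines is lazy and Chunk pulls only linesPerFile lines at a time,
        // so the input is read once, front to back.
        foreach (string[] batch in File.ReadLines(inputPath).Chunk(linesPerFile))
        {
            File.WriteAllLines(outputPrefix + fileIndex + ".txt", batch);
            fileIndex++;
        }
        return fileIndex;
    }
}
```

For example, a five-line input split with `linesPerFile = 2` produces three files, the last holding a single line.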

Edit: On an unrelated note, you don't have to escape your backslashes when prefixing the string with '@'. So either write "E:\\JKS\\ImportGenius\\0.txt" or @"E:\JKS\ImportGenius\0.txt", but @"E:\\JKS\\ImportGenius\\0.txt" is redundant: in a verbatim string the doubled backslashes are kept as two characters.
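A quick check of the string-literal point above: the regular literal with escaped backslashes and the verbatim (`@`) literal with single backslashes are the same string, while the verbatim literal with doubled backslashes really contains two backslashes at each separator. `PathLiterals` is only an illustrative name.

```csharp
public static class PathLiterals
{
    // Regular literal: each "\\" escape sequence becomes one backslash.
    public const string Escaped = "E:\\JKS\\ImportGenius\\0.txt";

    // Verbatim literal: backslashes are taken literally, no escaping needed.
    public const string Verbatim = @"E:\JKS\ImportGenius\0.txt";

    // Verbatim literal with doubled backslashes: both backslashes are kept.
    public const string Redundant = @"E:\\JKS\\ImportGenius\\0.txt";
}
```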

File.ReadAllLines will read the whole file into memory.

To work with large files, you need to read into memory only what you need right now, and then discard it as soon as you have finished with it.

A better option is File.ReadLines, which returns a lazy enumerable: data is read into memory only as you get the next line from the enumerator. Provided you avoid enumerating it multiple times (e.g. don't call Count() on it), the file is read once and only a small part of it is in memory at any time.
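To make the multiple-enumeration warning above concrete, here is a sketch that counts how many lines a lazy source yields. `CountingSource` is a hypothetical stand-in for File.ReadLines, so no real file is needed. The Take/Skip loop from the question restarts the enumeration from the first line for every chunk, while a single pass reads each line exactly once.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EnumerationDemo
{
    // Total number of lines yielded by CountingSource across all enumerations.
    public static int LinesRead;

    // Lazily yields lineCount lines, like File.ReadLines, counting each one produced.
    public static IEnumerable<string> CountingSource(int lineCount)
    {
        for (int i = 0; i < lineCount; i++)
        {
            LinesRead++;
            yield return "line " + i;
        }
    }

    // The question's pattern: each Take/Skip re-enumerates the source from the start.
    public static int TakeSkipPattern(int lineCount, int splitSize)
    {
        LinesRead = 0;
        IEnumerable<string> source = CountingSource(lineCount);
        var chunk = source.Take(splitSize);
        var rem = source.Skip(splitSize);
        while (chunk.Take(1).Count() > 0)
        {
            foreach (string line in chunk) { } // stands in for writing the chunk
            chunk = rem.Take(splitSize);
            rem = rem.Skip(splitSize);
        }
        return LinesRead;
    }

    // A single pass over the same source, grouping lines as they arrive.
    public static int SinglePass(int lineCount, int splitSize)
    {
        LinesRead = 0;
        int inCurrentChunk = 0;
        foreach (string line in CountingSource(lineCount))
        {
            inCurrentChunk++; // stands in for writing the line
            if (inCurrentChunk == splitSize) inCurrentChunk = 0;
        }
        return LinesRead;
    }
}
```

`SinglePass(100, 10)` reads 100 lines; `TakeSkipPattern(100, 10)` reads considerably more, because each new chunk first skips over everything before it. On a 32 GB file that means re-reading the file from the start for every chunk.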

The problem here is that you are reading the entire file's content into memory at once with File.ReadAllLines(). What you need to do is open a FileStream with File.OpenRead() and read/write smaller chunks.
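A minimal sketch of the FileStream approach described above, assuming a split by size is acceptable: it copies fixed-size byte ranges through one reusable buffer. `ByteSplitter`, `SplitByBytes`, `bytesPerFile` and the `.part` naming are hypothetical. Because it ignores line boundaries, a line can be cut in half between two output files.

```csharp
using System;
using System.IO;

public static class ByteSplitter
{
    // Splits inputPath into files of at most bytesPerFile bytes. Returns the file count.
    public static int SplitByBytes(string inputPath, long bytesPerFile, string outputPrefix)
    {
        if (bytesPerFile <= 0)
            throw new ArgumentOutOfRangeException(nameof(bytesPerFile));

        byte[] buffer = new byte[81920]; // only this buffer is held in memory
        int fileIndex = 0;
        using (FileStream input = File.OpenRead(inputPath))
        {
            while (input.Position < input.Length)
            {
                using (FileStream output = File.Create(outputPrefix + fileIndex + ".part"))
                {
                    long remaining = bytesPerFile;
                    while (remaining > 0)
                    {
                        int toRead = (int)Math.Min(buffer.Length, remaining);
                        int read = input.Read(buffer, 0, toRead);
                        if (read == 0) break; // end of input
                        output.Write(buffer, 0, read);
                        remaining -= read;
                    }
                }
                fileIndex++;
            }
        }
        return fileIndex;
    }
}
```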

Edit: Actually, for your case ReadLine is obviously better. See the other answers. :)

Use StreamReader to read the file and StreamWriter to write it.
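A minimal sketch of that suggestion, switching to a new StreamWriter every `linesPerFile` lines. `ReaderWriterSplitter`, `SplitWithReader` and the `outputPrefix` + number + `.txt` naming are illustrative assumptions. It reads with StreamReader.ReadLine until it returns null, so memory use stays at about one line plus the reader and writer buffers.

```csharp
using System;
using System.IO;

public static class ReaderWriterSplitter
{
    // Splits inputPath into files of at most linesPerFile lines. Returns the file count.
    public static int SplitWithReader(string inputPath, int linesPerFile, string outputPrefix)
    {
        if (linesPerFile <= 0)
            throw new ArgumentOutOfRangeException(nameof(linesPerFile));

        int fileIndex = 0;
        int linesInCurrentFile = 0;
        StreamWriter writer = null;
        try
        {
            using (var reader = new StreamReader(inputPath))
            {
                string line;
                // ReadLine returns null once the end of the file is reached.
                while ((line = reader.ReadLine()) != null)
                {
                    if (writer == null || linesInCurrentFile == linesPerFile)
                    {
                        // No file open yet, or the current one is full: start the next one.
                        writer?.Dispose();
                        writer = new StreamWriter(outputPrefix + fileIndex + ".txt");
                        fileIndex++;
                        linesInCurrentFile = 0;
                    }
                    writer.WriteLine(line);
                    linesInCurrentFile++;
                }
            }
        }
        finally
        {
            writer?.Dispose(); // flush and close the last output file
        }
        return fileIndex;
    }
}
```

Opening each output file only when a line is actually available means an input whose line count divides evenly by `linesPerFile` does not leave an empty trailing file.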

