
How do I convert encoding of a large file (>1 GB) in size to Windows 1252 without an out-of-memory exception?

Consider:

public static void ConvertFileToUnicode1252(string filePath, Encoding srcEncoding)
{
    try
    {
        StreamReader fileStream = new StreamReader(filePath, srcEncoding);
        Encoding targetEncoding = Encoding.GetEncoding(1252);

        string fileContent = fileStream.ReadToEnd();
        fileStream.Close();

        // Saving file as ANSI 1252
        Byte[] srcBytes = srcEncoding.GetBytes(fileContent);
        Byte[] ansiBytes = Encoding.Convert(srcEncoding, targetEncoding, srcBytes);
        string ansiContent = targetEncoding.GetString(ansiBytes);

        // Now writes contents to file again
        StreamWriter ansiWriter = new StreamWriter(filePath, false);
        ansiWriter.Write(ansiContent);
        ansiWriter.Close();
        //TODO -- log success  details
    }
    catch (Exception e)
    {
        // TODO -- log failure details
        throw; // rethrow without resetting the stack trace
    }
}

The above piece of code throws an out-of-memory exception for large files and only works for small files.

I think the most elegant solution is still to use a StreamReader and a StreamWriter, but to read blocks of characters instead of everything at once or line by line. This doesn't arbitrarily assume the file consists of lines of manageable length, and it also doesn't break with multi-byte character encodings.

public static void ConvertFileEncoding(string srcFile, Encoding srcEncoding, string destFile, Encoding destEncoding)
{
    using (var reader = new StreamReader(srcFile, srcEncoding))
    using (var writer = new StreamWriter(destFile, false, destEncoding))
    {
        char[] buf = new char[4096];
        while (true)
        {
            int count = reader.Read(buf, 0, buf.Length);
            if (count == 0)
                break;

            writer.Write(buf, 0, count);
        }
    }
}

(I wish StreamReader had a CopyTo method like Stream does; if it had, this would essentially be a one-liner!)
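If it helps, the missing CopyTo can be sketched as an extension method on TextReader. The class and method names below are my own, not part of the framework:

```csharp
using System.IO;

static class TextReaderExtensions
{
    // Hypothetical CopyTo for TextReader: copies all characters to the
    // writer in fixed-size chunks, so at most bufferSize chars are held
    // in memory at a time.
    public static void CopyTo(this TextReader reader, TextWriter writer,
                              int bufferSize = 4096)
    {
        char[] buf = new char[bufferSize];
        int count;
        while ((count = reader.Read(buf, 0, buf.Length)) > 0)
            writer.Write(buf, 0, count);
    }
}
```

With this in scope, the body of ConvertFileEncoding reduces to `reader.CopyTo(writer);`.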

Don't ReadToEnd; read it line by line or X characters at a time. If you read to the end, you put your whole file into the buffer at once.

Try this:

using (FileStream fileStream = new FileStream(filePath, FileMode.Open))
{
    int size = 4096;
    Encoding targetEncoding = Encoding.GetEncoding(1252);
    byte[] byteData = new byte[size];

    using (FileStream outputStream = new FileStream(outputFilepath, FileMode.Create))
    {
        int byteCounter;

        do
        {
            byteCounter = fileStream.Read(byteData, 0, size);

            if (byteCounter > 0)
            {
                // Convert only the bytes actually read; the converted chunk
                // may be a different length than the input chunk
                byte[] converted = Encoding.Convert(srcEncoding, targetEncoding,
                                                    byteData, 0, byteCounter);
                outputStream.Write(converted, 0, converted.Length);
            }
        }
        while (byteCounter > 0);
    }
}

This might have some syntax errors, as I've written it from memory, but this is how I work with large files: read in a chunk at a time, do some processing, and save the chunk back. It's really the only way of doing it (streaming) without the massive IO overhead of reading everything and the huge RAM consumption of storing it all, converting it all in memory, and then saving it all back.

You can always adjust the buffer size.
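One caveat with the byte-chunk approach: a 4 KB read can end in the middle of a multi-byte character, and converting each chunk independently will then corrupt it. A Decoder/Encoder pair keeps that partial state between calls. The sketch below applies that idea; ConvertWithDecoder and StreamingConverter are my own names, not from the answers above:

```csharp
using System.IO;
using System.Text;

static class StreamingConverter
{
    public static void ConvertWithDecoder(string srcFile, Encoding srcEncoding,
                                          string destFile, Encoding destEncoding)
    {
        Decoder decoder = srcEncoding.GetDecoder();   // remembers partial chars between reads
        Encoder encoder = destEncoding.GetEncoder();

        using (var input = new FileStream(srcFile, FileMode.Open, FileAccess.Read))
        using (var output = new FileStream(destFile, FileMode.Create, FileAccess.Write))
        {
            byte[] inBuf = new byte[4096];
            char[] charBuf = new char[srcEncoding.GetMaxCharCount(inBuf.Length)];
            byte[] outBuf = new byte[destEncoding.GetMaxByteCount(charBuf.Length)];

            int read;
            while ((read = input.Read(inBuf, 0, inBuf.Length)) > 0)
            {
                // A byte sequence split across reads is completed on the next call
                int chars = decoder.GetChars(inBuf, 0, read, charBuf, 0);
                int bytes = encoder.GetBytes(charBuf, 0, chars, outBuf, 0, false);
                output.Write(outBuf, 0, bytes);
            }

            // Flush any state still buffered in the encoder
            int tail = encoder.GetBytes(charBuf, 0, 0, outBuf, 0, true);
            output.Write(outBuf, 0, tail);
        }
    }
}
```

Memory use stays proportional to the buffer sizes, regardless of file size.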

If you want your old method to work without throwing an OutOfMemoryException, you need to tell the garbage collector to allow very large objects.

In App.config, under <runtime>, add the following line (you shouldn't need it with my code, but it's worth knowing):

<gcAllowVeryLargeObjects enabled="true" />
