Best approach for in-memory manipulation of a text file: read as byte[] first, or read with File.ReadAllText() and then save as binary?

I need to change a file in memory. Currently I read the file into a byte[] using a FileStream and a BinaryReader.

I was wondering what the best approach is to change that file in memory: convert the byte[] to a string, make the changes, and call Encoding.GetBytes()? Or read the file as a string first using File.ReadAllText() and then call Encoding.GetBytes()? Or will any approach work without caveats?

Are there any special approaches? I need to replace specific text inside the files with additional characters or replacement strings, across several hundred thousand files. Reliability is preferred over efficiency. The files are text such as HTML, not binary files.

Read the files using File.ReadAllText(), modify them, then do byte[] byteData = Encoding.UTF8.GetBytes(your_modified_string_from_file). Use the encoding with which the files were saved. This will give you a byte[]. You can convert the byte[] to a stream like this:

MemoryStream stream = new MemoryStream();
stream.Write(byteData, 0, byteData.Length);
stream.Position = 0; // rewind so a consumer reads from the start, not the end

Edit: It looks like one of the Add methods in the API can take a byte array, so you don't have to use a stream.
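The read, modify, and GetBytes steps above could be sketched roughly as follows. The helper name `ReplaceToBytes`, the sample HTML, and the choice of BOM-less UTF-8 are assumptions for illustration only; substitute the encoding the files were actually saved with.

```csharp
using System;
using System.IO;
using System.Text;

static class InMemoryEdit
{
    // Hypothetical helper: reads a text file, applies one replacement,
    // and returns the modified text as bytes in the same encoding.
    public static byte[] ReplaceToBytes(string path, string oldText, string newText, Encoding encoding)
    {
        string text = File.ReadAllText(path, encoding);
        string modified = text.Replace(oldText, newText);
        return encoding.GetBytes(modified); // GetBytes does not prepend a byte order mark
    }

    static void Main()
    {
        var utf8 = new UTF8Encoding(false); // assumption: files are UTF-8 without BOM
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "<p>hello</p>", utf8);

        byte[] byteData = ReplaceToBytes(path, "hello", "goodbye", utf8);

        // If a stream is needed, wrapping the array starts at position 0.
        using (var stream = new MemoryStream(byteData))
        using (var reader = new StreamReader(stream, utf8))
        {
            Console.WriteLine(reader.ReadToEnd());
        }
        File.Delete(path);
    }
}
```

Passing the array to the MemoryStream constructor avoids having to rewind the stream after writing into it.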

You're definitely making things harder on yourself by reading into bytes first. Just use a StreamReader. You can probably get away with using ReadLine() and processing a line at a time. This can seriously reduce your app's memory usage, especially if you're working with that many files.

using (var reader = File.OpenText(originalFile))
using (var writer = File.CreateText(tempFile))
{
    string line;
    while ((line = reader.ReadLine()) != null) 
    {
        var temp = DoMyStuff(line);
        writer.WriteLine(temp);
    }
}

File.Delete(originalFile);
File.Move(tempFile, originalFile);
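Since reliability matters here, one possible variation is to swap the temp file in with File.Replace instead of deleting the original first, so there is no moment at which the original has been deleted but the new file is not yet in place. This is only a sketch: `RewriteLines` and the `.tmp` suffix are made-up names, and File.Replace expects both files to be on the same volume. Note also that ReadLine()/WriteLine() rewrites every line ending as Environment.NewLine and ends the file with a newline, which may or may not matter for your files.

```csharp
using System;
using System.IO;

static class SafeSwap
{
    // Hypothetical helper: transforms a file line by line into a temp file,
    // then replaces the original with it in a single File.Replace call.
    public static void RewriteLines(string originalFile, Func<string, string> transform)
    {
        string tempFile = originalFile + ".tmp"; // same folder, so same volume
        using (var reader = File.OpenText(originalFile))
        using (var writer = File.CreateText(tempFile))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                writer.WriteLine(transform(line));
            }
        }
        // Passing null as the third argument means no backup copy is kept.
        File.Replace(tempFile, originalFile, null);
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "first line\nsecond line\n");
        RewriteLines(path, line => line.Replace("line", "row"));
        Console.Write(File.ReadAllText(path));
        File.Delete(path);
    }
}
```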

Given the size of the files, I would use File.ReadAllText to read them and File.WriteAllText to write them. This frees you from the responsibility of calling Close or Dispose on either the read or the write.
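A minimal sketch of that round trip, assuming the files are UTF-8 (the `ReplaceInFile` name and the sample text are illustrative). It skips the write when nothing changed, which is an added design choice and not part of the answer above.

```csharp
using System;
using System.IO;

static class WholeFileEdit
{
    // Hypothetical helper: loads the whole file, replaces text, writes it back.
    // Returns true if the file was actually rewritten.
    public static bool ReplaceInFile(string path, string oldText, string newText)
    {
        string original = File.ReadAllText(path);
        string modified = original.Replace(oldText, newText);
        if (modified == original)
        {
            return false; // nothing to change, leave the file untouched
        }
        // Without an encoding argument, WriteAllText writes UTF-8 without a BOM.
        File.WriteAllText(path, modified);
        return true;
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "<title>Old</title>");
        Console.WriteLine(ReplaceInFile(path, "Old", "New"));
        Console.WriteLine(File.ReadAllText(path));
        File.Delete(path);
    }
}
```

Neither call leaves a file handle open, so there is nothing to close or dispose.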

You generally don't want to read a text file on a binary level. Just use File.ReadAllText() and supply it with the correct encoding used in the file (there's an overload for that). If the file is UTF-8 or UTF-32 and starts with a byte order mark, the method can usually detect and use the correct encoding automatically. The same applies to writing it back: if the file is not UTF-8, specify which encoding you want.
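For files in an encoding that cannot be auto-detected, the encoding overloads might be used like this. ISO-8859-1 is only an example encoding, and `ReplaceInFile` is a made-up helper name; the point is that the same Encoding object is passed to both the read and the write.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodedEdit
{
    // Hypothetical helper: reads and writes with one explicitly chosen encoding,
    // so characters outside ASCII survive the round trip.
    public static void ReplaceInFile(string path, string oldText, string newText, Encoding encoding)
    {
        string text = File.ReadAllText(path, encoding);
        File.WriteAllText(path, text.Replace(oldText, newText), encoding);
    }

    static void Main()
    {
        // Assumption for the example: the file was saved as ISO-8859-1 (Latin-1).
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "café", latin1);

        ReplaceInFile(path, "café", "thé", latin1);

        Console.WriteLine(File.ReadAllText(path, latin1));
        Console.WriteLine(File.ReadAllBytes(path).Length);
        File.Delete(path);
    }
}
```

Reading such a file with the default UTF-8 decoder would turn the accented character into a replacement character, and writing it back would make that damage permanent.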
