How to convert text files to binary in an efficient way in C#

I have checked several methods for converting text files to binary and found some answers here as well. However, most of them confused me because of Unity's .NET compatibility, and I am also unsure how to structure the text-to-binary conversion.

I have a text file (an exported point cloud) that holds the positions of points in 3D space and color information, like this:

X Y Z colorvalues
-0.680891 -90.6809 0 204 204 204 255

I was reading this to create meshes at run time with a script like this:

 string[] buffer;

    for (int i = 0; i < Area.nPoints; i++)
    {
        buffer = sr.ReadLine().Split();

        Area.AddPoint(new Vector3(float.Parse(buffer[0]), 
        float.Parse(buffer[1]), float.Parse(buffer[2])));
    }

This works, but since I read each line and split it, it is quite slow, and I have around 75 million lines (points) in my text file. I found out that I could convert the file to binary and reading would be faster, which I did, and it was a lot faster. However, the converting-to-binary part is now quite slow, so I wanted to ask you about the way I converted.

void WriteValues()
{
    string[] buffer;

    for (int i = 0; i < numPoints; i++)
    {
        buffer = sr.ReadLine().Split();
        for (int j = 0; j < 3; j++)
        {
            wr.Write(float.Parse(buffer[j]));
        }           
    }        
    wr.Close();
}

Then I read it with BinaryReader.ReadSingle(), but this takes a lot more time than reading directly from the text because I again read each line and split it.

My question is: could I read, let's say, the next 1000 lines, buffer them, and then write, instead of handling every line individually? Would it make a difference? If so, how can I use the stream once for every 1000 lines?

Also, when I convert a line to binary, how can I read every float in the line without splitting the string? Thanks in advance for any help!

I am trying to do this to visualize a point cloud on my mobile phone using Augmented Reality. So I want to do the scan, export the point cloud, import it into Unity and create a mesh from those points without triangulating, but with my initial approach it takes 15-18 minutes to import. After converting to binary it takes less than 3 minutes, which is okay. However, this time the conversion to binary takes a lot of time :)

So a reasonably quick way to read is with a buffered file stream. Without the float parsing, the reading takes about 14 seconds on my machine, and about 74 seconds with float parsing (I just summed the values since I don't have Unity to play with).

var sw = new Stopwatch();
sw.Start();
double sum = 0;
var fs = new FileStream("demo.txt", FileMode.Open, FileAccess.Read);
using (var bs = new BufferedStream(fs))
using (var r = new StreamReader(bs))
{
    r.ReadLine();
    while (!r.EndOfStream)
    {
        var l = r.ReadLine();
        var split = l.Split();
        var x = float.Parse(split[0]);
        var y = float.Parse(split[1]);
        var z=float.Parse(split[2]);
        sum += x + y + z;
    }
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds / 1000M);
Console.WriteLine(sum);

Out of interest, I also changed the code to write the data out as a stream of floats (in triplets), which can be read back in with:

var sw = new Stopwatch();
sw.Start();
double sum = 0;
var fs = new FileStream("demo.bin", FileMode.Open, FileAccess.Read);
using (var bs = new BufferedStream(fs))
using (var r = new BinaryReader(bs))
{
    for (int i = 0; i < 75000000; i++)
    {
        var x = r.ReadSingle();
        var y = r.ReadSingle();
        var z=r.ReadSingle();
        sum += x + y + z;
    }
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds / 1000M);
Console.WriteLine(sum);

This takes about 9 seconds.
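If the per-value ReadSingle() calls turn out to matter, one option is to read the bytes in large chunks and copy them straight into a float[] with Buffer.BlockCopy. This is only a sketch, not something I have timed: the class name, method name and chunk size are made up for illustration, and it assumes the file holds nothing but the XYZ floats written by BinaryWriter (little-endian) and is read on a little-endian machine.

```csharp
using System;
using System.IO;

public static class PointCloudBinary
{
    // Hypothetical helper: reads pointCount XYZ triplets (three 4-byte floats each)
    // from a file produced with BinaryWriter.Write(float).
    public static float[] ReadAllFloats(string path, int pointCount)
    {
        const int FloatsPerChunk = 3 * 65536;             // arbitrary chunk size, tune as needed
        var result = new float[pointCount * 3];
        var chunk = new byte[FloatsPerChunk * sizeof(float)];

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            int floatsDone = 0;
            while (floatsDone < result.Length)
            {
                int floatsWanted = Math.Min(FloatsPerChunk, result.Length - floatsDone);
                int bytesWanted = floatsWanted * sizeof(float);

                // Stream.Read may return fewer bytes than requested, so loop until the chunk is full.
                int bytesRead = 0;
                while (bytesRead < bytesWanted)
                {
                    int n = fs.Read(chunk, bytesRead, bytesWanted - bytesRead);
                    if (n == 0) throw new EndOfStreamException("File is shorter than expected.");
                    bytesRead += n;
                }

                // Raw byte copy into the float array; the destination offset is given in bytes.
                Buffer.BlockCopy(chunk, 0, result, floatsDone * sizeof(float), bytesWanted);
                floatsDone += floatsWanted;
            }
        }
        return result;
    }
}
```

The result is laid out as x0, y0, z0, x1, y1, z1, and so on, so point i starts at index 3 * i.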

Just for completeness, I used the following code to generate the demo files:

var random = new Random();
File.WriteAllText("demo.txt", "X         Y        Z colorvalues\r\n");
using (var fs = new FileStream("demo.bin", FileMode.Create, FileAccess.Write, FileShare.None))
using (var bw = new BinaryWriter(fs))
using (var writer = File.AppendText("demo.txt"))
{
    for (int i = 0; i < 75000000; i++)
    {
        var x = (float) random.NextDouble() * 200;
        var y = (float) random.NextDouble() * 200;
        var z = (float) random.NextDouble() * 200;
        var c = Enumerable.Range(0, 4).Select(n => random.Next(0, 255)).ToArray();
        writer.WriteLine($"{x} {y} {z} {c[0]} {c[1]} {c[2]} {c[3]}");
        bw.Write(x);
        bw.Write(y);
        bw.Write(z);
    }
}
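Putting the reading and writing pieces together, the text-to-binary conversion itself could be done in a single pass over the file. The following is a minimal sketch under a few assumptions: the class and method names are hypothetical, the first line is a header, fields are separated by single spaces, and numbers use '.' as the decimal separator (hence CultureInfo.InvariantCulture). I have not benchmarked it.

```csharp
using System.Globalization;
using System.IO;

public static class PointCloudConverter
{
    // Hypothetical helper: copies the X, Y, Z columns of a text point cloud into a
    // flat binary stream of floats and returns the number of points written.
    public static int ConvertTextToBinary(string textPath, string binPath)
    {
        int count = 0;
        using (var reader = new StreamReader(textPath))
        using (var output = new FileStream(binPath, FileMode.Create, FileAccess.Write,
                                           FileShare.None, 1 << 16)) // 64 KB write buffer
        using (var writer = new BinaryWriter(output))
        {
            reader.ReadLine();                             // skip the header line
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length == 0) continue;            // tolerate blank lines
                string[] parts = line.Split(' ');
                writer.Write(float.Parse(parts[0], CultureInfo.InvariantCulture));
                writer.Write(float.Parse(parts[1], CultureInfo.InvariantCulture));
                writer.Write(float.Parse(parts[2], CultureInfo.InvariantCulture));
                count++;
            }
        }
        return count;
    }
}
```

The returned count can be stored alongside the binary file so the reader knows how many triplets to expect instead of hard-coding 75000000.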

That might be a silly question, but why don't you scan and save directly into a binary or .ply file? Or even scan and save into a mesh or some voxelized-style mesh?

You may also look up the approach used in this project, especially PlyImporter.cs.

If reading is slow, then reading, writing to a different file format and then reading back from that file is going to be even slower. You are just adding more actions to something that is already slow. Maybe you should look at how to change the way you read from the text file.

If you are not familiar with how serialization/deserialization is done in C# using the built-in libraries, you should start by reading this: https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/serialization/

Here is a link that shows how to implement binary serialization: https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.formatters.binary.binaryformatter?view=netframework-4.7.2

However, if you are not writing the initial file, you just need to write a custom deserializer (which is essentially what you have done, without implementing the relevant .NET patterns). Maybe try using a BufferedStream and see whether that helps, i.e.:

using (FileStream fs = File.Open(fileName, ..... ))
using (BufferedStream bs = new BufferedStream(fs))
using (StreamReader sr = new StreamReader(bs))
{
        string s;
        while ((s = sr.ReadLine()) != null)
        {
            //your code   
        }
}
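For the "//your code" part, the three floats can be pulled out of a line without calling Split() by scanning for the separators directly. This is a rough sketch only: PointLineParser and its methods are hypothetical names, it assumes space-separated fields with '.' as the decimal separator, and it still allocates one substring per coordinate, so whether it is measurably faster would need to be tested.

```csharp
using System;
using System.Globalization;

public static class PointLineParser
{
    // Parses the first three space-separated fields of a line as floats,
    // ignoring whatever follows (for example the color values).
    public static void ParseXyz(string line, out float x, out float y, out float z)
    {
        int pos = 0;
        x = NextFloat(line, ref pos);
        y = NextFloat(line, ref pos);
        z = NextFloat(line, ref pos);
    }

    private static float NextFloat(string line, ref int pos)
    {
        // Skip any spaces before the field.
        while (pos < line.Length && line[pos] == ' ') pos++;

        // The field ends at the next space or at the end of the line.
        int end = line.IndexOf(' ', pos);
        if (end < 0) end = line.Length;
        if (end == pos) throw new FormatException("Expected a number at position " + pos);

        float value = float.Parse(line.Substring(pos, end - pos), CultureInfo.InvariantCulture);
        pos = end;
        return value;
    }
}
```

Inside the loop this would be called as PointLineParser.ParseXyz(s, out x, out y, out z), which skips creating the string array and the four unused color substrings for every line.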

It is also worth having a look at the FileHelpers library, which can help with this task. See this example: https://www.filehelpers.net/example/QuickStart/ReadFileDelimited/
