
How to convert text files to binary in an efficient way in C#

I have looked at several methods for converting text files to binary and found some answers here as well. However, most of them confused me because of Unity's .NET compatibility, and I am also unsure how the conversion from text to binary should be structured.

I have a text file (exported point cloud) which holds positions of points in 3D space and color information like this:

X Y Z colorvalues
-0.680891 -90.6809 0 204 204 204 255

I was reading this to create meshes in run time with a script like this:

string[] buffer;

for (int i = 0; i < Area.nPoints; i++)
{
    buffer = sr.ReadLine().Split();

    Area.AddPoint(new Vector3(float.Parse(buffer[0]),
        float.Parse(buffer[1]), float.Parse(buffer[2])));
}

This works, but since I read each line and split it, it is quite slow, and I have around 75 million lines (points) in my text file. I found out that I can convert the file to binary so that reading is faster. I did that, and reading was a lot faster. However, the conversion to binary is now the slow part, so I wanted to ask about the way I do the conversion.

void WriteValues()
{
    string[] buffer;

    for (int i = 0; i < numPoints; i++)
    {
        buffer = sr.ReadLine().Split();
        // Write only x, y, z; the inner loop must test and increment j, not i.
        for (int j = 0; j < 3; j++)
        {
            wr.Write(float.Parse(buffer[j]));
        }
    }
    wr.Close();
}

I then read the binary file with BinaryReader.ReadSingle(). However, the conversion step itself takes much longer than reading directly from the text, because I still read each line and split it.

My question is: could I read, say, the next 1000 lines into a buffer and then write them, instead of handling every line individually? Would it make a difference? If so, how can I use the stream once for every 1000 lines?

Also, when converting a line to binary, how can I read every float in the line without splitting the string? Thanks in advance for any help!

I am trying to do this to visualize a point cloud on my mobile phone using Augmented Reality. I want to do the scan, export the point cloud, import it into Unity and create a mesh from those points without triangulating. With my initial approach it takes 15-18 minutes to import. After converting to binary it takes less than 3 minutes, which is okay. However, the conversion to binary now takes a lot of time :)

A reasonably quick way to read is with a buffered file stream. Without the float parsing, reading takes about 14 seconds on my machine, and about 74 seconds with float parsing (I just summed the values, since I don't have Unity to play with).

using System;
using System.Diagnostics;
using System.IO;

var sw = new Stopwatch();
sw.Start();
double sum = 0;
var fs = new FileStream("demo.txt", FileMode.Open, FileAccess.Read);
using (var bs = new BufferedStream(fs))
using (var r = new StreamReader(bs))
{
    r.ReadLine(); // skip the header line
    while (!r.EndOfStream)
    {
        var l = r.ReadLine();
        var split = l.Split();
        var x = float.Parse(split[0]);
        var y = float.Parse(split[1]);
        var z = float.Parse(split[2]);
        sum += x + y + z; // stand-in for the real work, e.g. Area.AddPoint
    }
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds / 1000M); // elapsed seconds
Console.WriteLine(sum);
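
On the question of reading the floats without splitting the string: the Split call in the loop above allocates an array of seven strings per line even though only three values are used. The following is a minimal sketch of one alternative, not a measured result. It assumes a runtime where the span-based float.TryParse overload is available (.NET Core 2.1+ or .NET Standard 2.1, so not older Unity scripting runtimes) and that the file uses '.' as the decimal separator. PointLineParser and TryParseXyz are hypothetical names introduced for illustration.

```csharp
using System;
using System.Globalization;

public static class PointLineParser
{
    // Parses the first three whitespace-separated floats of a line
    // without calling Split(), so no string array is allocated per line.
    // Returns false if fewer than three numeric tokens are found.
    public static bool TryParseXyz(string line, out float x, out float y, out float z)
    {
        x = y = z = 0f;
        ReadOnlySpan<char> rest = line.AsSpan();
        return TryReadFloat(ref rest, out x)
            && TryReadFloat(ref rest, out y)
            && TryReadFloat(ref rest, out z);
    }

    // Consumes one token from the front of 'rest' and tries to parse it.
    private static bool TryReadFloat(ref ReadOnlySpan<char> rest, out float value)
    {
        rest = rest.TrimStart();               // skip separators before the token
        int end = rest.IndexOfAny(' ', '\t');  // index just past the token
        if (end < 0) end = rest.Length;        // last token on the line
        ReadOnlySpan<char> token = rest.Slice(0, end);
        rest = rest.Slice(end);
        // InvariantCulture: assumes '.' as the decimal separator in the file.
        return float.TryParse(token, NumberStyles.Float, CultureInfo.InvariantCulture, out value);
    }
}
```

Inside the read loop, the Split and the three float.Parse calls could then be replaced by a single call such as PointLineParser.TryParseXyz(l, out var x, out var y, out var z). Whether this is noticeably faster would need to be measured on the real file.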

Out of interest, I also changed the code to write the data out as a stream of floats (in triplets).

I read it back in with:

var sw = new Stopwatch();
sw.Start();
double sum = 0;
var fs = new FileStream("demo.bin", FileMode.Open, FileAccess.Read);
using (var bs = new BufferedStream(fs))
using (var r = new BinaryReader(bs))
{
    for (int i = 0; i < 75000000; i++)
    {
        var x = r.ReadSingle();
        var y = r.ReadSingle();
        var z = r.ReadSingle();
        sum += x + y + z;
    }
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds / 1000M);
Console.WriteLine(sum);

This takes about 9 seconds.
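
For the conversion step the question asks about, a minimal sketch might look like the following. It assumes the text layout shown in the question (one header line, then "x y z" followed by four color values per line) and '.' as the decimal separator. PointCloudConverter and ConvertToBinary are hypothetical names, and the 64 KB buffer size is an arbitrary choice. Because both the reader and the file stream are buffered, the disk is accessed in large blocks, so there should be no need to batch 1000 lines by hand.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

public static class PointCloudConverter
{
    // Converts a text point cloud into raw float triplets (x, y, z),
    // discarding the color values. Returns the number of points written.
    public static int ConvertToBinary(string textPath, string binaryPath)
    {
        const int BufferSize = 1 << 16; // 64 KB, an assumed value
        int count = 0;

        using (var reader = new StreamReader(textPath, Encoding.ASCII, false, BufferSize))
        using (var output = new FileStream(binaryPath, FileMode.Create,
                                           FileAccess.Write, FileShare.None, BufferSize))
        using (var writer = new BinaryWriter(output))
        {
            reader.ReadLine(); // skip the header line

            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length == 0) continue; // tolerate blank lines

                // A null separator array splits on any whitespace.
                string[] parts = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

                // BinaryWriter writes each float as 4 little-endian bytes.
                writer.Write(float.Parse(parts[0], CultureInfo.InvariantCulture));
                writer.Write(float.Parse(parts[1], CultureInfo.InvariantCulture));
                writer.Write(float.Parse(parts[2], CultureInfo.InvariantCulture));
                count++;
            }
        }
        return count;
    }
}
```

A call such as PointCloudConverter.ConvertToBinary("demo.txt", "demo.bin") would produce a file that the BinaryReader loop above can read. The conversion only has to run once per scan, so it could also be done offline on a desktop machine before the file is shipped to the phone.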

Just for completeness, I used the following code to generate the demo files:

var random = new Random();
File.WriteAllText("demo.txt", "X         Y        Z colorvalues\r\n");
using (var fs = new FileStream("demo.bin", FileMode.Create, FileAccess.Write, FileShare.None))
using (var bw = new BinaryWriter(fs))
using (var writer = File.AppendText("demo.txt"))
{
    for (int i = 0; i < 75000000; i++)
    {
        var x = (float) random.NextDouble() * 200;
        var y = (float) random.NextDouble() * 200;
        var z = (float) random.NextDouble() * 200;
        // The upper bound of Random.Next is exclusive, so 256 yields values 0..255.
        var c = Enumerable.Range(0, 4).Select(n => random.Next(0, 256)).ToArray();
        writer.WriteLine($"{x} {y} {z} {c[0]} {c[1]} {c[2]} {c[3]}");
        bw.Write(x);
        bw.Write(y);
        bw.Write(z);
    }
}
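
Since demo.bin is just a sequence of raw 4-byte floats, it can also be read in larger chunks instead of one ReadSingle call per value. The sketch below is one way to do that, under two stated assumptions: the file was written by BinaryWriter (which always writes little-endian) and the reading machine is little-endian too, because Buffer.BlockCopy copies bytes without any conversion. BulkPointReader, ReadPoints and the chunk callback are hypothetical names, and no timing is claimed for it.

```csharp
using System;
using System.IO;

public static class BulkPointReader
{
    // Reads x, y, z float triplets in chunks and hands each chunk to onChunk.
    // The callback receives a reused float array and the number of valid points in it.
    // Returns the total number of points read.
    public static long ReadPoints(string path, int pointsPerChunk, Action<float[], int> onChunk)
    {
        const int BytesPerPoint = 3 * sizeof(float);
        var bytes = new byte[pointsPerChunk * BytesPerPoint];
        var floats = new float[pointsPerChunk * 3];
        long totalPoints = 0;

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, 1 << 16))
        {
            while (true)
            {
                // Stream.Read may return fewer bytes than requested, so fill the buffer in a loop.
                int filled = 0;
                int read;
                while (filled < bytes.Length &&
                       (read = fs.Read(bytes, filled, bytes.Length - filled)) > 0)
                {
                    filled += read;
                }

                int points = filled / BytesPerPoint; // a trailing partial point is ignored
                if (points == 0) break;

                // Reinterpret the raw bytes as floats (assumes a little-endian host).
                Buffer.BlockCopy(bytes, 0, floats, 0, points * BytesPerPoint);
                onChunk(floats, points);
                totalPoints += points;

                if (filled < bytes.Length) break; // end of file reached
            }
        }
        return totalPoints;
    }
}
```

In the Unity case, the callback would loop over the first 3 * count entries of the array and call something like Area.AddPoint for each triplet. The array is reused between calls, so the callback must not keep a reference to it.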

That might be a silly question, but why don't you scan and save directly into a binary or .ply file? Or even scan and save into a mesh or some voxelized-style mesh?

You may also look up the approach used in this project, especially PlyImporter.cs

If reading is slow, then reading, writing to a different file format and then reading back from that file is going to be even slower. You are just adding more actions to something that is already slow... Maybe you should look at how to change the way you do the reading from the text file.

If you are not familiar with how serialization/deserialization is done in C#, using the built in libraries, you should start by reading this: https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/serialization/

Here is a link to show how to implement binary serialization: https://docs.microsoft.com/en-us/dotnet/api/system.runtime.serialization.formatters.binary.binaryformatter?view=netframework-4.7.2

However, if you are not writing the initial file, you just need to write a custom deserializer (which is essentially what you have done, without implementing the relevant .NET patterns). Maybe try using a BufferedStream and see whether that helps, i.e.:

using (FileStream fs = File.Open(fileName, ..... ))
using (BufferedStream bs = new BufferedStream(fs))
using (StreamReader sr = new StreamReader(bs))
{
    string s;
    while ((s = sr.ReadLine()) != null)
    {
        // your code
    }
}

It is also worth having a look at the FileHelpers library, which can help with this task. See this example: https://www.filehelpers.net/example/QuickStart/ReadFileDelimited/
