Read binary objects from a file in C# written out by a C++ program

I am trying to read objects from very large files containing padded structs that were written by a C++ process. I started from an example that memory-maps the large file and deserializes the data into an object, but I can now see that it won't work this way.

How can I extract all the objects from the files for use in C#? I'm probably way off, but I've provided the code. Each object has an 8-byte milliseconds member followed by 21 16-bit integers, which needs 6 bytes of padding to align to an 8-byte boundary (56 bytes per record).

[Serializable]
unsafe public struct DataStruct
{
    public UInt64 milliseconds;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 21)]
    public fixed Int16 data[21];
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
    public fixed Int16 padding[3];

};

[Serializable]
public class DataArray
{
    public DataStruct[] samples;
}

public static class Helper
{
    public static Int16[] GetData(this DataStruct data)
    {
        unsafe
        {
            Int16[] output = new Int16[21];
            for (int index = 0; index < 21; ++index)
                output[index] = data.data[index];
            return output;
        }
    }
}

class FileThreadSupport
{
    struct DataFileInfo
    {
        public string path;
        public UInt64 start;
        public UInt64 stop;
        public UInt64 elements;
    };

    // Create our epoch timestamp
    private static readonly DateTime epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Output TCP client
    private Support.AsyncTcpClient output;

    // Directory which contains our data
    private string replay_directory;

    // Files to be read from
    private DataFileInfo[] file_infos;

    // Current timestamp of when the process was started
    UInt64 process_start = 0;

    // Object from current file
    DataArray current_file_data;

    // Index of the current file in file_infos
    UInt64 current_file_index = 0;

    // Offset into current files
    UInt64 current_file_offset = 0;

    // Run flag
    bool run = true;

    public FileThreadSupport(ref Support.AsyncTcpClient output, ref Engine.A.Information info, ref Support.Configuration configuration)
    {
        // Set our output directory
        replay_directory = configuration.getString("replay_directory");
        if (replay_directory.Length == 0)
        {
            Console.WriteLine("Configuration does not provide a replay directory");
            return;
        }

        // Check the directory for playable files
        if(!loadDataDirectory(replay_directory))
        {
            Console.WriteLine("Replay directory {0} did not have any valid files", replay_directory);
        }

        // Set the output TCP client
        this.output = output;
    }

    private bool loadDataDirectory(string directory)
    {
        string[] files = Directory.GetFiles(directory, "*.*", SearchOption.TopDirectoryOnly);
        file_infos = new DataFileInfo[files.Length];
        int index = 0;
        foreach (string file in files)
        {
            string[] parts = file.Split('\\');
            string name = parts.Last();
            parts = name.Split('.');
            if (parts.Length != 2)
                continue;
            UInt64 start, stop = 0;
            if (!UInt64.TryParse(parts[0], out start) || !UInt64.TryParse(parts[1], out stop))
                continue;

            long size = new System.IO.FileInfo(file).Length;

            // Add to our file info array
            file_infos[index] = new DataFileInfo
            {
                path = file,
                start = start,
                stop = stop,
                elements = (ulong)(new System.IO.FileInfo(file).Length / 56 
                /*System.Runtime.InteropServices.Marshal.SizeOf(typeof(DataStruct))*/)
            };
            ++index;
        }

        // Sort the array
        Array.Sort(file_infos, delegate (DataFileInfo x, DataFileInfo y) { return x.start.CompareTo(y.start); });

        // Return whether or not there were files found
        return (files.Length > 0);
    }

    public void start()
    {
        process_start = (ulong)DateTime.Now.ToUniversalTime().Subtract(epoch).TotalMilliseconds;
        UInt64 num_samples = 0;

        while(run)
        {
            // Get our samples and add it to the sample
            DataStruct[] result = getData(100);
            Engine.A.A message = new Engine.A.A();
            for (int i = 0; i < result.Length; ++i)
            {
                Engine.A.Data sample = new Engine.A.Data();
                sample.Time = process_start + num_samples * 4;
                Int16[] signal_data = Helper.GetData(result[i]);
                for(int e = 0; e < signal_data.Length; ++e)
                    sample.Value[e] = signal_data[e];
                message.Signal.Add(sample);
                ++num_samples;
            }

            // Send out the websocket
            this.output.SendAsync(message.ToByteArray());

            // Sleep 100 milliseconds
            Thread.Sleep(100);
        }
    }

    public void stop()
    {
        run = false;
    }

    private DataStruct[] getData(UInt64 milliseconds)
    {
        if (file_infos.Length == 0)
            return new DataStruct[0];

        if (current_file_data == null)
        {
            current_file_data = ReadObjectFromMMF(file_infos[current_file_index].path) as DataArray;
            if(current_file_data.samples.Length == 0)
                return new DataStruct[0];
        }

        UInt64 elements_to_read = (UInt64) milliseconds / 4;
        DataStruct[] result = new DataStruct[elements_to_read];
        Array.Copy(current_file_data.samples, (int)current_file_offset, result, 0, (int) Math.Min(elements_to_read, file_infos[current_file_index].elements - current_file_offset));
        while((UInt64) result.Length != elements_to_read)
        {
            current_file_index = (current_file_index + 1) % (ulong) file_infos.Length;
            current_file_data = ReadObjectFromMMF(file_infos[current_file_index].path) as DataArray;
            if (current_file_data.samples.Length == 0)
                return new DataStruct[0];
            current_file_offset = 0;
            Array.Copy(current_file_data.samples, (int)current_file_offset, result, result.Length, (int)Math.Min(elements_to_read, file_infos[current_file_index].elements - current_file_offset));
        }
        return result;
    }

    private object ByteArrayToObject(byte[] buffer)
    {
        BinaryFormatter binaryFormatter = new BinaryFormatter(); // Create new BinaryFormatter
        MemoryStream memoryStream = new MemoryStream(buffer);    // Convert buffer to memorystream
        return binaryFormatter.Deserialize(memoryStream);        // Deserialize stream to an object
    }

    private object ReadObjectFromMMF(string file)
    {
        // Get a handle to an existing memory mapped file
        using (MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(file, FileMode.Open))
        {
            // Create a view accessor from which to read the data
            using (MemoryMappedViewAccessor mmfReader = mmf.CreateViewAccessor())
            {
                // Create a data buffer and read entire MMF view into buffer
                byte[] buffer = new byte[mmfReader.Capacity];
                mmfReader.ReadArray<byte>(0, buffer, 0, buffer.Length);

                // Convert the buffer to a .NET object
                return ByteArrayToObject(buffer);
            }
        }
    }
}

Well, for one thing, you're not using that memory-mapped file well at all: you're just sequentially reading it all into a buffer, which is both needlessly inefficient and much slower than simply opening the file and reading it normally. The selling point of memory-mapped files is repeated random access and random updates backed by the OS's virtual memory paging.
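For contrast, here is a rough sketch of what random access through a mapping could look like, if you ever do need to jump to arbitrary records. The class name MappedRecordFile and its methods are made up for illustration; the 56-byte record size and field offsets come from the layout described in the question, and both machines are assumed to be little-endian.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Hypothetical helper: keeps the mapping open and reads individual records on demand.
public sealed class MappedRecordFile : IDisposable
{
    public const int RecordSize = 56;   // 8 (timestamp) + 21 * 2 (samples) + 6 (padding)
    public const int SignalCount = 21;

    private readonly MemoryMappedFile mmf;
    private readonly MemoryMappedViewAccessor view;

    public MappedRecordFile(string path)
    {
        mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open);
        view = mmf.CreateViewAccessor();
    }

    // Reads only the 8-byte timestamp of the requested record.
    public ulong ReadMilliseconds(long recordIndex)
    {
        return view.ReadUInt64(recordIndex * RecordSize);
    }

    // Reads the 21 samples that follow the timestamp of the requested record.
    public short[] ReadData(long recordIndex)
    {
        var data = new short[SignalCount];
        view.ReadArray(recordIndex * RecordSize + 8, data, 0, data.Length);
        return data;
    }

    public void Dispose()
    {
        view.Dispose();
        mmf.Dispose();
    }
}
```

Only the pages that are actually touched get paged in, which is where a mapping earns its keep. For a straight front-to-back replay, a plain stream is still the simpler choice.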

And you definitely don't need to read the entire file into memory, since your data is so strongly structured. You know exactly how many bytes to read for a record: Marshal.SizeOf<DataStruct>().

Then you need to get rid of all that serialization noise. BinaryFormatter only understands streams that it wrote itself, not raw C++ structs. Again, your data is strongly typed, so just read it. Get rid of those fixed arrays and use regular arrays; you're already instructing the marshaller how to read them with the MarshalAs attributes (good). That also gets rid of the helper function that just copies an array for some unknown reason.
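A minimal sketch of what the struct might look like with regular arrays, along with the record size check. The RecordLayout helper is a made-up name, and Pack = 1 with an explicit padding field is my assumption about how to pin the layout to the 56 bytes described in the question; check it against the actual C++ struct.

```csharp
using System;
using System.Runtime.InteropServices;

// Assumed to mirror the C++ record: 8-byte timestamp, 21 samples, 6 bytes of padding.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct DataStruct
{
    public ulong milliseconds;

    // Regular array; the marshaller fills in exactly 21 elements.
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 21)]
    public short[] data;

    // Explicit padding so the marshalled size matches the file record.
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
    public short[] padding;
}

// Hypothetical helper for illustration.
public static class RecordLayout
{
    // Size of one record as the marshaller sees it.
    public static readonly int RecordSize = Marshal.SizeOf<DataStruct>();

    // Number of whole records in a file of the given length.
    public static long CountRecords(long fileLength)
    {
        return fileLength / RecordSize;
    }
}
```

With this in place, the hard-coded 56 in loadDataDirectory could be replaced by RecordLayout.RecordSize.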

Your reading loop is very simple: read the correct number of bytes for one entry, use Marshal.PtrToStructure to convert it to a readable structure, and add it to a list to return at the end. Bonus points if you can use .NET Core and Unsafe.As or MemoryMarshal.Cast.
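Something along these lines should do as a starting point. RecordReader, ReadRecords and ReadFully are names I made up; the struct is repeated so the block stands alone, and it assumes the file was written on a little-endian machine with the 56-byte layout from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct DataStruct
{
    public ulong milliseconds;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 21)]
    public short[] data;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 3)]
    public short[] padding;
}

public static class RecordReader
{
    // Reads up to maxRecords records, starting at record index firstRecord.
    public static List<DataStruct> ReadRecords(string path, long firstRecord, int maxRecords)
    {
        int recordSize = Marshal.SizeOf<DataStruct>();
        var records = new List<DataStruct>(maxRecords);
        byte[] buffer = new byte[recordSize];

        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            stream.Seek(firstRecord * recordSize, SeekOrigin.Begin);

            // Pin the buffer once so the marshaller can read from a stable address.
            GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
            try
            {
                IntPtr address = handle.AddrOfPinnedObject();
                while (records.Count < maxRecords && ReadFully(stream, buffer))
                {
                    records.Add(Marshal.PtrToStructure<DataStruct>(address));
                }
            }
            finally
            {
                handle.Free();
            }
        }
        return records;
    }

    // Fills the whole buffer; returns false at end of file or on a truncated record.
    private static bool ReadFully(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = stream.Read(buffer, total, buffer.Length - total);
            if (read == 0)
                return false;
            total += read;
        }
        return true;
    }
}
```

For the replay loop in the question, getData(100) would then boil down to something like RecordReader.ReadRecords(path, current_file_offset, 25), moving on to the next file when fewer records come back than requested.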

Edit: and don't use object returns. You know exactly what you're returning, so write it down.
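If .NET Core is an option, a span-based variant with a typed return might look like the sketch below. Sample and SpanParser are hypothetical names; the offsets (timestamp at 0, samples at 8, 56 bytes per record) are taken from the layout in the question. Note that MemoryMarshal.Cast reinterprets the bytes in the host's byte order, so this assumes a little-endian machine on both sides.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;

// Hypothetical managed view of one record; padding is simply skipped.
public sealed class Sample
{
    public ulong Milliseconds;
    public short[] Data = new short[21];
}

public static class SpanParser
{
    public const int RecordSize = 56;

    // Typed return: the caller gets Sample[] rather than object.
    public static Sample[] Parse(ReadOnlySpan<byte> bytes)
    {
        var samples = new Sample[bytes.Length / RecordSize];
        for (int i = 0; i < samples.Length; i++)
        {
            ReadOnlySpan<byte> record = bytes.Slice(i * RecordSize, RecordSize);
            var sample = new Sample
            {
                Milliseconds = BinaryPrimitives.ReadUInt64LittleEndian(record)
            };

            // Reinterpret the 42 bytes after the timestamp as 21 shorts and copy them out.
            MemoryMarshal.Cast<byte, short>(record.Slice(8, 42)).CopyTo(sample.Data);
            samples[i] = sample;
        }
        return samples;
    }
}
```

This avoids both the marshaller and unsafe code; any trailing bytes that do not make up a whole record are ignored.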
