Directly reading a large binary file in C# without copying

I am looking for the most efficient/direct way to do this simple C/C++ operation:

void ReadData(FILE *f, uint16 *buf, int startsamp, int nsamps)
{
   fseek(f, startsamp*sizeof(uint16), SEEK_SET);
   fread(buf, sizeof(uint16), nsamps, f);
}

in C#/.NET. (I'm ignoring return values for clarity; production code would check them.) Specifically, I need to read in many (potentially tens to hundreds of millions) 2-byte (16-bit) "ushort" integer data samples (fixed format, no parsing required) stored in binary in a disk file. The nice thing about the C way is that it reads the samples directly into the "uint16 *" buffer with no per-sample CPU work and no intermediate copy. Yes, it is potentially "unsafe", as it uses void * pointers to buffers of unknown size, but it seems like there should be a "safe" .NET alternative.

What is the best way to accomplish this in C#? I have looked around and come across a few hints ("unions" using FieldOffset, "unsafe" code using pointers, marshalling), but none seems to quite work for this situation without some sort of copying/conversion. I'd like to avoid BinaryReader.ReadUInt16(), since that is very slow and CPU intensive. On my machine there is about a 25x difference in speed between a for() loop with ReadUInt16() and reading the bytes directly into a byte[] array with a single Read(). That ratio could be even higher with non-blocking I/O (overlapping "useful" processing while waiting for the disk I/O).

Ideally, I would want to simply "disguise" a ushort[] array as a byte[] array so I could fill it directly with Read(), or somehow have Read() fill the ushort[] array directly:

// DOES NOT WORK!!
public void GetData(FileStream f, ushort [] buf, int startsamp, int nsamps)
{
    f.Position = startsamp*sizeof(ushort);
    f.Read(buf, 0, nsamps);
}

But there is no Read() method that takes a ushort[] array, only a byte[] array.

Can this be done directly in C#, or do I need to use unmanaged code, or a third-party library, or must I resort to CPU-intensive sample-by-sample conversion? Although "safe" is preferred, I am fine with using "unsafe" code, or some trick with Marshal, I just have not figured it out yet.

Thanks for any guidance!


[UPDATE]

I wanted to add some code as suggested by dtb, as there seem to be precious few examples of ReadArray around. This is a very simple one, w/no error checking shown.

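public void ReadMap(string fname, short [] data, int startsamp, int nsamps)
{
    // using blocks release the mapping and the file handle when done
    using (var mmf = MemoryMappedFile.CreateFromFile(fname))
    using (var mmacc = mmf.CreateViewAccessor())
    {
        // cast to long so the byte offset cannot overflow int on large files
        mmacc.ReadArray((long)startsamp*sizeof(short), data, 0, nsamps);
    }
}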

Data is safely dumped into your passed array. You can also specify a type for more complex types. It seems able to infer simple types on its own, but with the type specifier, it would look like this:

    mmacc.ReadArray<short>(startsamp*sizeof(short), data, 0, nsamps);

[UPDATE2]

I wanted to add the code as suggested by Ben's winning answer, in "bare bones" form, similar to above, for comparison. This code was compiled and tested, and works, and is FAST. I used the SafeFileHandle type directly in the DllImport (instead of the more usual IntPtr) to simplify things.

[DllImport("kernel32.dll", SetLastError=true)]
[return:MarshalAs(UnmanagedType.Bool)]
static extern bool ReadFile(SafeFileHandle handle, IntPtr buffer, uint numBytesToRead, out uint numBytesRead, IntPtr overlapped);

[DllImport("kernel32.dll", SetLastError=true)]
[return:MarshalAs(UnmanagedType.Bool)]
static extern bool SetFilePointerEx(SafeFileHandle hFile, long liDistanceToMove, out long lpNewFilePointer, uint dwMoveMethod);

unsafe void ReadPINV(FileStream f, short[] buffer, int startsamp, int nsamps)
{
    long unused; uint BytesRead;
    SafeFileHandle nativeHandle = f.SafeFileHandle; // clears Position property
    SetFilePointerEx(nativeHandle, startsamp*sizeof(short), out unused, 0);

    fixed(short* pFirst = &buffer[0])
        ReadFile(nativeHandle, (IntPtr)pFirst, (uint)nsamps*sizeof(short), out BytesRead, IntPtr.Zero);
}

You can use a MemoryMappedFile. After you have memory-mapped the file, you can create a view (i.e. a MemoryMappedViewAccessor) which provides a ReadArray<T> method. This method can read structs from the file without marshalling, and it works with primitive types like ushort.
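As a rough, self-contained sketch of the ReadArray<T> approach described above: the class name ReadArrayDemo, the helper ReadSamples, and the temporary test file are made up for illustration and are not part of the original answer. The test file is written in the machine's native byte order so the round trip does not depend on endianness.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

static class ReadArrayDemo
{
    // Hypothetical helper: reads nsamps ushort samples starting at sample
    // index startsamp. Returns the number of samples actually read.
    public static int ReadSamples(string path, ushort[] data, long startsamp, int nsamps)
    {
        using (var mmf = MemoryMappedFile.CreateFromFile(path))
        using (var view = mmf.CreateViewAccessor())
        {
            // The position argument is a byte offset into the view, not a sample index.
            return view.ReadArray<ushort>(startsamp * sizeof(ushort), data, 0, nsamps);
        }
    }

    static void Main()
    {
        // Build a small test file of raw 16-bit samples in native byte order.
        string path = Path.GetTempFileName();
        ushort[] src = { 0, 1000, 2000, 3000, 4000, 5000, 6000, 7000 };
        byte[] raw = new byte[src.Length * sizeof(ushort)];
        Buffer.BlockCopy(src, 0, raw, 0, raw.Length);
        File.WriteAllBytes(path, raw);

        var data = new ushort[3];
        int n = ReadSamples(path, data, 2, 3);
        Console.WriteLine(n + ": " + string.Join(" ", data)); // prints "3: 2000 3000 4000"

        File.Delete(path);
    }
}
```

For repeated reads from the same file it would probably make more sense to keep the mapping and the view open rather than recreate them per call; they are created inside the helper here only to keep the sketch short.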

dtb's answer is an even better way (actually, it has to copy the data as well, so no gain there), but I just wanted to point out that to extract ushort values from a byte array you should be using BitConverter, not BinaryReader.
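To illustrate the BitConverter suggestion, here is a minimal sketch; the class BitConverterDemo and the helper ToSamples are hypothetical names. Note that this is still a per-sample conversion that copies the data, so it addresses the BinaryReader overhead rather than the copy itself. BitConverter uses the machine's native byte order.

```csharp
using System;

static class BitConverterDemo
{
    // Hypothetical helper: converts the first byteCount bytes of raw
    // (native byte order) into ushort samples, one sample at a time.
    public static ushort[] ToSamples(byte[] raw, int byteCount)
    {
        var samples = new ushort[byteCount / 2];
        for (int i = 0; i < samples.Length; i++)
            samples[i] = BitConverter.ToUInt16(raw, i * 2); // second argument is a byte index
        return samples;
    }

    static void Main()
    {
        // GetBytes also uses native byte order, so this round trip is endian-independent.
        byte[] raw = BitConverter.GetBytes((ushort)513);
        ushort[] samples = ToSamples(raw, raw.Length);
        Console.WriteLine(samples[0]); // prints 513
    }
}
```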

EDIT: example code for p/invoking ReadFile :

[DllImport("kernel32.dll", SetLastError=true)]
[return:MarshalAs(UnmanagedType.Bool)]
static extern bool ReadFile(IntPtr handle, IntPtr buffer, uint numBytesToRead, out uint numBytesRead, IntPtr overlapped);

[DllImport("kernel32.dll", SetLastError=true)]
[return:MarshalAs(UnmanagedType.Bool)]
static extern bool SetFilePointerEx(IntPtr hFile, long liDistanceToMove, out long lpNewFilePointer, uint dwMoveMethod);

unsafe bool read(FileStream fs, ushort[] buffer, int offset, int count)
{
  if (null == fs) throw new ArgumentNullException("fs");
  if (null == buffer) throw new ArgumentNullException("buffer");
  if (offset < 0 || count < 0 || offset + count > buffer.Length) throw new ArgumentException();
  uint bytesToRead = 2u * (uint)count; // cannot overflow: count <= int.MaxValue
  uint bytesRead;
  long filePos = fs.Position; // file position in bytes; 'offset' indexes into buffer
  SafeFileHandle nativeHandle = fs.SafeFileHandle; // clears Position property
  try {
    long unused;
    // the imports above take IntPtr, so pass the raw handle; fs stays alive for the whole call
    if (!SetFilePointerEx(nativeHandle.DangerousGetHandle(), filePos, out unused, 0))
      return false;
    // pin the array so the GC cannot move it while the OS writes into it
    fixed (ushort* pFirst = &buffer[offset])
      if (!ReadFile(nativeHandle.DangerousGetHandle(), new IntPtr(pFirst), bytesToRead, out bytesRead, IntPtr.Zero))
        return false;
    if (bytesRead < bytesToRead)
      return false;
    filePos += bytesRead;
    return true;
  }
  finally {
    fs.Position = filePos; // restore Position property
  }
}

I might be a bit late to the game here... but the fastest method I found was using a combination of the previous answers.

If I do the following:

MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(somePath);
Stream io = mmf.CreateViewStream();

int count;
byte[] byteBuffer = new byte[1024 << 2];
ushort[] dataBuffer = new ushort[byteBuffer.Length >> 1];

// each pass overwrites dataBuffer with the latest chunk, so consume it inside the loop
while((count = io.Read(byteBuffer, 0, byteBuffer.Length)) > 0)
  Buffer.BlockCopy(byteBuffer, 0, dataBuffer, 0, count);

This was ~2x faster than the accepted answer.

For me, the unsafe method was the same as the Buffer.BlockCopy without the MemoryMappedFile . The MemoryMappedFile cut down on a bit of time.
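For reference, the "Buffer.BlockCopy without the MemoryMappedFile" variant mentioned above might look roughly like the sketch below, using the GetData shape from the question. The class name BlockCopyDemo, the 64 KB chunk size, and the temporary test file are assumptions made for illustration; no timings are implied.

```csharp
using System;
using System.IO;

static class BlockCopyDemo
{
    // Fills buf[0..nsamps) from the file, starting at sample index startsamp.
    // Returns the number of whole samples read (fewer than nsamps at end of file).
    public static int GetData(FileStream f, ushort[] buf, long startsamp, int nsamps)
    {
        byte[] chunk = new byte[64 * 1024];               // chunk size is an arbitrary choice
        int bytesWanted = checked(nsamps * sizeof(ushort));
        int bytesDone = 0;
        f.Position = startsamp * sizeof(ushort);
        while (bytesDone < bytesWanted)
        {
            int n = f.Read(chunk, 0, Math.Min(chunk.Length, bytesWanted - bytesDone));
            if (n == 0) break;                             // end of file
            // BlockCopy offsets and counts are in bytes, even for a ushort[] destination.
            Buffer.BlockCopy(chunk, 0, buf, bytesDone, n);
            bytesDone += n;
        }
        return bytesDone / sizeof(ushort);
    }

    static void Main()
    {
        // Build a small test file of raw 16-bit samples in native byte order.
        string path = Path.GetTempFileName();
        ushort[] src = { 10, 20, 30, 40, 50 };
        byte[] raw = new byte[src.Length * sizeof(ushort)];
        Buffer.BlockCopy(src, 0, raw, 0, raw.Length);
        File.WriteAllBytes(path, raw);

        var buf = new ushort[2];
        using (var f = File.OpenRead(path))
            Console.WriteLine(GetData(f, buf, 3, 2) + ": " + string.Join(" ", buf)); // prints "2: 40 50"

        File.Delete(path);
    }
}
```

This stays in "safe" code at the cost of one extra copy per chunk, and because BlockCopy works in bytes, a Read() that returns an odd number of bytes is handled without special cases.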
