Reading from and writing to the middle of a binary file in C/C++

If I have a large binary file (say it has 100,000,000 floats), is there a way in C (or C++) to open the file and read a specific float without having to load the whole file into memory (i.e. how can I quickly find what the 62,821,214th float is)? A second question: is there a way to change that specific float in the file without having to rewrite the entire file?

I'm envisioning functions like:

float readFloatFromFile(const char* fileName, int idx) {
    FILE* f = fopen(fileName,"rb");

    // What goes here?
}

void writeFloatToFile(const char* fileName, int idx, float f) {
    // How do I open the file? fopen can only append or start a new file, right?

    // What goes here?
}

You know the size of a float is sizeof(float), so multiplying it by the index gets you to the correct position:

FILE *f = fopen(fileName, "rb");
// real code should check the results of fopen, fseek and fread
fseek(f, (long)(idx * sizeof(float)), SEEK_SET);
float result;
fread(&result, sizeof(float), 1, f);
fclose(f);

Similarly, you can write to a specific position using this method.

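fopen can open a file for modification (not just for appending) if you pass it the rb+ or wb+ mode. Note that rb+ opens an existing file and leaves its contents alone, while wb+ truncates the file to zero length (or creates it), so rb+ is the one you want for changing a float in place. See here: http://www.cplusplus.com/reference/clibrary/cstdio/fopen/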

To position the file at a specific float, you can use fseek with index*sizeof(float) as the offset and SEEK_SET as the origin. See here: http://www.cplusplus.com/reference/clibrary/cstdio/fseek/
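To make the rb+ plus fseek combination concrete, here is a minimal sketch of the writeFloatToFile function from the question. The long index and the int return code are my own choices rather than anything from the question, and it assumes the file already exists and holds raw floats in the machine's native format.

```c
#include <stdio.h>

/* Overwrite the float at index idx in an existing file.
   Returns 0 on success, -1 on failure. */
int writeFloatToFile(const char *fileName, long idx, float value)
{
    /* "rb+" opens an existing file for reading and writing without truncating it. */
    FILE *f = fopen(fileName, "rb+");
    if (f == NULL)
        return -1;

    int ok = fseek(f, idx * (long)sizeof(float), SEEK_SET) == 0
          && fwrite(&value, sizeof(float), 1, f) == 1;

    /* fclose flushes the buffered write; if it fails, the data may not have reached the file. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

Only the four bytes at that position are rewritten; the rest of the file is left as it was. If long is 32 bits on your platform, files beyond 2 GB need fseeko (POSIX) or a platform-specific equivalent instead of fseek.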

Here is an example if you would like to use C++ streams:

#include <fstream>
using namespace std;

int main()
{
    // in | out must be given explicitly: with ios::binary alone the open fails
    fstream file("floats.bin", ios::in | ios::out | ios::binary);
    float number;

    file.seekg(62821214*sizeof(float), ios::beg); // position for reading
    file.read(reinterpret_cast<char*>(&number), sizeof(float));
    file.seekp(0, ios::beg); // move to the beginning of the file
    number = 3.2f;
    // write number at the beginning of the file
    file.write(reinterpret_cast<char*>(&number), sizeof(float));
}

One way would be to call mmap() on the file. Once you've done that, you can read/modify the file as if it was an in-memory array.

Of course that method only works if the file is small enough to fit in your process's address space... if you're running in 64-bit mode, you'll be fine; in 32-bit mode, a file with 100,000,000 floats should fit, but another order or two of magnitude above that and you might run into trouble.
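As a rough illustration of the mmap() approach, here is a sketch for POSIX systems. The function name swapFloatMapped and its return convention are made up for this example, and it assumes the file holds raw floats in native format.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Map the whole file, store newValue at floats[idx], and hand back the
   previous value through *oldValue. Returns 0 on success, -1 on failure. */
int swapFloatMapped(const char *fileName, size_t idx, float newValue, float *oldValue)
{
    int fd = open(fileName, O_RDWR);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || idx >= (size_t)st.st_size / sizeof(float)) {
        close(fd);
        return -1;   /* fstat failed or the index is past the end of the file */
    }

    /* MAP_SHARED makes stores through the mapping go to the file itself. */
    float *floats = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
    close(fd);       /* the mapping stays valid after the descriptor is closed */
    if (floats == MAP_FAILED)
        return -1;

    *oldValue = floats[idx];   /* read as if it were an in-memory array */
    floats[idx] = newValue;    /* modify in place */

    return munmap(floats, (size_t)st.st_size) == 0 ? 0 : -1;
}
```

Mapping the file does not read it all in; pages are loaded only when touched. If you need the change to be on disk at a known point, call msync() before munmap().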

I know this question has been answered already, but Linux/Unix provides simple system calls, pread() and pwrite(), for reading and writing in the middle of a file. If you look at the kernel source code for the read and pread system calls, both eventually call vfs_read(), and vfs_read() requires an offset, i.e. a position in the file to read from. With pread() we supply this offset ourselves; with read() the kernel uses the current position it maintains for the open file. pread() avoids the separate lseek() call that read() needs for random access and leaves the file position untouched, so multiple threads can read and write different parts of the file through the same file descriptor at the same time. My humble opinion: for this kind of access, don't use read() or file streams, use pread(). Hopefully the stream libraries wrap the read() calls with buffering, so that the streams perform well by making fewer system calls.

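#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
int main()
{
  long id = 62821214;  // index of the float to read
  float value;         // pread needs real storage to read into
  off_t offToStart = (off_t)id * sizeof(float);
  int fd = open("fileName", O_RDONLY);
  if (fd < 0) { perror("open"); return 1; }
  ssize_t ret = pread(fd, &value, sizeof(float), offToStart);
  if (ret == (ssize_t)sizeof(float))
    printf("%f\n", value);  // process the value that was read
  close(fd);
  return 0;
}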
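The writing side with pwrite() looks much the same. Below is a small sketch with two helpers, readFloatAt and writeFloatAt; the names are invented for this example, and the descriptor is assumed to come from open() with O_RDWR.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the float at index idx from an open descriptor.
   Returns 0 on success, -1 on error or if fewer than sizeof(float) bytes were read. */
int readFloatAt(int fd, off_t idx, float *out)
{
    ssize_t n = pread(fd, out, sizeof(float), idx * (off_t)sizeof(float));
    return n == (ssize_t)sizeof(float) ? 0 : -1;
}

/* Overwrite the float at index idx. Returns 0 on success, -1 otherwise. */
int writeFloatAt(int fd, off_t idx, float value)
{
    ssize_t n = pwrite(fd, &value, sizeof(float), idx * (off_t)sizeof(float));
    return n == (ssize_t)sizeof(float) ? 0 : -1;
}
```

Neither call moves the descriptor's file position, which is what makes it safe to share one descriptor between threads working on different offsets. Don't open the file with O_APPEND for this: on Linux, pwrite() on an O_APPEND descriptor appends to the end regardless of the offset given.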
