Edit the first line of a huge file in C++

Is there a fast way to edit the first line of a big file (~100 MB) in C++?

I know we can read the file line by line, make the changes, write the result to a temporary file, and then rename the temporary file over the original. But I am wondering whether there is a faster way of doing this, something like in-place modification.

You can probably use the C file functions (fseek together with fwrite or fprintf) to write at a chosen position in the file: the data lands wherever the file position indicator currently points.

Open the file with fopen in update mode ("r+"), fseek to the beginning, and write what you need. Do not open it for appending: in append mode every write goes to the end of the file, regardless of any fseek. You should also be careful with the length of the first line. If you write fewer bytes than the original line, the tail of the old line is left over. If you write more, you overwrite the content that follows.
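
As a rough illustration of the in-place overwrite, here is a minimal sketch using std::fstream instead of fopen. The function name overwrite_first_line is hypothetical, and the sketch assumes '\n' line endings and that trailing spaces on the first line are acceptable. It refuses to write when the new text is longer than the existing line and pads with spaces when it is shorter.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: overwrite the first line of `path` in place.
// Returns false (and writes nothing) if the file cannot be opened, is empty,
// or `text` is longer than the existing first line.
// Assumes '\n' line endings; `text` must not contain a newline.
bool overwrite_first_line(const std::string& path, const std::string& text) {
    assert(text.find('\n') == std::string::npos);

    // in | out opens for update without truncating the file;
    // binary avoids any newline translation.
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!f) return false;

    std::string old_line;
    if (!std::getline(f, old_line)) return false;      // reads up to the first '\n'
    if (text.size() > old_line.size()) return false;   // would clobber line 2

    // Pad with spaces so no bytes of the old line are left behind.
    std::string padded = text;
    padded.append(old_line.size() - text.size(), ' ');

    f.clear();                      // getline sets eofbit on a single unterminated line
    f.seekp(0, std::ios::beg);      // the seek also switches the stream from reading to writing
    f.write(padded.data(), static_cast<std::streamsize>(padded.size()));
    return static_cast<bool>(f.flush());
}
```

Calling overwrite_first_line("data.txt", "new header") touches only the bytes of the first line, so the cost does not depend on the size of the rest of the file.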

100 MB is not that big on modern computers. If this is a one-time job and you're not working on a really slow device, you can simply read the whole file, split it into lines, make your edit, and write it all back in a moment.

If this is going to happen more often, you could benefit from adding some whitespace padding to the first line (if the format allows it) to create a "buffer" for whatever you need to put there next time. Then you can use fwrite to overwrite just that first line, without touching the rest of the file.

There may be OS- and filesystem-specific ways to allocate additional space inside an existing file without moving the data. For example, on Linux with XFS or ext4 you can use fallocate:

int fallocate(int fd, int mode, off_t offset, off_t len);

fallocate() allows the caller to directly manipulate the allocated disk space for the file referred to by fd for the byte range starting at offset and continuing for len bytes.

I believe the fastest way to accomplish your task is to create a new, separate file that contains the first-line value. Whenever you get a request to read the file, you read the first-line file first, then read the larger file, skipping over the first line that is actually stored in the larger file. Whenever you want to change the first line, just change the first-line file.
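
The separate-file idea could look roughly like the sketch below. The ".header" suffix and the function names are assumptions made for this illustration, not an established convention. Readers go through open_logical, which returns the overriding first line and leaves the big file positioned at its second line.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical layout: "<data file>.header" holds the current first line.
std::string header_path(const std::string& data_path) {
    return data_path + ".header";
}

// Changing the first line only rewrites the small side file.
bool set_first_line(const std::string& data_path, const std::string& text) {
    std::ofstream h(header_path(data_path), std::ios::trunc);
    h << text << '\n';
    return static_cast<bool>(h.flush());
}

// Opens the big file, skips the first line stored in it, and returns the
// first line from the side file (or the stored one if no side file exists).
// On success `data` is positioned at the second line of the big file.
bool open_logical(const std::string& data_path, std::ifstream& data,
                  std::string& first_line) {
    assert(!data.is_open());  // the caller passes a fresh stream
    data.open(data_path);
    std::string stored;
    if (!data || !std::getline(data, stored)) return false;

    std::ifstream h(header_path(data_path));
    if (!h || !std::getline(h, first_line)) first_line = stored;
    return true;
}
```

The trade-off is that every reader has to know about the side file; tools that open the big file directly will still see the old first line.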

You're thinking of a memory-mapped file, in which the entire file is "mapped" into memory but not actually loaded or rewritten until you attempt to access or modify a part of it. On POSIX systems, you can mmap() a part of a file (say, the first kilobyte), modify it as necessary, then use msync() to write just that chunk of memory back to the disk.
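
A minimal POSIX-only sketch of the mmap()/msync() approach follows. The helper name mmap_overwrite_first_line and the 4096-byte default window are assumptions for illustration. Like any in-place edit, it can only replace the first line with text of the same length or shorter (padded with spaces); mapping the file does not make room for a longer line.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper: map only the first `window` bytes of the file,
// overwrite the first line in place, and sync that range back to disk.
// Assumes '\n' line endings; fails if the first line does not end inside
// the mapped window or if `text` is longer than the existing line.
bool mmap_overwrite_first_line(const char* path, const std::string& text,
                               std::size_t window = 4096) {
    assert(window > 0);
    int fd = open(path, O_RDWR);
    if (fd < 0) return false;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) { close(fd); return false; }
    const std::size_t file_size = static_cast<std::size_t>(st.st_size);
    const std::size_t len = std::min(window, file_size);

    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return false;

    char* data = static_cast<char*>(p);
    char* end = data + len;
    char* nl = std::find(data, end, '\n');
    const std::size_t line_len = static_cast<std::size_t>(nl - data);
    // Without a newline the line is complete only if the whole file was mapped.
    const bool line_complete = (nl != end) || (len == file_size);

    bool ok = line_complete && text.size() <= line_len;
    if (ok) {
        std::memcpy(data, text.data(), text.size());
        std::memset(data + text.size(), ' ', line_len - text.size());
        ok = msync(p, len, MS_SYNC) == 0;  // flush just the mapped range
    }
    munmap(p, len);
    return ok;
}
```

Because only the first window of the file is mapped, the rest of the 100 MB is never read or written.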
