How does RandomAccessFile.seek() work?

As per the API, these are the facts:

  • The seek(long bytePosition) method, simply put, moves the file pointer to the position specified by the bytePosition parameter.
  • When bytePosition is greater than the file length, the file length does not change unless a byte is written at the (new) end.
  • If data is present in the range skipped over, that data is left untouched.
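The second point can be checked directly. Below is a minimal sketch, assuming a scratch file created with Files.createTempFile; the class name SeekLengthDemo and the method lengths() are made up for illustration. It records the file length after the seek and again after the single write.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

class SeekLengthDemo {
    // Returns {length after seek, length after the one-byte write}.
    static long[] lengths() throws IOException {
        Path path = Files.createTempFile("seek-demo", ".bin"); // hypothetical scratch file
        try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "rw")) {
            file.seek(100_000 - 1);           // move the pointer past the end of the empty file
            long afterSeek = file.length();   // seeking alone does not grow the file
            file.write(0);                    // one byte at offset 99,999
            long afterWrite = file.length();  // the file now ends just after that byte
            return new long[] { afterSeek, afterWrite };
        } finally {
            Files.deleteIfExists(path);       // runs after the file has been closed
        }
    }

    public static void main(String[] args) throws IOException {
        long[] r = lengths();
        System.out.println("length after seek:  " + r[0]);
        System.out.println("length after write: " + r[1]);
    }
}
```

The length should stay at 0 after the seek and become 100,000 only once the byte has been written.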

However, the situation I'm curious about is this: when there is a file with no data (0 bytes) and I execute the following code:

file.seek(100000-1);
file.write(0);

All 100,000 bytes are filled with 0 almost instantly. I can clock over 200GB in, say, 10 ms.

But when I try to write 100000 bytes using other methods such as BufferedOutputStream, the same process takes vastly longer.

What is the reason for this difference in time? Is there a more efficient way to create a file of n bytes and fill it with 0s?

EDIT: If the data is not actually written, how is the file filled with data? Consider this code:

RandomAccessFile out=new RandomAccessFile("D:/out","rw");
out.seek(100000-1);
out.write(0);
out.close();

This is the output:

[image: output]

Plus, if the file is large enough, I can no longer write to the disk due to lack of space.

When you write 100,000 bytes to a BufferedOutputStream, your program is explicitly accessing each byte of the file and writing a zero.
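To make the two approaches concrete, here is a rough sketch that produces the same file both ways and compares the results. The class and method names (ZeroFillComparison, fillByWriting, fillBySeeking, sameResult) are invented for this example, and it assumes a filesystem where the skipped region reads back as zeros, as on POSIX systems.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class ZeroFillComparison {
    // Writes n zero bytes one at a time through a buffered stream: O(n) work.
    static void fillByWriting(Path path, int n) throws IOException {
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(path))) {
            for (int i = 0; i < n; i++) {
                out.write(0);
            }
        }
    }

    // Seeks to the last offset and writes a single zero byte (n must be at least 1).
    static void fillBySeeking(Path path, int n) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "rw")) {
            file.seek(n - 1L);
            file.write(0);
        }
    }

    // Returns true when both approaches leave byte-identical files behind.
    static boolean sameResult(int n) throws IOException {
        Path written = Files.createTempFile("written", ".bin"); // hypothetical scratch files
        Path seeked = Files.createTempFile("seeked", ".bin");
        try {
            fillByWriting(written, n);
            fillBySeeking(seeked, n);
            return Arrays.equals(Files.readAllBytes(written), Files.readAllBytes(seeked));
        } finally {
            Files.deleteIfExists(written);
            Files.deleteIfExists(seeked);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("identical contents: " + sameResult(100_000));
    }
}
```

The files look the same to a reader; the difference is in how many bytes the program itself had to push through a write call.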

When you use RandomAccessFile.seek() on a local file, you are indirectly using a native seek call such as C's fseek() (strictly a C library function; on POSIX systems the underlying system call is lseek()). How that gets handled depends on the operating system.

In most modern operating systems, sparse files are supported. This means that if you ask for an empty 100,000-byte file, 100,000 bytes of disk space are not actually used. When you write to byte 100,001, the OS still doesn't use 100,001 bytes of disk. It allocates a small amount of space for the block containing "real" data, and keeps track of the empty space separately.
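Java's NIO API exposes sparse files as a hint rather than a guarantee. The sketch below, with the made-up names SparseHintDemo, createSparse and demo, opens a new file with StandardOpenOption.SPARSE and writes one byte at the final offset. Whether any disk space is actually saved depends on the filesystem, and the option is simply ignored where sparse files are not supported.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class SparseHintDemo {
    // Creates a new file and grows it to `size` bytes by writing one zero byte
    // at the last offset; returns the size the channel reports afterwards.
    static long createSparse(Path path, long size) throws IOException {
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE_NEW,
                StandardOpenOption.WRITE,
                StandardOpenOption.SPARSE)) {   // only a hint to the filesystem
            channel.write(ByteBuffer.wrap(new byte[] { 0 }), size - 1);
            return channel.size();
        }
    }

    // Runs the example in a scratch directory and cleans up afterwards.
    static long demo() throws IOException {
        Path dir = Files.createTempDirectory("sparse-demo"); // hypothetical scratch location
        Path file = dir.resolve("out.bin");
        try {
            return createSparse(file, 100_000);
        } finally {
            Files.deleteIfExists(file);
            Files.deleteIfExists(dir);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("reported size: " + demo());
    }
}
```

The reported size is the logical length of the file; the space actually allocated on disk is a separate figure that this sketch does not measure.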

When you read a sparse file, for example by fseek()ing to byte 50,000 and then reading, the OS can say "OK, I have not allocated disk space for byte 50,000 because I have noted that bytes 0 to 100,000 are empty. Therefore I can return 0 for this byte." This is invisible to the caller.

This has the dual purpose of saving disk space and improving speed. You have noticed the speed improvement.

More generally, fseek() goes directly to a position in a file, so it's O(1) rather than O(n). If you compare a file to an array, it's like doing x = arr[n] instead of for(i = 0; i<=n; i++) { x = arr[i]; }

This description, and the one on Wikipedia, is probably sufficient to understand why seeking to byte 100,000 and then writing is faster than writing 100,000 zeros. If you want more, you can read the Linux kernel source code to see how sparse files are implemented, and the RandomAccessFile source code in the JDK and JRE to see how they interact. That is probably more detail than you need, though.

Your operating system and filesystem support sparse files, and when that is the case, seek is implemented to make use of this feature.

This is not really related to Java; it's a feature of the fseek and fwrite functions from the C library, which are most likely the backend behind the File implementation of the JRE you are using.

More info: https://en.wikipedia.org/wiki/Sparse_file

Is there a more efficient way to create a file of n bytes and fill it with 0s?

On operating systems that support it, you could truncate the file to the desired size instead of issuing a write call. In Java, RandomAccessFile.setLength(long) does this, although its documentation leaves the contents of the extended portion undefined.
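One Java call that appears to fill this role is RandomAccessFile.setLength(long), which extends (or truncates) a file to a given length with no explicit write. The sketch below uses the made-up names SetLengthDemo and extend. Note that the API documentation says the contents of the extended portion are not defined, so reading zeros back is an assumption about the platform rather than a guarantee.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

class SetLengthDemo {
    // Extends an empty scratch file to newLength bytes and returns the resulting length.
    static long extend(long newLength) throws IOException {
        Path path = Files.createTempFile("setlength-demo", ".bin"); // hypothetical scratch file
        try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "rw")) {
            file.setLength(newLength);  // no seek and no write needed
            return file.length();
        } finally {
            Files.deleteIfExists(path); // runs after the file has been closed
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("length: " + extend(100_000));
    }
}
```

If the zeros must be guaranteed regardless of platform, writing them explicitly remains the safe fallback.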
