
In the linux_dirent64 structs written in the Linux syscall getdents64, why is d_off not the sum of the d_reclens of all earlier entries?

According to the man page of getdents :

d_off is the distance from the start of the directory to the start of the next linux_dirent . d_reclen is the size of this entire linux_dirent .

So I would expect that if the first entry has d_reclen n , its d_off would also be n (and for the i -th entry, d_off would be the sum of the d_reclen s of all entries from 0 to i , inclusive).

However, in that same man page, a nicely printed table with the entries of an example directory looks like this:

--------------- nread=120 ---------------
inode#    file type  d_reclen  d_off   d_name
       2  directory    16         12  .
       2  directory    16         24  ..
      11  directory    24         44  lost+found
      12  regular      16         56  a
  228929  directory    16         68  sub
   16353  directory    16         80  sub2
  130817  directory    16       4096  sub3

The d_off fields of the entries do not seem to follow the rule as I expected. If the first entry has size 16, surely the offset from the start to the second entry would be 16, but apparently it's actually 12.

So what don't I understand about the d_off field of linux_dirent64 ?

It's explained vaguely in the manual page, but as you can probably see by compiling and running the example program, your assumption does not hold.

The manual page for readdir(3) gives a bit more insight:

d_off  The value returned in d_off is the same as would be returned by
       calling telldir(3) at the current position in the directory
       stream.  Be aware that despite its type and name, the d_off field
       is seldom any kind of directory offset on modern filesystems.
       Applications should treat this field as an opaque value, making no
       assumptions about its contents; see also telldir(3).

The key part is "the d_off field is seldom any kind of directory offset on modern filesystems" . The d_off field is a value for internal use by the underlying filesystem, and its meaning is implementation-specific. It does not necessarily have any correlation with d_reclen , nor does it need to represent an actual byte "offset" of any kind. Whatever software you write, you should not rely on the value of d_off ; treat it as an opaque identifier.

There may be filesystems where d_off corresponds to an actual offset in bytes between dirents, but this is in general not the case. The field is used more or less like a unique "counter" or "cookie" value to distinguish files inside a directory.

In fact, if you take a look at the values on a Btrfs filesystem, d_off seems to start at 1 for . and 2 for .. , increasing by one for each following dirent , with the last one having d_off equal to INT32_MAX . That holds at least for a directory of freshly created files; the values will change after deleting, moving, or creating more files.

$ mkdir test
$ cd test
$ touch a b c d e f
$ ls -l
total 0
-rw-r----- 1 marco marco 0 gen 15 01:20 a
-rw-r----- 1 marco marco 0 gen 15 01:20 b
-rw-r----- 1 marco marco 0 gen 15 01:20 c
-rw-r----- 1 marco marco 0 gen 15 01:20 d
-rw-r----- 1 marco marco 0 gen 15 01:20 e
-rw-r----- 1 marco marco 0 gen 15 01:20 f

$ ../test_program
--------------- nread=192 ---------------
inode#    file type  d_reclen  d_off   d_name
46206659  directory    24          1  .
  214242  directory    24          2  ..
46206662  regular      24          3  a
46206663  regular      24          4  b
46206664  regular      24          5  c
46206665  regular      24          6  d
46206666  regular      24          7  e
46206667  regular      24 2147483647  f
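Listings like the one above come from stepping through the getdents64 buffer by d_reclen , never by d_off . The following is a minimal sketch of that loop, not the man page's exact program: the helper name list_dir and the 4096-byte buffer size are assumptions made for illustration, and the struct layout follows the one documented in getdents(2).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout written by getdents64(2), as documented in the man page. */
struct linux_dirent64 {
    uint64_t       d_ino;     /* inode number */
    int64_t        d_off;     /* opaque filesystem cookie, not a byte offset */
    unsigned short d_reclen;  /* size of this record in the buffer */
    unsigned char  d_type;    /* file type */
    char           d_name[];  /* null-terminated filename */
};

/* Hypothetical helper: prints every entry of `path` and returns the
 * number of entries, or -1 on error. */
long list_dir(const char *path)
{
    _Alignas(8) char buf[4096];   /* buffer size is an arbitrary choice */
    long count = 0;

    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    for (;;) {
        long nread = syscall(SYS_getdents64, fd, buf, sizeof buf);
        if (nread < 0) {
            close(fd);
            return -1;
        }
        if (nread == 0)           /* end of directory */
            break;

        /* Advance through the buffer by d_reclen; d_off is only printed. */
        for (long pos = 0; pos < nread; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + pos);
            printf("%10llu  %4u  %20lld  %s\n",
                   (unsigned long long)d->d_ino, (unsigned)d->d_reclen,
                   (long long)d->d_off, d->d_name);
            pos += d->d_reclen;
            count++;
        }
    }

    close(fd);
    return count;
}
```

The position inside the buffer is the running sum of d_reclen values, which is the quantity the question expected to find in d_off . The d_off column is whatever the filesystem chose to put there.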

This 2004 Sourceware bug report for Glibc by Dan Tsafrir also contains some insightful explanations about d_off , such as:

  • In the implementation of getdents() , the d_off field (belonging to the linux kernel's dirent structure) is falsely assumed to contain the byte offset to the next dirent . Note that the linux manual of the readdir system-call states that d_off is the "offset to this dirent " while glibc's getdents treats it as the offset to the next dirent .

  • In practice, both of the above are wrong/misleading. The d_off field may contain illegal negative values, 0 (should also never happen as the "next" dirent 's offset must always be bigger then 0), or positive values that are bigger than the size of the directory-file itself:

    • We're not sure what the Linux kernel intended to place in this field, but our experience shows that on "real" file systems (that actually reside on some disk) the offset seems to be a simple (not necessarily continuous) counter: eg first entry may have d_off=1 , second: d_off=2 , third: d_off=4096 , fourth= d_off=4097 etc. We conjecture this is the serial of the dirent record within the directory (and so, this is indeed the "offset", but counted in records out of which some were already removed).

    • For file systems that are maintained by the amd automounter (automount, directories) the d_off seems to be arbitrary (and may be negative, zero or beyond the scope of a 32bit integer). We conjecture the amd doesn't assign this field and the received values are simply garbage.
