How does the virtual filesystem handle syscalls like read and write?
(All of the code snippets are taken from: https://docs.huihoo.com/doxygen/linux/kernel/3.7/dir_97b3d2b63ac216821c2d7a22ee0ab2b0.html )
Hi! To set up my question: I have been studying the Linux fs code for almost a month for research, and I am stuck here. I am looking at include/linux/fs.h (which, if I am not wrong, holds the definitions of almost all the major structures and pointers used by code such as read_write.c and open.c), and I see this snippet:
struct file_operations {
    struct module *owner;
    loff_t (*llseek) (struct file *, loff_t, int);
    ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
    ssize_t (*aio_read) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
    ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
    int (*readdir) (struct file *, void *, filldir_t);
    unsigned int (*poll) (struct file *, struct poll_table_struct *);
    long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
    long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
    int (*mmap) (struct file *, struct vm_area_struct *);
    int (*open) (struct inode *, struct file *);
    int (*flush) (struct file *, fl_owner_t id);
    int (*release) (struct inode *, struct file *);
    int (*fsync) (struct file *, loff_t, loff_t, int datasync);
    int (*aio_fsync) (struct kiocb *, int datasync);
    int (*fasync) (int, struct file *, int);
    int (*lock) (struct file *, int, struct file_lock *);
    ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
    unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
    int (*check_flags)(int);
    int (*flock) (struct file *, int, struct file_lock *);
    ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
    ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
    int (*setlease)(struct file *, long, struct file_lock **);
    long (*fallocate)(struct file *file, int mode, loff_t offset,
                      loff_t len);
};
Here, as you can see, they have defined these very specific syscalls, which are declared in their respective files. For example, read_write.c defines the read and write syscalls as
SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
and
SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf, size_t, count)
respectively. For research purposes, I went inside these two definitions and hunted down every function call made inside them (at least those linked in the Doxygen documentation), and the function calls inside those calls, but I still could not answer a very simple question: how do these two syscalls call into the virtual filesystem, which then calls the drivers needed to read actual blocks of data from the filesystem? (If it is filesystem-specific, please show me the locations in the code where it is handed off to the FS drivers.)
P.S. I did the same hunt for the open syscall and was able to find the place where it invokes part of the namei.c code to perform that task, specifically here:
struct file *do_filp_open(int dfd, struct filename *pathname, const struct open_flags *op, int flags)
Here they use the structure nameidata, which holds the relevant information from the inode, to open a file.
In Linux, in-kernel filesystems are implemented in a modular fashion. For example, each struct inode contains a pointer to a struct file_operations, the same struct you copied in your question. This struct contains function pointers for various file operations.
For example, the member
ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
is a pointer to a function that takes a struct file *, a char __user *, a size_t, and a loff_t * as parameters, and returns a ssize_t.
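To make the function-pointer idea concrete, here is a minimal userspace sketch of the same pattern. Everything in it (demo_file, demo_file_operations, memfs_read, demo_call_read) is a hypothetical stand-in invented for illustration, not kernel code, and a plain long stands in for loff_t.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical userspace stand-ins; these are NOT the kernel's definitions. */
struct demo_file;

struct demo_file_operations {
    /* Same shape as the kernel's read member, with long in place of loff_t. */
    ssize_t (*read)(struct demo_file *, char *, size_t, long *);
};

struct demo_file {
    const struct demo_file_operations *f_op;
    const char *data;   /* backing bytes for this toy "file" */
    size_t size;
};

/* One toy "filesystem" read: copy bytes out of an in-memory buffer. */
static ssize_t memfs_read(struct demo_file *f, char *buf, size_t count, long *pos)
{
    assert(f != NULL && buf != NULL && pos != NULL);
    if (*pos < 0 || (size_t)*pos >= f->size)
        return 0;                       /* at or past end of file */
    size_t left = f->size - (size_t)*pos;
    size_t n = count < left ? count : left;
    memcpy(buf, f->data + *pos, n);
    *pos += (long)n;                    /* advance the file position */
    return (ssize_t)n;
}

const struct demo_file_operations memfs_fops = {
    .read = memfs_read,
};

/* Generic caller: it knows nothing about memfs and only follows the pointer. */
ssize_t demo_call_read(struct demo_file *f, char *buf, size_t count, long *pos)
{
    return f->f_op->read(f, buf, count, pos);
}
```

A caller would build a struct demo_file whose f_op points at memfs_fops and then call demo_call_read(). Swapping in a different ops table changes the behavior without touching the caller, which is the property the VFS relies on.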
When the read system call occurs, the kernel VFS code finds the corresponding inode and then calls the filesystem's read function specified in the struct file_operations. Here's a trace of the read system call: the read() syscall handler is invoked, which calls ksys_read(), which calls vfs_read(). This is where the magic happens in vfs_read():
if (file->f_op->read)
    ret = file->f_op->read(file, buf, count, pos);
else if (file->f_op->read_iter)
    ret = new_sync_read(file, buf, count, pos);
else
    ret = -EINVAL;
A related struct, struct file, also contains a pointer to a struct file_operations. The if-condition above checks whether there is a read() handler for this file and calls it if it exists. If there is no read() handler, it checks for a read_iter handler. If neither exists, it returns -EINVAL.
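The fallback order described here can be tried out in userspace. The sketch below is only a model of the if/else chain quoted from vfs_read(): the toy_* names and the fill_r/fill_i handlers are hypothetical, and read_iter is given the same signature as read for simplicity, whereas the real kernel handler takes a struct kiocb and a struct iov_iter and is reached through new_sync_read().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

struct toy_file;

/* Hypothetical, heavily simplified ops table (see the note on read_iter above). */
struct toy_fops {
    ssize_t (*read)(struct toy_file *, char *, size_t, long *);
    ssize_t (*read_iter)(struct toy_file *, char *, size_t, long *);
};

struct toy_file {
    const struct toy_fops *f_op;
};

/* Same decision order as the quoted snippet: read, then read_iter, then -EINVAL. */
ssize_t toy_vfs_read(struct toy_file *file, char *buf, size_t count, long *pos)
{
    assert(file != NULL && file->f_op != NULL);
    if (file->f_op->read)
        return file->f_op->read(file, buf, count, pos);
    if (file->f_op->read_iter)
        return file->f_op->read_iter(file, buf, count, pos);
    return -EINVAL;
}

/* Toy handlers: fill the buffer with a marker byte so the caller can tell
 * which one ran. */
static ssize_t fill_r(struct toy_file *f, char *buf, size_t count, long *pos)
{
    (void)f;
    memset(buf, 'r', count);
    *pos += (long)count;
    return (ssize_t)count;
}

static ssize_t fill_i(struct toy_file *f, char *buf, size_t count, long *pos)
{
    (void)f;
    memset(buf, 'i', count);
    *pos += (long)count;
    return (ssize_t)count;
}

const struct toy_fops only_read = { .read = fill_r };
const struct toy_fops only_iter = { .read_iter = fill_i };
const struct toy_fops neither   = { .read = NULL, .read_iter = NULL };
```

A file whose ops table is only_iter models the ext4 case: there is no read handler, so the call falls through to read_iter.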
In ext4, the struct file_operations for regular files is ext4_file_operations, defined in fs/ext4/file.c. It is used in several places, but it is associated with an inode through the inode's i_fop field. ext4 defines a read_iter handler (i.e. ext4_file_read_iter), but not a read handler. So, when read(2) is called on an ext4 file, ext4_file_read_iter() is eventually called.
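As a rough illustration of how an ops table travels from the filesystem to an open file, here is a userspace sketch. It assumes the usual arrangement in which the filesystem stores its table in the inode's i_fop field and the open path copies that pointer into the file's f_op. All mini_* and minifs_* names are hypothetical, and the handler signature is simplified.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */

struct mini_file;

/* Hypothetical ops table with both members, like the kernel's. */
struct mini_fops {
    ssize_t (*read)(struct mini_file *, char *, size_t);
    ssize_t (*read_iter)(struct mini_file *, char *, size_t);
};

struct mini_inode {
    const struct mini_fops *i_fop;   /* set by the filesystem */
};

struct mini_file {
    struct mini_inode *f_inode;
    const struct mini_fops *f_op;    /* copied from the inode at open time */
};

/* Toy handler: hand back zero-filled bytes. */
static ssize_t minifs_read_iter(struct mini_file *f, char *buf, size_t count)
{
    assert(f != NULL && buf != NULL);
    memset(buf, 0, count);
    return (ssize_t)count;
}

/* Like ext4 in the text above: read_iter is provided, read is left NULL. */
const struct mini_fops minifs_file_operations = {
    .read_iter = minifs_read_iter,
};

/* Filesystem side: attach the ops table while setting up the inode. */
void minifs_init_inode(struct mini_inode *inode)
{
    inode->i_fop = &minifs_file_operations;
}

/* Generic side: at open, the file picks up whatever table the inode carries. */
void mini_open(struct mini_file *file, struct mini_inode *inode)
{
    file->f_inode = inode;
    file->f_op = inode->i_fop;
}
```

After minifs_init_inode() and mini_open(), file->f_op->read is NULL and file->f_op->read_iter is set, so a dispatcher that checks read first would fall through to read_iter.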
At this point, we've reached filesystem-specific code. How ext4 manages blocks can be explored further from here.
I would recommend using ftrace to work through the complete code path. It provides function call traces inside the kernel.
https://lwn.net/Articles/370423/