How does the virtual filesystem handle syscalls like read and write?

(All of the code snippets are taken from: https://docs.huihoo.com/doxygen/linux/kernel/3.7/dir_97b3d2b63ac216821c2d7a22ee0ab2b0.html )

Hi! To set up my question: I have been studying the Linux fs code for almost a month for research, and I am stuck here. I am looking at include/linux/fs.h (which, if I am not wrong, holds the definitions of almost all the major structures and pointers used by code like read_write.c and open.c), and I see this code snippet:

struct file_operations {
    struct module *owner;
    loff_t (*llseek) (struct file *, loff_t, int);
    ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
    ssize_t (*aio_read) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
    ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
    int (*readdir) (struct file *, void *, filldir_t);
    unsigned int (*poll) (struct file *, struct poll_table_struct *);
    long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
    long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
    int (*mmap) (struct file *, struct vm_area_struct *);
    int (*open) (struct inode *, struct file *);
    int (*flush) (struct file *, fl_owner_t id);
    int (*release) (struct inode *, struct file *);
    int (*fsync) (struct file *, loff_t, loff_t, int datasync);
    int (*aio_fsync) (struct kiocb *, int datasync);
    int (*fasync) (int, struct file *, int);
    int (*lock) (struct file *, int, struct file_lock *);
    ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
    unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
    int (*check_flags)(int);
    int (*flock) (struct file *, int, struct file_lock *);
    ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
    ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
    int (*setlease)(struct file *, long, struct file_lock **);
    long (*fallocate)(struct file *file, int mode, loff_t offset,
                      loff_t len);
};

Here, as you can see, they have defined these very specific syscalls, which are declared in their respective files. For example, read_write.c defines the read and write syscalls as SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count) and SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf, size_t, count) respectively. For research purposes, I went inside these two definitions and hunted down every function call made inside each of them (at least those linked in the Doxygen documentation), and the function calls inside those function calls, but I could not answer a very simple question: how do these two syscalls call into the virtual filesystem, which in turn calls the drivers required to read actual blocks of data from the filesystem? (If it is filesystem-specific, then please show me the locations in the code where it hands off to the FS drivers.)

P.S. I did the same hunt for the open syscall and was able to find the place where it invokes part of the namei.c code to perform that task, specifically here: struct file *do_filp_open(int dfd, struct filename *pathname, const struct open_flags *op, int flags). There they use the structure nameidata, which has the relevant information from the inode to open a file.

In-Kernel Filesystems in Linux

In Linux, in-kernel filesystems are implemented in a modular fashion. For example, each struct inode contains a pointer to a struct file_operations, the same struct you copied in your question. This struct contains function pointers for various file operations.

For example, the member ssize_t (*read) (struct file *, char __user *, size_t, loff_t *); is a pointer to a function that takes a struct file *, a char __user *, a size_t, and a loff_t * as parameters, and returns a ssize_t.
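To make the function-pointer idea concrete, here is a minimal userspace sketch. It is not kernel code: demo_file, demo_file_ops, mem_read, and mem_ops are hypothetical names invented for illustration, and off_t stands in for the kernel's loff_t. It only shows how a struct of function pointers lets a caller invoke read without knowing which implementation sits behind it.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t, off_t */

/* Hypothetical userspace stand-ins; these are NOT the kernel's definitions. */
struct demo_file;

struct demo_file_ops {
    /* Same shape as the kernel's read member, minus __user and loff_t. */
    ssize_t (*read)(struct demo_file *, char *, size_t, off_t *);
};

struct demo_file {
    const struct demo_file_ops *f_op;  /* per-file table of operations */
    const char *data;                  /* backing bytes of this toy file */
    size_t size;                       /* number of bytes in data */
};

/* One implementation that a toy "filesystem" could plug into the table. */
static ssize_t mem_read(struct demo_file *f, char *buf, size_t count, off_t *pos)
{
    if (*pos < 0 || (size_t)*pos >= f->size)
        return 0;                      /* at or past end of file */

    size_t left = f->size - (size_t)*pos;
    if (count > left)
        count = left;                  /* short read near the end */

    memcpy(buf, f->data + *pos, count);
    *pos += (off_t)count;              /* advance the file position */
    return (ssize_t)count;
}

static const struct demo_file_ops mem_ops = { .read = mem_read };
```

A caller that holds a struct demo_file whose f_op points at mem_ops would read from it with f.f_op->read(&f, buf, n, &pos), never naming mem_read directly. Swapping in a different table changes the behavior without touching the caller.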

Routing syscalls to the underlying filesystem

When the read system call occurs, the kernel VFS code finds the corresponding open file (a struct file, which refers to the inode), and then calls the filesystem's read function that is specified in the struct file_operations. Here's a trace of the read system call:

  1. the read() syscall handler is invoked,
  2. which calls ksys_read(),
  3. which calls vfs_read().

This is where the magic happens in vfs_read():

if (file->f_op->read)
    ret = file->f_op->read(file, buf, count, pos);
else if (file->f_op->read_iter)
    ret = new_sync_read(file, buf, count, pos);
else
    ret = -EINVAL;

A related struct, struct file, also contains a pointer to a struct file_operations. The if-condition above checks whether there is a read() handler for this file, and calls it if it exists. If a read() handler doesn't exist, it checks for a read_iter handler. If neither exists, it returns -EINVAL.
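The dispatch described above can be imitated in userspace. The following is only a sketch under simplifying assumptions: every toy_* name is hypothetical, and read_iter is given the same flat-buffer signature as read, whereas the kernel's read_iter takes a struct kiocb * and a struct iov_iter * that new_sync_read() builds for it.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t, off_t */

struct toy_file;

/* Hypothetical, simplified operation table (not the kernel's). */
struct toy_file_ops {
    ssize_t (*read)(struct toy_file *, char *, size_t, off_t *);
    ssize_t (*read_iter)(struct toy_file *, char *, size_t, off_t *);
};

struct toy_file {
    const struct toy_file_ops *f_op;
};

/* Stand-in for new_sync_read(): adapts a plain read to read_iter. */
static ssize_t toy_sync_read(struct toy_file *file, char *buf,
                             size_t count, off_t *pos)
{
    return file->f_op->read_iter(file, buf, count, pos);
}

/* Mirrors the if/else chain quoted from vfs_read(). */
static ssize_t toy_vfs_read(struct toy_file *file, char *buf,
                            size_t count, off_t *pos)
{
    if (file->f_op->read)
        return file->f_op->read(file, buf, count, pos);
    else if (file->f_op->read_iter)
        return toy_sync_read(file, buf, count, pos);
    return -EINVAL;                    /* no usable handler */
}

/* Two toy handlers that fill the buffer with different bytes,
 * so a caller can tell which branch was taken. */
static ssize_t toy_read(struct toy_file *file, char *buf,
                        size_t count, off_t *pos)
{
    (void)file;
    memset(buf, 'r', count);
    *pos += (off_t)count;
    return (ssize_t)count;
}

static ssize_t toy_read_iter(struct toy_file *file, char *buf,
                             size_t count, off_t *pos)
{
    (void)file;
    memset(buf, 'i', count);
    *pos += (off_t)count;
    return (ssize_t)count;
}
```

A table that sets only .read takes the first branch, one that sets only .read_iter takes the second, and an empty table yields -EINVAL. That is the same three-way outcome as in the kernel snippet.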

Example: ext4

In ext4, the struct file_operations is defined here. It is used in several places, but it is associated with an inode here. ext4 defines a read_iter handler (i.e. ext4_file_read_iter), but not a read handler. So, when read(2) is called on an ext4 file, ext4_file_read_iter() is eventually called.
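Why does the check for a read handler fail when a filesystem simply never mentions one? In C, members omitted from a designated initializer are set to NULL. The sketch below illustrates that with hypothetical sketch_* names; it is not ext4's real table, which lists many more members and uses the kernel's own signatures.

```c
#include <stddef.h>
#include <sys/types.h>   /* ssize_t, off_t */

struct sketch_file;

/* Hypothetical two-member table; flat-buffer signatures are an assumption. */
struct sketch_file_ops {
    ssize_t (*read)(struct sketch_file *, char *, size_t, off_t *);
    ssize_t (*read_iter)(struct sketch_file *, char *, size_t, off_t *);
};

/* Placeholder handler: pretends every requested byte was read. */
static ssize_t sketch_ext4_read_iter(struct sketch_file *f, char *buf,
                                     size_t n, off_t *pos)
{
    (void)f;
    (void)buf;
    (void)pos;
    return (ssize_t)n;
}

/* Only .read_iter is named, so C initializes .read to NULL. */
static const struct sketch_file_ops sketch_ext4_ops = {
    .read_iter = sketch_ext4_read_iter,
};

enum handler_kind { HANDLER_READ, HANDLER_READ_ITER, HANDLER_NONE };

/* Reports which handler a vfs_read()-style check would select. */
static enum handler_kind selected_handler(const struct sketch_file_ops *ops)
{
    if (ops->read)
        return HANDLER_READ;
    if (ops->read_iter)
        return HANDLER_READ_ITER;
    return HANDLER_NONE;
}
```

For sketch_ext4_ops, selected_handler() reports HANDLER_READ_ITER, matching the path described for ext4 above.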

At this point, we've reached filesystem-specific code. How ext4 manages blocks can be explored further from here.

I would recommend using ftrace to explore the complete code path. It can trace every function call made in the kernel.

https://lwn.net/Articles/370423/
