Is writing to a socket an arbitrary limitation of the sendfile() syscall?

Original post: 2009-12-09 21:23:25 | Tags: unix, kernel, implementation, system-calls, sendfile

Prelude

sendfile() is an extremely useful syscall for two reasons:

First, it's less code than a read() / write() (or recv() / send() if you prefer that jive) loop.
Second, it's faster than the aforementioned methods (fewer syscalls, the implementation may copy between devices without an intermediate buffer, etc.).
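
To make the comparison concrete, here is a minimal sketch of both approaches using Python's thin wrappers over the syscalls. The function names `send_with_loop` and `send_with_sendfile` are made up for illustration, and the `socketpair()` in the demo merely stands in for a real network connection. Note that a careful sendfile() caller still loops, because a single call may transfer fewer bytes than requested.

```python
import os
import socket
import tempfile


def send_with_loop(in_fd: int, sock: socket.socket, chunk: int = 65536) -> int:
    """Copy a file to a socket through a userspace buffer (read + send per chunk)."""
    total = 0
    while True:
        buf = os.read(in_fd, chunk)   # kernel -> userspace copy
        if not buf:                   # empty read means end of file
            break
        sock.sendall(buf)             # userspace -> kernel copy
        total += len(buf)
    return total


def send_with_sendfile(in_fd: int, sock: socket.socket) -> int:
    """Copy a file to a socket with sendfile(); the data is never read into userspace."""
    size = os.fstat(in_fd).st_size
    offset = 0
    while offset < size:
        # sendfile() may send less than asked for, so continue from the new offset.
        sent = os.sendfile(sock.fileno(), in_fd, offset, size - offset)
        if sent == 0:
            break
        offset += sent
    return offset


if __name__ == "__main__":
    a, b = socket.socketpair()        # hypothetical stand-in for a TCP connection
    with tempfile.TemporaryFile() as f:
        f.write(b"hello, sendfile\n")
        f.flush()
        print(send_with_sendfile(f.fileno(), a), b.recv(64))
    a.close()
    b.close()
```

Because `os.sendfile` is given an explicit offset here, it does not depend on or move the file position of `in_fd`, whereas the read() loop starts from wherever the descriptor currently points.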

Less code. More efficient. Awesome.

In UNIX, everything is (mostly) a file. This is the ugly territory from the collision of platonic theory and real-world practice. I understand that sockets are fundamentally different from files residing on some device. I haven't dug through the sources of Linux/*BSD/Darwin/whatever OS implements sendfile() to know why this specific syscall is restricted to writing to sockets (specifically, streaming sockets).

I just want to know...

Question

What is limiting sendfile() from allowing the destination file descriptor to be something besides a socket (like a disk file, or a pipe)?

2 answers

I seem to remember that it was a limitation introduced in early Linux 2.6 (2.4 didn't have the limitation).

Since 2.6.17, Linux has had the splice() system call, which is similar: more flexible, but slightly less efficient. Linus talked about re-implementing sendfile() in terms of splice(). See http://kerneltrap.org/node/6505

Fundamentally, the only thing limiting it is that "no-one's written the code yet".

However, I gather that the reason no one has written the code for the two cases you mention is that both would require the data to be copied, which removes much of the advantage of using sendfile in the first place.

For a file-to-file sendfile, you'd need a copy because otherwise the same page would have to be in the pagecache as both a clean page in the source file and a dirty page in the destination file. I don't think the pagecache is built to handle that case at the moment (though of course, this could be changed if there was sufficient motivation).
For a file-to-pipe sendfile, you need a copy regardless because the destination process needs to get a private, writeable copy of the data. Anyway, for most uses of this case we already have mmap.
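
The mmap alternative mentioned in the second case can be sketched as follows: a copy-on-write mapping (`ACCESS_COPY`, which corresponds to MAP_PRIVATE on POSIX systems) gives the process a private, writeable view of the file whose modifications never reach the file itself. `private_view` is a hypothetical helper name used only for this example.

```python
import mmap
import os
import tempfile


def private_view(fd: int) -> mmap.mmap:
    """Map the whole file copy-on-write; writes stay private to this process."""
    # Length 0 means "map the entire file"; the file must not be empty.
    return mmap.mmap(fd, 0, access=mmap.ACCESS_COPY)


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        f.write(b"hello world")
        f.flush()
        view = private_view(f.fileno())
        view[0:5] = b"HELLO"                  # modifies only the private pages
        print(view[:11], os.pread(f.fileno(), 11, 0))
        view.close()
```

Pages are only copied when they are written to, so a reader that never modifies the data shares the pagecache pages with the file.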