
connect() with unix-domain socket and full backlog

When the listen backlog of a STREAM unix-domain socket is full, connect(2) fails with ECONNREFUSED on most systems. It would be preferable for it to return EAGAIN.

The reasoning is that it is highly useful to be able to distinguish a dead socket (the node exists in the filesystem, but no process is listening anymore) from a socket whose backlog is full. I ran into this problem when porting some Linux software that has code to clean up dead sockets. It is a security vulnerability if that code can be tricked into deleting live sockets by spamming them with connections until their backlog fills up.
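
To make the distinction concrete, here is a minimal sketch of the kind of probe such cleanup code might use. The names `probe_socket` and `enum probe_result` are hypothetical and not taken from the software in question. It assumes Linux-style errno values; on BSD-style systems `PROBE_DEAD` cannot be told apart from a full backlog, which is exactly the problem described above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical result type; these names are not from the original post. */
enum probe_result { PROBE_ALIVE, PROBE_DEAD, PROBE_BUSY, PROBE_ERROR };

/* Probe a unix-domain socket path with a single non-blocking connect(). */
enum probe_result probe_socket(const char *path)
{
    struct sockaddr_un addr;
    enum probe_result result;
    int fd, flags;

    assert(path != NULL && strlen(path) < sizeof(addr.sun_path));

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return PROBE_ERROR;

    /* Non-blocking, so a full backlog fails at once instead of waiting. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return PROBE_ERROR;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        result = PROBE_ALIVE;   /* someone accepted or queued us */
    else if (errno == EAGAIN)
        result = PROBE_BUSY;    /* Linux: backlog full, do not delete */
    else if (errno == ECONNREFUSED)
        result = PROBE_DEAD;    /* ambiguous on BSD-style systems */
    else
        result = PROBE_ERROR;   /* ENOENT, EACCES, ...: leave the path alone */

    close(fd);
    return result;
}
```

Cleanup code built on this would unlink the path only for `PROBE_DEAD`, and only on a platform where that result is known to be unambiguous.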

Only Linux returns EAGAIN; AIX, Solaris and Darwin follow the BSD behaviour (just tested on each).
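
A test along these lines can be sketched as follows. This is not the program that was actually used; `errno_on_full_backlog` and `MAX_CLIENTS` are made-up names, and the backlog of 1 is an arbitrary choice. It creates a listener that never calls accept(), keeps connecting with non-blocking sockets, and returns the errno of the first connect() that fails.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define MAX_CLIENTS 64  /* arbitrary cap for this experiment */

/* Returns the errno of the first failing connect(), 0 if none failed
 * within max_attempts, or -1 if the experiment could not be set up. */
int errno_on_full_backlog(const char *path, int max_attempts)
{
    struct sockaddr_un addr;
    int clients[MAX_CLIENTS];
    int listener, n = 0, saved = 0, i;

    assert(path != NULL && strlen(path) < sizeof(addr.sun_path));
    if (max_attempts > MAX_CLIENTS)
        max_attempts = MAX_CLIENTS;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    unlink(path);   /* remove any leftover node from an earlier run */

    listener = socket(AF_UNIX, SOCK_STREAM, 0);
    if (listener < 0)
        return -1;
    if (bind(listener, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listener, 1) < 0) {
        close(listener);
        return -1;
    }

    for (i = 0; i < max_attempts; i++) {
        int c = socket(AF_UNIX, SOCK_STREAM, 0);
        if (c < 0 || fcntl(c, F_SETFL, O_NONBLOCK) < 0) {
            if (c >= 0)
                close(c);
            saved = -1;
            break;
        }
        if (connect(c, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            saved = errno;      /* the value this experiment is after */
            close(c);
            break;
        }
        clients[n++] = c;       /* keep it open so it stays queued */
    }

    while (n > 0)
        close(clients[--n]);
    close(listener);
    unlink(path);
    return saved;
}
```

Comparing the returned value against EAGAIN and ECONNREFUSED on each platform shows which behaviour that system follows.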

POSIX doesn't list EAGAIN as a possible error code from connect() (link), so there may be a compliance issue here.

What's the best route to get everyone to change in line with Linux? I could file a bug report with Oracle and with Apple, open a FreeBSD PR, and fight it out on the mailing lists of each organisation. Or should I pester someone in a standards body (the Austin Group)? Is it even advisable to try to get everyone to change here, even though the advantage is clear?

Whether or not you attempt to change the standard, or change how vendors have implemented connect(), I would argue that from the point of view of the software it wouldn't make any difference. ECONNREFUSED and EAGAIN should both be treated as a retry.

Distinguishing between the two cases may allow you to write a more specific diagnostic message on the client, but the retry logic should be the same. Even if the listener doesn't currently exist, it may eventually exist, so a retry should be attempted.

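try_again:
    /* connect() expects a struct sockaddr *, so cast to that type. */
    rc = connect(s, (struct sockaddr *)&addr, sizeof(addr));
    if (rc == 0) return connect_succeeded(s, &addr);
    switch (errno) {
    case EAGAIN:        /* Linux: backlog full */
    case ECONNREFUSED:  /* no listener, or backlog full on BSD-style systems */
        if (should_try_again(retries++)) {
            goto try_again;
        }
        break;
    case EINTR:         /* interrupted by a signal: retry unconditionally */
        goto try_again;
    default:
        break;
    }
    return connect_failed(s, errno);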
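
The helper should_try_again() is not defined above. One possible shape for it, purely as an assumption, is a bounded retry count with an exponential backoff sleep; `MAX_RETRIES`, `retry_delay_ms`, and the specific delays are illustrative choices, not part of the answer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

#define MAX_RETRIES 5   /* assumed limit; tune for the application */

/* Backoff delay in milliseconds for a given attempt number:
 * 10, 20, 40, ... doubling each time, capped at 1000. */
long retry_delay_ms(int attempt)
{
    long delay = 10;

    assert(attempt >= 0);
    while (attempt-- > 0 && delay < 1000)
        delay *= 2;
    return delay < 1000 ? delay : 1000;
}

/* Returns 1 after sleeping if another attempt is allowed, else 0. */
int should_try_again(int retries)
{
    struct timespec ts;
    long ms;

    if (retries >= MAX_RETRIES)
        return 0;

    ms = retry_delay_ms(retries);
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);   /* an early wake-up from a signal is harmless here */
    return 1;
}
```

Sleeping between attempts gives a busy listener time to drain its backlog and a restarting listener time to come back.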

As with most things, if you can convince the source to make the change, then everyone downstream has to comply. So having POSIX fix the issue would certainly be best: all implementations would then have to follow.

Another solution is to use only a system on which it works, i.e. only Linux machines in this case. Also, with Linux you can modify the kernel and make it work one way or the other (which is also doable with BSD).

Now, my opinion on the matter: it seems to me that the main issue here would be a rogue process attempting to open a socket to receive messages in place of the intended service. My question would be: how often does that happen?

If you use a system similar to systemctl, you will have one and only one instance of the service running. If you are attempting to open the AF_UNIX socket at that point in your service, you really are the only one trying to do so. Therefore, deleting the file is a non-issue.

If you allow rogue software to run on your system (i.e. it is public and you have many users with access, like in the old days when people would telnet to a server), then maybe you need to use TCP or UDP connections instead.

Finally, if other software is able to open that AF_UNIX socket, I think you're in trouble anyway, since your service can be running while that rogue software (1) deletes your socket and (2) calls bind() anew, so that (3) your new clients are now talking to the rogue software, not your service, even though you are still running and listening for connections... on a hidden file socket. (Note that old clients will continue to talk to your service; any client that reconnects will talk to the rogue software.)
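
The sequence above can be reproduced inside a single process. In this sketch, `make_listener` and `reconnect_reaches_replacement` are hypothetical names, and both listeners belong to the same user, so file permissions never get in the way; whether a real rogue process could do the same depends on the permissions of the directory holding the socket.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Create a non-blocking listening socket bound to path, or return -1. */
int make_listener(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    assert(path != NULL && strlen(path) < sizeof(addr.sun_path));
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 4) < 0 || fcntl(fd, F_SETFL, O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Returns 1 if a new client reaches the replacement listener only,
 * 0 if not, and -1 if the setup failed. */
int reconnect_reaches_replacement(const char *path)
{
    struct sockaddr_un addr;
    int service, rogue, client, at_rogue, at_service, result = -1;

    unlink(path);
    service = make_listener(path);      /* the legitimate service */
    if (service < 0)
        return -1;
    unlink(path);                       /* (1) the socket node is deleted */
    rogue = make_listener(path);        /* (2) a new socket is bound there */
    client = socket(AF_UNIX, SOCK_STREAM, 0);

    if (rogue >= 0 && client >= 0) {
        memset(&addr, 0, sizeof(addr));
        addr.sun_family = AF_UNIX;
        strcpy(addr.sun_path, path);
        if (connect(client, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
            /* (3) see which listener has the connection queued */
            at_rogue = accept(rogue, NULL, NULL);
            at_service = accept(service, NULL, NULL);
            result = (at_rogue >= 0 && at_service < 0);
            if (at_rogue >= 0)
                close(at_rogue);
            if (at_service >= 0)
                close(at_service);
        }
    }
    if (client >= 0)
        close(client);
    if (rogue >= 0)
        close(rogue);
    close(service);
    unlink(path);
    return result;
}
```

The original listener stays open and valid throughout; it simply can no longer be reached by name.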

So... where is your issue?
