
close() on socket not releasing file descriptor

When stress testing some server code I wrote, I noticed that even though I call close() on the descriptor (and check the result for errors), the descriptor is not released, which eventually causes accept() to fail with "Too many open files".

Now, I understand that this is because of the ulimit; what I don't understand is why I am hitting it if I call close() after each synchronous accept/read/send cycle.
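The limit in question is the per-process cap on open descriptors (RLIMIT_NOFILE, the value `ulimit -n` reports). A minimal sketch of reading it from inside the server, assuming a POSIX system; the helper name `open_file_limit` is hypothetical:

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_NOFILE, rlim_t
#include <cstdio>          // std::perror

// Hypothetical helper: reads the per-process limit on open descriptors.
// `soft` is the value `ulimit -n` shows and the one accept() runs into
// (it then fails with EMFILE, "Too many open files"); `hard` is the ceiling
// the soft limit can be raised to without extra privileges.
bool open_file_limit(rlim_t& soft, rlim_t& hard) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        std::perror("getrlimit");
        return false;
    }
    soft = rl.rlim_cur;
    hard = rl.rlim_max;
    return true;
}
```

With a soft limit of 1024 (often the default), roughly 1000 leaked descriptors plus stdin, stdout, stderr and the listening socket are enough to exhaust it. Raising the limit only postpones the failure if descriptors are leaking.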

I verified that the descriptors are in fact still there by running lsof under watch:

ctsvr  9733 mike 1017u  sock     0,7      0t0 3323579 can't identify protocol
ctsvr  9733 mike 1018u  sock     0,7      0t0 3323581 can't identify protocol
...

And sure enough, there are about 1000 of them. Furthermore, checking with netstat, I can see that there are no hanging TCP states (nothing stuck in a WAIT state or anything similar).

If I do just a single connect/send/recv from the client, the socket still stays listed in lsof, so this is not even a load issue.

The server is running on an Ubuntu Linux 64-bit machine.

Any thoughts?

So, using strace (thanks Gearoid), a tool I now can't imagine living without, I confirmed that I was in fact closing the descriptors.

However, for the sake of posterity, I lay bare my foolish mistake:

Socket::Socket() : impl(new Impl) {
    impl->fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    ....
}

Socket::ptr_t Socket::accept() {
    auto r = ::accept(impl->fd, NULL, NULL);
    ...
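    ptr_t s(new Socket);   // Socket() calls socket() and opens a brand-new fd...
    s->impl->fd = r;       // ...which is overwritten here and never closed: the leak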
    return s;
}

As you can see, my constructor allocated a socket immediately, and accept() then overwrote that descriptor with the one returned by ::accept(), so the constructor's socket was never closed. I had refactored the accept code from a standalone Acceptor class into the Socket class without changing this.
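One way to avoid this class of leak is to make the object that owns a descriptor receive it from whoever created it, rather than always calling socket() in the constructor. A minimal sketch, assuming the Socket class can be reshaped: `create()` is a hypothetical factory doing what the old default constructor did, `ptr_t` is assumed to be a `std::unique_ptr` (the original's may differ), and the `Impl` indirection is dropped for brevity.

```cpp
#include <sys/socket.h>   // socket, accept
#include <netinet/in.h>   // IPPROTO_TCP, sockaddr_in
#include <unistd.h>       // close
#include <cerrno>
#include <memory>
#include <system_error>

// Hypothetical rework: the fd is handed to the constructor, so accept()
// never opens a socket it immediately throws away.
class Socket {
public:
    using ptr_t = std::unique_ptr<Socket>;  // assumption: original ptr_t may differ

    // Creates a fresh TCP socket (what the old default constructor did).
    static ptr_t create() {
        int fd = ::socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
        if (fd < 0) throw std::system_error(errno, std::generic_category(), "socket");
        return ptr_t(new Socket(fd));
    }

    // Wraps the descriptor returned by ::accept(); no extra socket() call.
    ptr_t accept() {
        int r = ::accept(fd_, nullptr, nullptr);
        if (r < 0) throw std::system_error(errno, std::generic_category(), "accept");
        return ptr_t(new Socket(r));
    }

    int fd() const { return fd_; }

    ~Socket() { if (fd_ >= 0) ::close(fd_); }  // owner releases the fd
    Socket(const Socket&) = delete;            // no two owners of one fd
    Socket& operator=(const Socket&) = delete;

private:
    explicit Socket(int fd) : fd_(fd) {}  // takes ownership of fd
    int fd_;
};
```

Because the destructor closes the descriptor, dropping the pointer returned by accept() is enough to release it. Since the kernel hands out the lowest free descriptor number, repeated accept/close cycles should keep reusing the same numbers instead of climbing toward the limit.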

Using strace, I could easily see socket() being called on every accept, which led to my light-bulb moment.

Thanks all for the help!

Have you tried calling perror() after close() returns -1? The error string it prints may give you a clue.
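errno, and therefore perror(), is only meaningful after close() has actually returned -1. A minimal sketch of that check; `close_checked` is a hypothetical helper name:

```cpp
#include <unistd.h>  // close
#include <cstdio>    // std::perror

// Hypothetical helper: closes fd and reports a failure via perror().
// perror() is only called when close() returns -1, because errno is not
// meaningful after a successful call.
bool close_checked(int fd) {
    if (::close(fd) == -1) {
        std::perror("close");  // e.g. "close: Bad file descriptor" for EBADF
        return false;
    }
    return true;
}
```

In this thread's case close() succeeded on every descriptor it was given; the leaked descriptor was one that never reached close() at all, which is why strace, rather than error checking, exposed it.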

You are most probably blocked in a recv() or send() call. Consider setting a timeout using setsockopt().

I saw similar lsof output when the socket had been closed on the other end but my thread was still holding it open, blocked in recv() waiting for data.
