Python3 urllib.request will not close connections immediately

I have the following code, which runs a continuous loop to fetch some content from a website:

from http.cookiejar import CookieJar
from urllib import request

cj = CookieJar()
cp = request.HTTPCookieProcessor(cj)
hh = request.HTTPHandler()
opener = request.build_opener(cp, hh)

while True:
    # build url
    req = request.Request(url=url)
    p = opener.open(req)
    c = p.read()
    # process c
    p.close()
    # check for abort condition, or continue

The content is read correctly, but for some reason the TCP connections won't close. I'm watching the active connection count in a dd-wrt router interface, and it rises steadily. If the script keeps running, it exhausts the router's 4096-connection limit. When that happens, the script simply enters a waiting state (the router won't allow new connections, but the timeout hasn't been hit yet). After a couple of minutes, those connections are closed and the script can resume.

I was able to observe the state of those hanging connections on the router. They all share the same state: TIME_WAIT.

I expect this script to use no more than one TCP connection at a time. What am I doing wrong?

I'm using Python 3.4.2 on Mac OS X 10.10.

Through some research, I discovered the cause of this problem: the design of the TCP protocol. In a nutshell, when you disconnect, the connection isn't dropped immediately; it enters the TIME_WAIT state and times out after 4 minutes. Contrary to what I was expecting, the connection doesn't disappear right away.
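To see why the router's table fills up, here is a rough back-of-the-envelope sketch. It assumes every request opens a fresh connection that then lingers for the 4 minutes mentioned above, and it uses the 4096-connection limit from the question. The function names and the example request rate are made up for illustration only.

```python
# Figures taken from the question and answer above.
TIME_WAIT_SECONDS = 4 * 60   # lingering time per closed connection
ROUTER_LIMIT = 4096          # connection limit of the router

def lingering_connections(requests_per_second, time_wait=TIME_WAIT_SECONDS):
    """Approximate steady-state count of TIME_WAIT entries when
    each request opens (and then closes) its own connection."""
    return requests_per_second * time_wait

def max_sustainable_rate(limit=ROUTER_LIMIT, time_wait=TIME_WAIT_SECONDS):
    """Highest request rate that keeps the table below the limit."""
    return limit / time_wait

if __name__ == "__main__":
    # Hypothetical rate of 20 requests per second.
    print(lingering_connections(20))
    print(round(max_sustainable_rate(), 2))
```

Under these assumptions, anything faster than roughly 17 requests per second would eventually hit the limit, which matches the stall described in the question.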

According to this question, it's also not possible to forcefully drop a connection (without restarting the network stack).

It turns out that in my particular case, as this question states, a better option is to use a persistent connection, a.k.a. HTTP keep-alive. Since I'm querying the same server, this will work.
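A minimal sketch of the keep-alive approach, assuming the standard library's http.client is acceptable in place of the urllib opener. The helper name fetch_many and its parameters are hypothetical, and unlike the original opener it does not handle cookies, so a CookieJar would have to be wired in separately if the site needs one.

```python
import http.client

def fetch_many(host, paths, port=80, timeout=10):
    """Fetch several paths from one host, reusing a single
    HTTP/1.1 connection as long as the server keeps it open."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    bodies = []
    try:
        for path in paths:
            conn.request("GET", path)
            resp = conn.getresponse()
            # The body must be read in full before the same
            # connection can carry the next request.
            bodies.append(resp.read())
    finally:
        # One close at the end, so at most one connection
        # is left lingering instead of one per request.
        conn.close()
    return bodies
```

If the server answers with "Connection: close", http.client opens a new connection on the next request, so this only reduces the connection count when the server actually supports keep-alive.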
