
Multiprocessing hanging with requests.get

I have been working with a very simple bit of code, but the behavior is very strange. I am trying to send a request to a webpage using requests.get, but if the request takes longer than a few seconds, I would like to kill the process. I am following the response from the accepted answer here, but changing the function body to include the request. My code is below:

import multiprocessing as mp, requests
import time

def get_page(_r):
    _rs = requests.get('https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas').text
    _r.put(_rs)

q = mp.Queue()
p = mp.Process(target=get_page, args=(q,))
p.start()
time.sleep(3)
p.terminate()
p.join()
try:
    result = q.get(False)
    print(result)
except:
    print('failed')

The code above simply hangs when run. However, when I run

requests.get('https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas').text

independently, a response is returned in under two seconds. Therefore, the main code should print the page's HTML; however, it just stalls. Oddly, when I put an infinite loop in get_page:

def get_page(_r): 
  while True:
     pass
  _r.put('You will not see this')

the process is indeed terminated after three seconds. Therefore, I am certain the behavior has to do with requests. How could this be? I found a similar question here, but I am not using async. Could the issue have to do with monkey patching, since I am using requests along with time and multiprocessing? Any suggestions or comments would be appreciated. Thank you!

I am using:

  • Python 3.7.0

  • requests 2.21.0

Edit: @Hitobat pointed out that the timeout parameter of requests can be used instead. This does indeed work; however, I would still appreciate any ideas as to why requests fails with multiprocessing.
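For reference, a minimal sketch of that timeout approach, using the URL from the question and a 3-second limit to mirror the original time.sleep(3) (no child process is needed at all):

```python
import requests

URL = ('https://www.woolworths.com.au/shop/browse/drinks/'
       'cordials-juices-iced-teas/iced-teas')

try:
    # timeout=3 raises requests.exceptions.Timeout if the connect
    # or a read takes longer than 3 seconds
    html = requests.get(URL, timeout=3).text
    print(html[:15])
except requests.exceptions.Timeout:
    print('request timed out')
except requests.exceptions.RequestException as exc:
    print('request failed:', exc)
```

Note that timeout bounds each individual socket operation (the connect and each read), not the total download time, so a slowly trickling response can still exceed 3 seconds overall.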

I have reproduced your scenario and I have to refute the supposition "I am certain the behavior has to do with requests".
requests.get(...) returns the response as expected.

Let's see how the process goes with some debug points:

import multiprocessing as mp, requests
import time


def get_page(_r):
    _rs = requests.get('https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas').text
    print('--- response header', _rs[:17])
    _r.put(_rs)


q = mp.Queue()
p = mp.Process(target=get_page, args=(q,))
p.start()
time.sleep(3)
p.terminate()
p.join()

try:
    print('--- get data from queue of size', q.qsize())
    result = q.get(False)
    print(result)
except Exception as ex:
    print('failed', str(ex))

The output:

--- response header 
<!DOCTYPE html>
--- get data from queue of size 1

As we can see, the response is there, and the process even advanced into the try block, but it hangs at the q.get() statement when trying to extract the data from the queue. Therefore we may conclude that the queue is likely corrupted. And there is a corresponding warning in the multiprocessing library documentation (Pipes and Queues section):

Warning

If a process is killed using Process.terminate() or os.kill() while it is trying to use a Queue, then the data in the queue is likely to become corrupted. This may cause any other process to get an exception when it tries to use the queue later on.

Looks like this is that kind of case.


How can we handle this issue?

A known workaround is to use mp.Manager().Queue() (which adds an intermediate proxying level) instead of mp.Queue:

...
q = mp.Manager().Queue()
p = mp.Process(target=get_page, args=(q,))
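To see why the proxy helps, here is a self-contained sketch (no network; the sleeping worker is a stand-in for a slow requests.get). With a Manager queue, the data lives in the manager's own process, so terminating the child does not corrupt it:

```python
import multiprocessing as mp
import time

def worker(q):
    q.put('data from child')   # sent to the manager process via a proxy
    time.sleep(30)             # stand-in for a request that never finishes

if __name__ == '__main__':
    q = mp.Manager().Queue()   # queue lives in a separate manager process
    p = mp.Process(target=worker, args=(q,))
    p.start()
    time.sleep(3)              # give the child time to finish its put()
    p.terminate()              # kill the child; the manager survives
    p.join()
    print(q.get(False))        # → data from child
```

With a plain mp.Queue, terminating the child while its internal feeder thread is still flushing data into the pipe is what can leave the queue in the corrupted state described above.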
