
Multiprocessing hanging with requests.get

I have been working with a very simple bit of code, but the behavior is very strange. I am trying to send a request to a webpage using requests.get, but if the request takes longer than a few seconds, I would like to kill the process. I am following the response from the accepted answer here, but changing the function body to include the request. My code is below:

import multiprocessing as mp, requests
import time
def get_page(_r):                   
  _rs = requests.get('https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas').text
  _r.put(_rs)

q = mp.Queue() 
p = mp.Process(target=get_page, args=(q,))
p.start()
time.sleep(3)
p.terminate()
p.join()
try:
   result = q.get(False)
   print(result)
except:
   print('failed')

The code above simply hangs when I run it. However, when I run

requests.get('https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas').text

independently, a response is returned in under two seconds. Therefore, the main code should print the page's HTML; however, it just stalls. Oddly, when I put an infinite loop in get_page:

def get_page(_r): 
  while True:
     pass
  _r.put('You will not see this')

the process is indeed terminated after three seconds. Therefore, I am certain the behavior has to do with requests. How could this be? I discovered a similar question here, but I am not using async. Could the issue have to do with monkey patching, since I am using requests along with time and multiprocessing? Any suggestions or comments would be appreciated. Thank you!

I am using:

  • Python 3.7.0

  • requests 2.21.0

Edit: @Hitobat pointed out that the timeout parameter of requests can be used instead. This does indeed work; however, I would still appreciate any other ideas about why requests is failing with multiprocessing.
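
For reference, that workaround looks roughly like this (same URL as above; note that timeout bounds how long requests waits for the server to respond or keep sending data, not the total elapsed time, and raises requests.exceptions.Timeout when exceeded):

import requests

try:
    html = requests.get(
        'https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas',
        timeout=3,  # seconds; raises requests.exceptions.Timeout if exceeded
    ).text
    print(html[:17])
except requests.exceptions.Timeout:
    print('failed')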

I have reproduced your scenario, and I have to refute the supposition that "the behavior has to do with requests": requests.get(...) returns the response as expected.

Let's see how the process goes with some debug points added:

import multiprocessing as mp, requests
import time


def get_page(_r):
    _rs = requests.get('https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas').text
    print('--- response header', _rs[:17])
    _r.put(_rs)


q = mp.Queue()
p = mp.Process(target=get_page, args=(q,))
p.start()
time.sleep(3)
p.terminate()
p.join()

try:
    print('--- get data from queue of size', q.qsize())
    result = q.get(False)
    print(result)
except Exception as ex:
    print('failed', str(ex))

The output:

--- response header 
<!DOCTYPE html>
--- get data from queue of size 1

As we can see, the response is there and the process even advances into the try block, but it hangs at the q.get() statement when trying to extract data from the queue. We may therefore conclude that the queue has likely been corrupted. There is a corresponding warning in the multiprocessing library documentation (Pipes and Queues section):

Warning

If a process is killed using Process.terminate() or os.kill() while it is trying to use a Queue, then the data in the queue is likely to become corrupted. This may cause any other process to get an exception when it tries to use the queue later on.

It looks like this is exactly that kind of case.
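
A quick way to check that diagnosis (my own sketch, not part of the original script): keep everything above unchanged, but read from the queue before terminating the worker, so the queue is no longer in use when terminate() is called. With that reordering the result comes through instead of hanging:

...
p.start()
result = q.get(timeout=10)  # wait for the child's put() first; 10 s is an arbitrary bound
p.terminate()               # the queue is idle at this point, so terminating is safe
p.join()
print('--- response header', result[:17])

Of course, this changes the semantics (we now wait up to 10 seconds for a result instead of force-killing after 3), so it only confirms the diagnosis rather than fixing the original pattern.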


How can we handle this issue?

A known workaround is to use mp.Manager().Queue() (which adds an intermediate managing/proxy process) instead of mp.Queue:

...
q = mp.Manager().Queue()
p = mp.Process(target=get_page, args=(q,))
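
For completeness, a full version with this workaround applied might look like the following (same logic as the debug script above; the if __name__ == '__main__' guard is my own addition so that the example also runs on platforms that spawn rather than fork worker processes):

import multiprocessing as mp
import time
import requests


def get_page(_r):
    _rs = requests.get('https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas').text
    _r.put(_rs)


if __name__ == '__main__':
    # the managed queue lives in a separate manager process, so it is not
    # affected when the worker is terminated mid-flight
    q = mp.Manager().Queue()
    p = mp.Process(target=get_page, args=(q,))
    p.start()
    time.sleep(3)
    p.terminate()
    p.join()

    try:
        result = q.get(False)
        print(result[:17])
    except Exception as ex:
        print('failed', str(ex))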
