I have been working with a very simple bit of code, but the behavior is very strange. I am trying to send a request to a webpage using requests.get, but if the request takes longer than a few seconds, I would like to kill the process. I am following the approach from the accepted answer here, changing the function body to include the request. My code is below:
import multiprocessing as mp
import time  # this import was missing originally
import requests

def get_page(_r):
    _rs = requests.get('https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas').text
    _r.put(_rs)

q = mp.Queue()
p = mp.Process(target=get_page, args=(q,))
p.start()
time.sleep(3)
p.terminate()
p.join()
try:
    result = q.get(False)
    print(result)
except:
    print('failed')
The code above simply hangs when run. However, when I run
requests.get('https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas').text
independently, a response is returned in under two seconds. Therefore the main code should print the page's HTML; instead, it just stalls. Oddly, when I put an infinite loop in get_page:
def get_page(_r):
    while True:
        pass
    _r.put('You will not see this')
the process is indeed terminated after three seconds. Therefore, I am certain the behavior has to do with requests. How could this be? I discovered a similar question here, but I am not using async. Could the issue have to do with monkey patching, since I am using requests along with time and multiprocessing? Any suggestions or comments would be appreciated. Thank you!
I am using:
Python 3.7.0
requests 2.21.0
Edit: @Hitobat pointed out that the timeout parameter of requests can be used instead. This does indeed work; however, I would still appreciate any ideas about why requests is failing with multiprocessing.
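For reference, a minimal sketch of the timeout approach suggested by @Hitobat. The helper name and its defaults are my own, not from the question; the commented-out URL is the one from the question:

```python
import requests

def fetch(url, timeout=3):
    """Return the page text, or None if the request times out or fails."""
    try:
        # timeout bounds the connect and read phases; requests raises
        # a subclass of RequestException (e.g. Timeout) on failure
        return requests.get(url, timeout=timeout).text
    except requests.exceptions.RequestException:
        return None

# Usage with the URL from the question:
# html = fetch('https://www.woolworths.com.au/shop/browse/drinks/'
#              'cordials-juices-iced-teas/iced-teas')
```

Unlike the multiprocessing version, this keeps everything in one process, so there is no queue to corrupt.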
I have reproduced your scenario, and I have to refute the supposition "I am certain the behavior has to do with requests": requests.get(...) returns the response as expected.
Let's see how the process goes with some debug points:
import multiprocessing as mp
import time
import requests

def get_page(_r):
    _rs = requests.get('https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas').text
    print('--- response header', _rs[:17])
    _r.put(_rs)

q = mp.Queue()
p = mp.Process(target=get_page, args=(q,))
p.start()
time.sleep(3)
p.terminate()
p.join()
try:
    print('--- get data from queue of size', q.qsize())
    result = q.get(False)
    print(result)
except Exception as ex:
    print('failed', str(ex))
The output:
--- response header
<!DOCTYPE html>
--- get data from queue of size 1
As we see, the response is there, and the process even advanced into the try block, but it hangs at the q.get() statement when trying to extract data from the queue. Therefore we may conclude that the queue has become corrupted. And indeed there is a respective warning in the multiprocessing library documentation (Pipes and Queues section):
Warning: If a process is killed using Process.terminate() or os.kill() while it is trying to use a Queue, then the data in the queue is likely to become corrupted. This may cause any other process to get an exception when it tries to use the queue later on.
Looks like this is exactly that kind of case: the child process was terminated while it was still flushing the response into the queue's underlying pipe.
How can we handle this issue?
A known workaround is to use mp.Manager().Queue() (which adds an intermediate proxying level) instead of mp.Queue:
...
q = mp.Manager().Queue()
p = mp.Process(target=get_page, args=(q,))
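For completeness, a runnable sketch of the whole fixed script (same URL and timing as above; the broad except mirrors the original code, and the __main__ guard is added so the example also works on platforms that spawn rather than fork):

```python
import multiprocessing as mp
import time
import requests

def get_page(_r):
    _rs = requests.get('https://www.woolworths.com.au/shop/browse/drinks/'
                       'cordials-juices-iced-teas/iced-teas').text
    _r.put(_rs)

if __name__ == '__main__':
    # a Manager-backed queue lives in a separate server process,
    # so terminating the worker cannot corrupt it
    q = mp.Manager().Queue()
    p = mp.Process(target=get_page, args=(q,))
    p.start()
    time.sleep(3)
    p.terminate()
    p.join()
    try:
        print(q.get(False))
    except Exception:
        print('failed')
```

With this change, q.get(False) returns the HTML immediately when the request finished in time, and raises queue.Empty (caught above) when it did not.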