
Python urllib does not work with PyQt + Multiprocessing

A simple piece of code like this:

# Python 2 code: urllib2 and the tuple-parameter syntax in mp_worker are Python 2 only.
import urllib2
import requests

from PyQt4 import QtCore  # merely importing PyQt4 is part of what triggers the crash

import multiprocessing
import time

data = (
    ['a', '2'],
)

def mp_worker((inputs, the_time)):
    # Fetch the same URL twice: once with requests, once with urllib2
    r = requests.get('http://www.gpsbasecamp.com/national-parks')
    request = urllib2.Request("http://www.gpsbasecamp.com/national-parks")
    response = urllib2.urlopen(request)

def mp_handler():
    # Fork a pool of two worker processes and farm the data out to them
    p = multiprocessing.Pool(2)
    p.map(mp_worker, data)

if __name__ == '__main__':
    mp_handler()

Basically, if I import PyQt4 and then make a urllib request (urllib is, I believe, used under the hood by almost all web-extraction libraries such as BeautifulSoup, Requests, or PyQuery), it crashes with a cryptic log on my Mac.

This is exactly right. It always fails on Mac; I have wasted days trying to fix this, and honestly there is no fix as of now. The best way is to use a Thread instead of a Process, and it will work like a charm.
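As a minimal sketch of that thread-based approach (assuming the same URL and pool size as the question): the standard-library module multiprocessing.dummy replicates the multiprocessing.Pool API on top of threads, so the original code needs almost no changes.

import urllib2
from multiprocessing.dummy import Pool  # same Pool API, but backed by threads

def worker(url):
    # Threads share the parent process, so nothing is forked and the
    # Mac-only fork-safety issue described below never comes up.
    return urllib2.urlopen(url).read()

if __name__ == '__main__':
    pool = Pool(2)
    pages = pool.map(worker, ['http://www.gpsbasecamp.com/national-parks'])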

By the way -

r = requests.get('http://www.gpsbasecamp.com/national-parks')

and

request = urllib2.Request("http://www.gpsbasecamp.com/national-parks")
response = urllib2.urlopen(request)

do exactly the same thing. Why are you doing it twice?

This may be due to _scproxy.get_proxies() not being fork-safe on Mac.

This is raised here: https://bugs.python.org/issue33725#msg329926

_scproxy has been known to be problematic for some time, see for instance Issue31818. That issue also gives a simple workaround: setting urllib's "no_proxy" environment variable to "*" will prevent the calls to the System Configuration framework.

This is something that urllib may be attempting to do, causing the failure when multiprocessing forks worker processes.
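To make that concrete, here is a quick sketch of the proxy lookup that urlopen performs (Python 2, matching the question; in Python 3 the function lives at urllib.request.getproxies). On Mac the default implementation falls through to _scproxy and the System Configuration framework.

import urllib

# urlopen consults this proxy map before connecting; on a Mac with no
# *_proxy environment variables set, it calls into _scproxy and the
# System Configuration framework, which is the fork-unsafe piece.
print(urllib.getproxies())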

There is a workaround, which is to set the environment variable no_proxy to *.

E.g. export no_proxy=*
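If you would rather not depend on the shell, here is a minimal sketch of the same workaround applied from inside the script; the key point is that the variable is set before anything touches urllib or forks workers, so child processes inherit it.

import os

# With no_proxy in the environment, urllib's getproxies() short-circuits
# on the environment lookup and never calls into _scproxy.
os.environ['no_proxy'] = '*'

import multiprocessing
# ... build the Pool and make urllib/requests calls as usual ...

Either way is equivalent to export no_proxy=* in the shell; the variable just has to be visible to the process before the first request is made.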
