My Python program unexpectedly quits when I use python-requests in a subprocess
In my spider project, I have a code section that crawls the hottest topic links on Sina Weibo, which then feed my spiders. It works perfectly when I test it on its own, but the same code causes Python to quit unexpectedly when I run it inside a Process. I found that the failure comes from using python-requests in that code: when I rewrote it with urllib3, it worked normally.

This code runs on macOS Mojave. The Python version is 3.7 and the python-requests version is 2.21.0.
"""
The run_spider function periodically crawls the link and feed to the spiders
"""
@staticmethod
def run_spider():
try:
cs = CoreScheduler()
while True:
cs.feed_spider()
first_time = 3 * 60
while not cs.is_finish():
time.sleep(first_time)
first_time = max(10, first_time // 2)
cs.crawl_done()
time.sleep(SPIDER_INTERVAL)
except Exception as e:
print(e)
"""
The cs.feed_spider() just crawl and parse the page, it will return a generator of links. The code is shown below.
"""
def get_page(self):
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Language': 'zh-cn',
        'Host': 's.weibo.com',
        'Accept-Encoding': 'br, gzip, deflate',
        'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 11_3) AppleWebKit/605.1.15 '
                      '(KHTML, like Gecko) Version/11.0 Mobile/15E148 Safari/604.1',
    }
    # res = requests.get(self.TARGET_URL, headers=headers)
    http = urllib3.PoolManager()
    res = http.request("GET", self.TARGET_URL, headers=headers)
    if 200 == res.status:
        return res.data
    else:
        return None
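For context, a feed_spider-style generator over the returned page might look like the sketch below. This is purely illustrative: the question does not show the real parsing code, and the `iter_links` name and regex are my own assumptions, not the project's actual parser.

```python
import re

def iter_links(html):
    # Yield href values found in the page markup. A real project would
    # likely use a proper HTML parser; this regex is only a sketch.
    for match in re.finditer(r'href="([^"]+)"', html):
        yield match.group(1)
```

Because it is a generator, the links are produced lazily as the spiders consume them, which matches the "returns a generator of links" description above.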
"""
The crawler will become a child process. like below.
"""
def run(self):
spider_process = Process(target=Scheduler.run_spider)
spider_process.start()
I expected python-requests to work here, but it caused the program to quit unexpectedly. When I rewrote the code with urllib3, the program ran fine. I don't understand why.
You started the process, but I don't see you waiting for it anywhere. The join() call makes the main process pause until spider_process has finished executing.

I.e.:

def run(self):
    spider_process = Process(target=Scheduler.run_spider)
    spider_process.start()
    spider_process.join()

Here's a link to the official join() documentation: https://docs.python.org/3/library/threading.html#threading.Thread.join
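To illustrate what join() guarantees, here is a minimal, self-contained sketch. The `work`, `run`, and `flag` names are my own placeholders standing in for the original Scheduler code, which isn't shown in full in the question; the shared Value lets us verify the child actually completed.

```python
from multiprocessing import Process, Value

def work(flag):
    # Placeholder for the real spider loop (Scheduler.run_spider).
    flag.value = 1

def run():
    flag = Value('i', 0)  # shared integer, initially 0
    p = Process(target=work, args=(flag,))
    p.start()
    p.join()  # block the parent until the child process has finished
    return flag.value

if __name__ == "__main__":
    print(run())
```

Without the join() call, run() returns immediately after start(), and the parent continues without waiting for the child to do its work.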