I have a list auf URLs which have to be fetched in parallel, therefore I use threading:
for url in urls:
thread = fetch_single(url)
threads.append(thread)
thread.start()
for thread in threads:
thread.join()
# thread.html() is always "None" here:
thread.html()
My threading worker class:
class fetch_single(threading.Thread):
def __init__ (self, url):
threading.Thread.__init__(self)
self.url = url
self.response = None
def run(self):
self.response = consumer.fetch(self.url)
def html(self):
return self.response.getData()
My problem is the line self.response = consumer.fetch(self.url)
. How can I achieve, that the thread waits for the response from the subroutine/class?
This is a alternative solution. You would need to check if each individual thread is "alive" or has completed its task. A quick piece of code to show you the logic would be this:
import threading
import time
requests = {}
urls = ['www.google.se', 'inbox.google.com']
class fetch_single(threading.Thread):
def __init__ (self, url):
threading.Thread.__init__(self)
self.url = url
self.response = None
def html(self):
return self.response.getData()
def run(self):
self.response = consumer.fetch(self.url)
requests[self.url] = self.html()
for url in urls:
thread = fetch_single(url)
thread.start()
while len(threading.enumerate()) > 1:
time.sleep(1)
for url in requests:
html = requests[url]
the while len(threading.enumerate()) > 1
would check if all child-threads have been completed/killed but the main thread (your script) is still alive. Normally in this loop you'd check against a value inside the thread or have some sort of communication system with the threads to check if it's done processing.
But this will give you a better platform to start off from.
Also note that I chose to save your HTML data in a global variable called requests
instead of fetching it from the thread because that will be a locking operation. normally you'd do something such as:
def check_html(self):
if self.respones:
return true
Note that you'd replace while len...
with:
for thread in threading.enumerate():
thread.check_html()
or something along those lines best suited to your needs.
The problem was the scope of consumer
. I had to inject consumer
into the fetch_single
class:
class fetch_single(threading.Thread):
def __init__ (self, consumer, url):
threading.Thread.__init__(self)
self.consumer = consumer
self.url = url
self.response = None
def run(self):
self.response = self.consumer.fetch(self.url)
def html(self):
return self.response.getData()
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.