简体   繁体   中英

Python Threading wait for return

I have a list auf URLs which have to be fetched in parallel, therefore I use threading:

for url in urls:
    thread = fetch_single(url)
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()
    # thread.html() is always "None" here:
    thread.html()

My threading worker class:

class fetch_single(threading.Thread):
    def __init__ (self, url):
        threading.Thread.__init__(self)
        self.url = url
        self.response = None

    def run(self):
        self.response = consumer.fetch(self.url)

    def html(self):
        return self.response.getData()

My problem is the line self.response = consumer.fetch(self.url) . How can I achieve, that the thread waits for the response from the subroutine/class?

This is a alternative solution. You would need to check if each individual thread is "alive" or has completed its task. A quick piece of code to show you the logic would be this:

import threading
import time

requests = {}
urls = ['www.google.se', 'inbox.google.com']

class fetch_single(threading.Thread):
    def __init__ (self, url):
        threading.Thread.__init__(self)
        self.url = url
        self.response = None

    def html(self):
        return self.response.getData()

    def run(self):
        self.response = consumer.fetch(self.url)
        requests[self.url] = self.html()

for url in urls:
    thread = fetch_single(url)
    thread.start()

while len(threading.enumerate()) > 1:
    time.sleep(1)

for url in requests:
    html = requests[url]

the while len(threading.enumerate()) > 1 would check if all child-threads have been completed/killed but the main thread (your script) is still alive. Normally in this loop you'd check against a value inside the thread or have some sort of communication system with the threads to check if it's done processing.

But this will give you a better platform to start off from.

Also note that I chose to save your HTML data in a global variable called requests instead of fetching it from the thread because that will be a locking operation. normally you'd do something such as:

def check_html(self):
    if self.respones:
        return true

Note that you'd replace while len... with:

for thread in threading.enumerate():
    thread.check_html()

or something along those lines best suited to your needs.

The problem was the scope of consumer . I had to inject consumer into the fetch_single class:

class fetch_single(threading.Thread):
    def __init__ (self, consumer, url):
        threading.Thread.__init__(self)
        self.consumer = consumer
        self.url = url
        self.response = None

    def run(self):
        self.response = self.consumer.fetch(self.url)

    def html(self):
        return self.response.getData()

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM