
Python threading with queue and unresponsive threads

In my Python code I call an external API to get a list of image URLs. For each of these URLs I create a thread that generates a thumbnail. Here is the relevant part of the code:

from Queue import Queue       # Python 3: from queue import Queue
from threading import Thread

def process_image(image, size, cropping, counter, queue):
    # get_thumbnail comes from an external thumbnailing library (see below).
    options = dict(crop=cropping)
    img = get_thumbnail(image['url'], size, **options)
    queue.put((counter, img))
    return img

...

queue = Queue()

# Get some information about an artist. Images are also included.
artist = get_profile(artist_id, buckets)

# Generate images' thumbnails
threads = [Thread(target=process_image, args=(img, '500', 'center', counter, queue)) for counter, img in enumerate(artist.data['images'])]

for p in threads:
    p.start()
for p in threads:
    p.join()

imgs = []
# Collect processed images from threads
while not queue.empty():
    el = queue.get()
    imgs.append((el[0], el[1]))

My problem is that some of the URLs don't work: if I copy and paste such a URL into the browser, it just keeps loading until a timeout is eventually returned. Obviously, I added multithreading to speed things up. The first URL that causes this problem is the 4th one, so if I add...

# Generate images' thumbnails
threads = [Thread(target=process_image, args=(img, '500', 'center', counter, queue)) for counter, img in enumerate(artist.data['images'])]
threads = threads[:3]

everything works as expected and very quickly; otherwise the program blocks for a long time before it finally terminates. I would like to set some kind of timeout (say 1 second) for each thread to run the function, and if the URL does not respond and the thread does not finish before the timeout, exit that thread.

Thank you for your help in advance.

If the get_thumbnail function is yours, I'd build a timeout into it as suggested by @turbulencetoo. Otherwise, take a look at the signal module to add a timeout to process_image. As suggested in the comments, you may also see further benefit in using multiprocessing instead of threading. The interface of the multiprocessing module is almost identical to that of threading, so it shouldn't be much work to switch.
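For illustration, here is a minimal sketch of the multiprocessing route. The worker fetch_thumbnail and the URL list are hypothetical stand-ins, not part of the original code; the point is that Pool.apply_async returns an AsyncResult whose get(timeout=...) raises TimeoutError when a task hasn't finished, so hung downloads can be skipped and the stuck workers terminated.

from multiprocessing import Pool, TimeoutError

def fetch_thumbnail(url):
    # Hypothetical stand-in for a process_image-style worker:
    # download the image here and return the generated thumbnail.
    return url

if __name__ == '__main__':
    urls = ['http://example.com/a.jpg', 'http://example.com/b.jpg']  # placeholder data
    pool = Pool(processes=4)
    results = [pool.apply_async(fetch_thumbnail, (u,)) for u in urls]

    imgs = []
    for counter, res in enumerate(results):
        try:
            imgs.append((counter, res.get(timeout=1)))  # wait at most 1 second per task
        except TimeoutError:
            pass  # this URL hung; skip its thumbnail

    pool.terminate()  # kill any workers still stuck on dead URLs
    pool.join()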

As described in other questions, there is no official way to kill threads in Python. In cases where the thread is doing work that you control (rather than blocking, e.g. on a network request), you can use signal variables to have the threads kill themselves, but this doesn't seem to be the case here.

For downloading multiple resources in parallel, you are probably going to want to use a library like pycurl that uses OS-specific features to let multiple requests execute asynchronously on a single thread. This lets you use methods like set_timeout that provide a fairly clean way to deal with the issue you describe.
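As a rough sketch of what a per-request timeout looks like with pycurl (the URL below is a placeholder; for true parallelism you would drive several such handles with pycurl.CurlMulti):

import pycurl
from io import BytesIO

def download(url):
    buf = BytesIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.WRITEFUNCTION, buf.write)
    c.setopt(pycurl.CONNECTTIMEOUT, 1)  # give up connecting after 1 second
    c.setopt(pycurl.TIMEOUT, 5)         # abort the whole transfer after 5 seconds
    try:
        c.perform()
        return buf.getvalue()
    except pycurl.error:
        return None  # unresponsive URL: nothing to thumbnail
    finally:
        c.close()

data = download('http://example.com/image.jpg')  # placeholder URL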

I've finally found a solution based on @turbulencetoo's comment.

get_thumbnail was not part of my code but of an external library, so I couldn't set any kind of timeout in my own code. I thought this library didn't have a config option for setting a timeout on the URL request, but apparently it does (I had already read about it and had misunderstood).
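For anyone hitting the same wall with a library that really has no timeout option but that uses the standard library's sockets underneath, one blunt fallback is the process-wide default socket timeout. This is only a sketch of that general idea, not the config option this particular library provides:

import socket

# Any socket created after this call (including those opened internally by
# urllib-based libraries) will raise socket.timeout after 1 second of
# inactivity instead of hanging indefinitely. Note that the setting is global.
socket.setdefaulttimeout(1.0)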

@RobertB Yes, join() has a timeout argument and I already tried setting that parameter but it didn't work.
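For context, join(timeout=...) only bounds how long the main thread waits; the worker itself keeps running and, unless it is a daemon, still keeps the process alive. A minimal sketch of combining the two, reusing the threads list and queue from the question:

for t in threads:
    t.daemon = True    # daemon threads do not block interpreter shutdown
    t.start()

for t in threads:
    t.join(timeout=1)  # wait at most 1 second per thread, then move on

imgs = []
while not queue.empty():  # collect only the thumbnails that finished in time
    counter, img = queue.get()
    imgs.append((counter, img))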


 