
Python Threads not finishing

I'm currently testing something with threading and workerpool. I create 400 threads which download a total of 5000 URLs. The problem is that some of the 400 threads "freeze": when I look at my processes I see that roughly 15 threads freeze in every run, and after a while they eventually close one by one.

My question is whether there is a way to have some sort of timer or counter that kills a thread if it hasn't finished after x seconds.

# download2.py - Download many URLs using multiple threads.
import os
import urllib2
import workerpool
import datetime
from threading import Timer

class DownloadJob(workerpool.Job):
    "Job for downloading a given URL."
    def __init__(self, url):
        self.url = url # The url we'll need to download when the job runs
    def run(self):
        url = urllib2.urlopen(self.url).read()

# Initialize a pool, 400 threads in this case
pool = workerpool.WorkerPool(size=400)

# Loop over urls.txt and create a job to download the URL on each line
print datetime.datetime.now()
for url in open("urls.txt"):
    job = DownloadJob(url.strip())
    pool.put(job)

# Send shutdown jobs to all threads, and wait until all the jobs have been completed
pool.shutdown()
pool.wait()
print datetime.datetime.now()

The problem is that some of the 400 threads are "freezing"...

That's most likely because of this line...

url = urllib2.urlopen(self.url).read()

By default, urlopen() has no timeout, so Python will wait as long as it takes for the remote server to respond. If one of your URLs points to a server which is ignoring the SYN packet, or is otherwise just really slow, the thread could be blocked for a very long time, potentially forever.

You can use the timeout parameter of urlopen() to set a limit on how long the thread will wait for the remote host to respond...

url = urllib2.urlopen(self.url, timeout=5).read() # Time out after 5 seconds
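Once a timeout is set, a stalled request raises an exception instead of blocking, so the job should catch it. The following is a minimal sketch rather than the asker's code: it assumes Python 3, where urllib2 became urllib.request, and `fetch` is a hypothetical helper name.

```python
import socket
import urllib.error
import urllib.request


def fetch(url, timeout=5):
    """Return the response body, or None if the request fails or times out.

    `fetch` is a hypothetical helper, not part of the original script.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except OSError as exc:
        # urllib.error.URLError and socket.timeout are both OSError
        # subclasses, so this covers refused connections and stalled reads.
        print("giving up on %s: %s" % (url, exc))
        return None
```

Note that the timeout applies to each blocking socket operation, not to the download as a whole, so a server that keeps trickling data can still take longer than 5 seconds in total.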

...or you can set it globally instead with socket.setdefaulttimeout() by putting these lines at the top of your code...

import socket
socket.setdefaulttimeout(5) # Time out after 5 seconds
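To see what the global setting does, this short sketch (written for Python 3, which is an assumption; the socket calls are the same in Python 2) checks that a socket created after the call inherits the default. The name `probe` is only for illustration.

```python
import socket

socket.setdefaulttimeout(5)  # applies to sockets created from now on

print(socket.getdefaulttimeout())  # 5.0

probe = socket.socket()      # illustrative socket, never connected
print(probe.gettimeout())    # 5.0
probe.close()
```

Because the setting is process-wide, it also affects any other library in the same program that opens sockets without specifying its own timeout.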

urlopen() accepts a timeout value; I think that would be the best way to handle it.

But I agree with the commenter that 400 threads is probably way too many.
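As a rough illustration of combining a smaller pool with a per-request timeout, here is a sketch that swaps workerpool for the standard library's concurrent.futures, assuming Python 3. The names `fetch` and `download_all` and the pool size of 20 are assumptions made for the example, not recommendations from the answers above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import urllib.request


def fetch(url, timeout=5):
    """Download one URL, raising an exception on failure or timeout."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, fetcher=fetch, max_workers=20):
    """Run fetcher over urls with a bounded pool of worker threads.

    Returns a dict mapping each URL to its body, or to the exception
    that the fetcher raised for it.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {executor.submit(fetcher, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # Keep the failure so one bad URL does not stop the run.
                results[url] = exc
    return results
```

It could be called as `download_all(line.strip() for line in open("urls.txt"))`. Since every request carries its own timeout, no worker stays blocked indefinitely, and the pool size can be tuned separately from the number of URLs.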

