Python threading - unexpected output

I am new to Python and have written the threaded script below, which takes each line of a file and passes it to the get_result function. The get_result function should output the URL and status code if the status is 200 or 301.

The code is as follows:

import requests
import Queue
import threading
import re
import time

start_time = int(time.time())
regex_to_use = re.compile(r"^")


def get_result(q, partial_url):
    partial_url = regex_to_use.sub("%s" % "http://www.domain.com/", partial_url)
    r = requests.get(partial_url)
    status = r.status_code
    #result = "nothing"
    if status == 200 or status == 301:
        result = str(status) + " " + partial_url
        print(result)


#need list of urls from file
file_list = [line.strip() for line in open('/home/shares/inbound/seo/feb-404s/list.csv', 'r')]
q = Queue.Queue()
for url in file_list:
    #for each partial. send to the processing function get_result
    t = threading.Thread(target=get_result, args=(q, url))
    t.start()

end_time = int(time.time())
exec_time = end_time - start_time
print("execution time was " + str(exec_time))

I used Queue and threading, but the "execution time was x" line is printed before the threads have finished outputting their data.

I.e. typical output is:

200 www.domain.com/ok-url
200 www.domain.com/ok-url-1
200 www.domain.com/ok-url-2
execution time was 3
200 www.domain.com/ok-url-4
200 www.domain.com/ok-ur-5
200 www.domain.com/ok-url-6

Why is this happening, and how can I have the execution time printed at the end of the script, i.e. once all URLs have been processed and output?

Thanks to the answer given below by utdemir, here's the updated code with join.

import requests
import Queue
import threading
import re
import time

start_time = int(time.time())
regex_to_use = re.compile(r"^")


def get_result(q, partial_url):
    partial_url = regex_to_use.sub("%s" % "http://www.domain.com/", partial_url)
    r = requests.get(partial_url)
    status = r.status_code
    #result = "nothing"
    if status == 200 or status == 301:
        result = str(status) + " " + partial_url
        print(result)


#need list of urls from file
file_list = [line.strip() for line in open('/home/shares/inbound/seo/feb-404s/list.csv', 'r')]
q = Queue.Queue()
threads_list = []

for url in file_list:
    #for each partial. send to the processing function get_result
    t = threading.Thread(target=get_result, args=(q, url))
    threads_list.append(t)
    t.start()

for thread in threads_list:
    thread.join()


end_time = int(time.time())
exec_time = end_time - start_time
print("execution time was " + str(exec_time))
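
A side note on the script above: q is created and passed to every thread, but nothing is ever put on it. As a rough sketch of one way the queue could be used, each worker can put its result on the queue and the main thread can drain it after joining. The fake_status helper and the collect function are hypothetical stand-ins (no real HTTP request is made), and the try/except import is only there so the sketch runs on both Python 2 and Python 3.

```python
import threading
import time

try:
    import queue            # Python 3 module name
except ImportError:
    import Queue as queue   # Python 2 module name, as used in the script above


def fake_status(partial_url):
    # Hypothetical stand-in for requests.get(...).status_code
    time.sleep(0.01)
    return 404 if "missing" in partial_url else 200


def get_result(q, partial_url):
    status = fake_status(partial_url)
    if status in (200, 301):
        q.put((status, partial_url))   # hand the result to the main thread


def collect(partial_urls):
    q = queue.Queue()
    threads = [threading.Thread(target=get_result, args=(q, u))
               for u in partial_urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                       # no worker is still running after this
    results = []
    while not q.empty():               # safe here: all producers have finished
        results.append(q.get())
    return sorted(results, key=lambda r: r[1])


results = collect(["ok-url", "missing-url", "ok-url-1"])
for status, url in results:
    print(str(status) + " " + url)
```

Printing from the main thread after the join also keeps lines from different threads from interleaving.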

You should join the threads to wait for them; otherwise the main thread carries on (and prints the execution time) while they are still executing in the background.

Like this:

threads = []
for url in file_list:
    ...
    threads.append(t)

for thread in threads:
    thread.join() # Wait until each thread terminates

end_time = int(time.time())
...
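
To see the effect of join without touching the network, here is a minimal self-contained sketch. The run_workers function, the events list and the sleep delay are illustrative assumptions, with time.sleep standing in for the HTTP request; the point is only that the final entry is recorded after every worker has finished.

```python
import threading
import time


def run_workers(items, delay=0.05):
    """Start one thread per item, wait for all of them, then log completion."""
    events = []                  # shared log of what happened, in order
    lock = threading.Lock()      # protects the shared list

    def worker(item):
        time.sleep(delay)        # stand-in for requests.get(...)
        with lock:
            events.append("done " + item)

    threads = []
    for item in items:
        t = threading.Thread(target=worker, args=(item,))
        threads.append(t)
        t.start()

    for t in threads:
        t.join()                 # block until this thread has finished

    events.append("execution finished")
    return events


events = run_workers(["a", "b", "c"])
print(events[-1])
```

If the join loop is removed, "execution finished" is typically appended before any of the "done" entries, which is the same ordering problem seen in the question.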
