
Python multithreading for different functions that return values to store in one list

I use a script to parse some sites and get news from them. Each function in this script parses one site and returns a list of articles, and then I want to combine them all into one big list. Parsing site by site takes too long, so I decided to use multithreading. I found a sample like the one at the bottom, but it doesn't seem Pythonic to me. Every time I add a function to parse one more site, I have to add the same block of code again:

qN = Queue()
Thread(target=wrapper, args=(last_news_from_N, qN)).start()
news_from_N = qN.get()
for new in news_from_N:
    all_news.append(new)

Is there another solution to do this kind of stuff?

#!/usr/bin/python
# -*- coding: utf-8 -*-
from queue import Queue
from threading import Thread


def wrapper(func, queue):
    queue.put(func())


def last_news_from_bar():
    ...
    return list_of_articles #[['title1', 'http://someurl1', '2017-09-13'],['title2', 'http://someurl2', '2017-09-13']]


def last_news_from_foo():
    ...
    return list_of_articles


q1, q2 = Queue(), Queue()

Thread(target=wrapper, args=(last_news_from_bar, q1)).start()
Thread(target=wrapper, args=(last_news_from_foo, q2)).start()

news_from_bar = q1.get()
news_from_foo = q2.get()

all_news = []

for new in news_from_bar:
    all_news.append(new)
for new in news_from_foo:
    all_news.append(new)

print(all_news)

All you need to do is use a single queue and extend your result list:

q1 = Queue()

Thread(target=wrapper, args=(last_news_from_bar, q1)).start()
Thread(target=wrapper, args=(last_news_from_foo, q1)).start()

all_news = []

all_news.extend(q1.get())
all_news.extend(q1.get())

print(all_news)
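
A follow-up sketch (not part of the original answer) that avoids repeating the per-site block entirely: keep the scraper functions in a list and loop over it. It reuses the wrapper, last_news_from_bar and last_news_from_foo functions from the question:

from queue import Queue
from threading import Thread

# List every scraper function here; adding a new site is one line.
sources = [last_news_from_bar, last_news_from_foo]

q = Queue()
threads = [Thread(target=wrapper, args=(func, q)) for func in sources]
for t in threads:
    t.start()

all_news = []
# One q.get() per started thread collects every returned article list.
for _ in threads:
    all_news.extend(q.get())

print(all_news)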

Solution without a Queue:

from threading import Thread, Lock

NEWS = []
LOCK = Lock()

def gather_news(url):
    # news_from(url) is a placeholder for whatever fetches a batch of
    # articles from one site; keep pulling until it returns nothing.
    while True:
        news = news_from(url)
        if not news:
            break
        with LOCK:
            NEWS.append(news)

if __name__ == '__main__':
    threads = []
    for url in ['url1', 'url2', 'url3']:
        t = Thread(target=gather_news, args=(url,))
        t.start()
        threads.append(t)

    # Wait until all threads are done
    for t in threads:
        t.join()

    print(NEWS)
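
Another queue-free option (a sketch, not part of the answer above) is concurrent.futures.ThreadPoolExecutor, which hands each function's return value back directly. It again assumes the question's last_news_from_* functions:

from concurrent.futures import ThreadPoolExecutor

sources = [last_news_from_bar, last_news_from_foo]

all_news = []
with ThreadPoolExecutor(max_workers=len(sources)) as pool:
    # Each scraper runs in its own worker thread; pool.map yields the
    # returned article lists in the same order as `sources`.
    for articles in pool.map(lambda func: func(), sources):
        all_news.extend(articles)

print(all_news)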
