Python threading does not execute properly

I have a thread_function(ticker) that takes a stock symbol as a string, checks whether it meets a condition, and if so appends it to a list. The capitulation(ticker, df) function returns either the stock symbol or nothing at all. Because I loop through 5000+ tickers and pull data for each of them, I have implemented threading. Without threading, this code takes at least half an hour to finish, but it works and I get data in the results list at the end. With threading, however, it finishes in less than a second and the results list is empty at the end. For some reason, a breakpoint on capitulation() is never hit, although execution does go into pull_data(), which downloads the data from Yahoo Finance. Below is the code:

tickers = pd.read_csv("./text_files/stock_list.csv")


def thread_function(ticker):
    try:
        df = pull_data(ticker)
        if not df.empty:
            if capitulation(ticker, df):
                results.append(ticker)
    except:
        pass

    with print_lock:
        print(threading.current_thread().name, ticker)


def threader():
    while True:
        worker = q.get()
        thread_function(worker)
        q.task_done()


print_lock = threading.Lock()

q = Queue()


# how many threads are we going to allow
for x in range(10):
    t = threading.Thread(target=threader)
    t.daemon = True
    t.start()


start = time.time()


for ticker in tickers.yahoo_symbol:
    q.put(lambda: thread_function(ticker))


q.join()

print('Entire job took:', time.time()-start)

EDIT:

I have also tried multiprocessing Pool with apply_async, as in the code below, but it still does not return the list that I get when running normally:

def log_result(result):
    if result is not None:
        results.append(result)

pool = Pool(25)
start = time.time()
for x in tickers.yahoo_symbol:
    pool.apply_async(thread_function, args=(x,), callback=log_result)
pool.close()
pool.join()

print(results)
print('Entire job took:', time.time() - start)

In this case thread_function() has been moved to another file, since multiprocessing otherwise throws an AttributeError.

When using multiprocessing, each worker process gets its own copy of your module-level variables, so appending to results inside a worker does not update the list in your main script. This is why that variable is empty there. Threads, unlike processes, do share memory with the main script.
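For the threaded version, it is also worth checking what actually goes onto the queue. The loop in the question enqueues a lambda, and threader() then passes that lambda object to thread_function() as its ticker argument, so pull_data() receives a function rather than a symbol. Any exception this raises is swallowed by the bare except, which would be consistent with the run finishing almost instantly with an empty list. Below is a minimal sketch that enqueues the ticker strings themselves and reports errors instead of hiding them. The pull_data() and capitulation() bodies here are hypothetical stand-ins, and run() and results_lock are names introduced only for this illustration.

```python
import threading
from queue import Queue


# Hypothetical stand-ins for the question's pull_data() and capitulation();
# the real ones download from Yahoo Finance and apply the screening rule.
def pull_data(ticker):
    return {"symbol": ticker, "close": len(ticker)}


def capitulation(ticker, df):
    return ticker if df["close"] % 2 == 0 else None


results = []
results_lock = threading.Lock()  # defensive guard around the shared list


def thread_function(ticker):
    try:
        df = pull_data(ticker)
        if df and capitulation(ticker, df):
            with results_lock:
                results.append(ticker)
    except Exception as exc:
        # Report failures instead of silently discarding them.
        print(f"{ticker!r} failed: {exc!r}")


def threader(q):
    while True:
        ticker = q.get()
        try:
            thread_function(ticker)
        finally:
            q.task_done()  # always mark the item done so q.join() can return


def run(tickers, n_threads=10):
    results.clear()
    q = Queue()
    for _ in range(n_threads):
        threading.Thread(target=threader, args=(q,), daemon=True).start()
    for ticker in tickers:
        q.put(ticker)  # enqueue the ticker string itself, not a lambda
    q.join()
    return sorted(results)
```

Calling run(tickers.yahoo_symbol) would then block until every ticker has been processed, and any exception raised inside pull_data() shows up in the output rather than vanishing.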

You should look at the multiprocessing library, specifically Pool and its apply_async method. These tools let you return results from the worker processes back to the main process.
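One detail matters when using apply_async with a callback: the callback receives the worker function's return value. The thread_function() posted in the question has no return statement, so log_result() would only ever see None and never append anything. Below is a minimal sketch of a worker that returns its result. It assumes hypothetical stand-ins for pull_data() and capitulation(), and check_ticker() and screen() are names made up for this example. It uses multiprocessing.pool.ThreadPool, which offers the same apply_async interface as Pool, so that the sketch runs in a single file; swapping in Pool should work the same way provided the worker function is importable at module level.

```python
from multiprocessing.pool import ThreadPool


# Hypothetical stand-ins for the question's pull_data() and capitulation().
def pull_data(ticker):
    return {"symbol": ticker, "close": len(ticker)}


def capitulation(ticker, df):
    return ticker if df["close"] % 2 == 0 else None


def check_ticker(ticker):
    """Return the ticker if it passes the screen, otherwise None."""
    df = pull_data(ticker)
    return capitulation(ticker, df)


def screen(tickers, workers=8):
    results = []

    def log_result(result):
        # The callback runs on the side that owns the pool, so this list
        # is the caller's own list, not a worker's copy.
        if result is not None:
            results.append(result)

    pool = ThreadPool(workers)
    for ticker in tickers:
        pool.apply_async(check_ticker, args=(ticker,), callback=log_result)
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for all tasks and their callbacks to finish
    return sorted(results)
```

Because the downloads are I/O-bound, a thread pool is probably sufficient here, and it avoids the pickling and import constraints that process pools impose.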
