Multiprocessing and threading in Python

Question

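I am trying to get to grips with multiprocessing in Python, but I think I may not have understood it correctly.

To start with, I have a dataframe that contains text as strings, and I want to run some regular expressions on it. The code looks like this: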

import os
import re

import multiprocess  # third-party fork of the standard multiprocessing module
from threading import Thread

# `data` is assumed to be an existing pandas DataFrame with a "qa" column.

def clean_qa():
    for index, row in data.iterrows():
        data["qa"].loc[index] = re.sub(r"(\-{5,}).{1,100}(\-{5,})|(\[.{1,50}\])|[^\w\s]", "", str(data["qa"].loc[index]))

# Attempt 1: one thread per CPU core
if __name__ == '__main__':
    threads = []

    for i in range(os.cpu_count()):
        threads.append(Thread(target=clean_qa))

    for thread in threads:
        thread.start()

    for thread in threads:
        thread.join()

# Attempt 2: one process per CPU core
if __name__ == '__main__':
    processes = []

    for i in range(os.cpu_count()):
        processes.append(multiprocess.Process(target=clean_qa))

    for process in processes:
        process.start()

    for process in processes:
        process.join()

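When I run the body of "clean_qa" directly, that is, just the for loop without wrapping it in a function, everything works and takes about 3 minutes.

However, when I use multiprocessing or threading, the run takes about 10 minutes and the text is not cleaned, so the dataframe is the same as before.

So my question is: what am I doing wrong, why does it take longer, and why does nothing happen to the dataframe?

Thanks a lot!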

Answer 1

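This is a bit off-topic (although my comment on the original post does address the actual problem), but since you are using a Pandas dataframe, you really never want to iterate over it by hand.

It looks like all you really want is: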

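import re

r = re.compile(r"(\-{5,}).{1,100}(\-{5,})|(\[.{1,50}\])|[^\w\s]")

def clean_qa():
    # `data` is assumed to be an existing pandas DataFrame with a "qa" column.
    # regex=True is required for a compiled pattern in recent pandas versions.
    data["qa"] = data["qa"].str.replace(r, "", regex=True)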

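Let Pandas handle the looping. Note that this is not parallelized: `str.replace` still runs in a single process, but it avoids the per-row overhead of `iterrows` and `.loc`.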
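To check what the pattern actually removes, independently of Pandas, it can be applied to a couple of plain strings with `re.sub`. The sample strings and the helper name `clean_text` below are made up for illustration; this is only a minimal sketch.

```python
import re

# Same pattern as in the answer above.
PATTERN = re.compile(r"(\-{5,}).{1,100}(\-{5,})|(\[.{1,50}\])|[^\w\s]")

def clean_text(text):
    """Hypothetical helper: apply the pattern to a single string."""
    return PATTERN.sub("", str(text))

# Punctuation and the bracketed note are dropped; whitespace is kept.
print(repr(clean_text("Hello, world! [note] ok")))
# A block delimited by runs of five or more dashes is dropped as a whole.
print(repr(clean_text("-----header-----body")))
```

Whitespace around the removed parts is left untouched, so a removed bracketed note can leave two consecutive spaces behind.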

Answer 2

As for threading, there is a Python 3.9 example in an answer to this question:

#example from the page below by Xiddoc
from threading import Thread
from time import sleep

# Here is a function that uses the sleep() function. If you called this directly, it would stop the main Python execution
def my_independent_function():
    print("Starting to sleep...")
    sleep(10)
    print("Finished sleeping.")

# Make a new thread which will run this function
t = Thread(target=my_independent_function)
# Start it in parallel
t.start()

# You can see that we can still execute other code while the other function is running
for i in range(5):
    print(i)
    sleep(1)

(Taken from this question: Can I run a coroutine in python independently from all other code?)
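Applied to the loop in the question, one thing worth noting is that `Thread(target=f)` runs the whole of `f` in each thread; the work is not split up automatically. A small sketch with a made-up list and a hypothetical `process_all` function shows that starting four threads with the same target handles every item four times:

```python
from threading import Lock, Thread

items = list(range(10))   # made-up stand-in for the rows of the dataframe
processed = []            # shared between threads, guarded by a lock
lock = Lock()

def process_all():
    # Hypothetical target: like clean_qa in the question, it walks over
    # *all* items, no matter how many threads were started.
    for item in items:
        with lock:
            processed.append(item)

def run_threads(n_threads):
    threads = [Thread(target=process_all) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

run_threads(4)
# Every item was handled once per thread, so the total work was multiplied.
print(len(processed), "items processed for", len(items), "inputs")
```

In standard CPython builds, the global interpreter lock also keeps pure-Python, CPU-bound code such as this loop from running in several threads at once, so threads mainly help with waiting (I/O, `sleep`), as in the example above.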

Also, you probably should not try to use threading and multiprocessing at the same time.

If you want to read more general information about multiprocessing/threading in Python, you can have a look at this post: How to use threading in Python?
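For the multiprocessing half of the question, a likely explanation for the unchanged dataframe is that each process works on its own copy of `data`, so changes made in a child never reach the parent. A common way around this is to hand each worker a chunk and collect the return values. The following is a minimal sketch using the standard library's `multiprocessing.Pool` (not the `multiprocess` package from the question) on a plain list of strings; `clean_chunk`, `split_into_chunks`, `clean_parallel` and the sample data are hypothetical names introduced here, not part of the original post.

```python
import re
from multiprocessing import Pool

PATTERN = re.compile(r"(\-{5,}).{1,100}(\-{5,})|(\[.{1,50}\])|[^\w\s]")

def clean_chunk(chunk):
    # Runs in a worker process; the cleaned strings are *returned*,
    # because changes to globals made here would stay in the worker.
    return [PATTERN.sub("", text) for text in chunk]

def split_into_chunks(values, n_chunks):
    # Split a list into n_chunks contiguous slices of near-equal size.
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks

def clean_parallel(values, n_workers=2):
    with Pool(n_workers) as pool:
        cleaned_chunks = pool.map(clean_chunk, split_into_chunks(values, n_workers))
    # Flatten the per-chunk results back into one list, in original order.
    return [text for chunk in cleaned_chunks for text in chunk]

if __name__ == "__main__":
    texts = ["Hello, world!", "[note] keep this", "a.b.c"]  # made-up sample data
    print(clean_parallel(texts))
```

With a dataframe, the returned list could be assigned back in the parent, for example `data["qa"] = clean_parallel(data["qa"].astype(str).tolist())`. Whether this beats the single-process `str.replace` depends on the data size, since the strings have to be pickled and sent to the workers and back.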

Answer 1: 2, 2022-01-14 12:00:43

Answer 2: 0, 2022-01-14 12:13:53