
Python multithreading threads not waiting for .join() before continuing

I have been experimenting with multithreading using the threading library, creating a separate thread for each of several functions. Each function takes a pandas DataFrame as its argument, runs an SQL query against AWS Redshift, and adds the retrieved data to the DataFrame as a new column. However, sometimes one of the columns is empty when I print the DataFrame after the threads have finished. This seems random; at other times all of the columns are added without any issues. I thought the purpose of .join() was to prevent this by waiting until each thread had finished before continuing, but that does not seem to be the case.

import pandas as pd
import threading

df = pd.DataFrame()

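def redshift_query1(df):
    # Run the Redshift query here; query_results is a placeholder for its output.
    df["column_name1"] = query_results

def redshift_query2(df):
    # Run the Redshift query here; query_results is a placeholder for its output.
    df["column_name2"] = query_results

def redshift_query3(df):
    # Run the Redshift query here; query_results is a placeholder for its output.
    df["column_name3"] = query_results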

t1 = threading.Thread(target=redshift_query1, args=(df,))
t2 = threading.Thread(target=redshift_query2, args=(df,))
t3 = threading.Thread(target=redshift_query3, args=(df,))

t1.start()
t2.start()
t3.start()

t1.join()
t2.join()
t3.join()

print(df)

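pandas is not thread-safe, so writing to the same DataFrame from several threads can lose a column even though .join() does wait for every thread to finish. Built-in types are safer here: in CPython, assigning to a dict key is a single atomic operation. So you can have each thread store its result in a dict and then create the DataFrame from it once all threads have finished.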

import pandas as pd
import threading

result = {}

def redshift_query1():
    # Placeholder values stand in for the query results.
    result["column_name1"] = [3]

def redshift_query2():
    result["column_name2"] = [2]

def redshift_query3():
    result["column_name3"] = [1]

# The workers write only to the shared dict, so no DataFrame is passed in.
t1 = threading.Thread(target=redshift_query1)
t2 = threading.Thread(target=redshift_query2)
t3 = threading.Thread(target=redshift_query3)

t1.start()
t2.start()
t3.start()

t1.join()
t2.join()
t3.join()

df = pd.DataFrame(result)
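
A variation on the same idea, offered only as a sketch: instead of writing into a shared dict, each worker can return its result and the main thread can collect the return values. The example below uses concurrent.futures.ThreadPoolExecutor from the standard library. The fetch_column helper and the sample values are hypothetical stand-ins for the Redshift queries, which are not shown in the question.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_column(name, values):
    """Hypothetical stand-in for one Redshift query."""
    # A real worker would run its SQL query here and return the rows.
    return name, list(values)


# Illustrative column names and data only; not taken from a real query.
jobs = {
    "column_name1": [3],
    "column_name2": [2],
    "column_name3": [1],
}

with ThreadPoolExecutor(max_workers=3) as pool:
    # submit() schedules each call on a worker thread and returns a Future.
    futures = [pool.submit(fetch_column, name, values)
               for name, values in jobs.items()]
    # result() blocks until that worker has finished, so no explicit
    # join() calls are needed.
    columns = dict(future.result() for future in futures)

print(columns)
```

Because no worker touches shared state, the only pandas call left is pd.DataFrame(columns) in the main thread after the pool has finished, which is the same final step as in the dict-based version above.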


 