
How to implement this using multithreading in Python?

I am new to Python and I want to understand how I can perform this operation using multithreading, because the data is large and the operation is taking a lot of time.

I have around 2500+ columns in a Spark DataFrame df_my:

d = []
total = df_my.count()  # total number of rows, used as the fill-rate denominator

for x in df_my.columns:
    notnull_cnt = df_my.filter(df_my[x].isNotNull()).count()  # non-null values in this column
    zero_cnt = df_my.filter(df_my[x] == 0).count()            # values equal to zero
    fill_percent = str((notnull_cnt / total) * 100)
    zero_percent = str((zero_cnt / notnull_cnt) * 100)
    d.append({'Feature_name': x,
              'Fillrate': fill_percent,
              'zero_percent': zero_percent})

final = spark.createDataFrame(d)
f_pandas = final.toPandas()
f_pandas.to_excel("output_pandas.xlsx")

Can anyone please help me do this using multithreading?
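
For reference, a minimal sketch of one possible approach: submit the per-column counts to a concurrent.futures.ThreadPoolExecutor so several Spark jobs run at once. The helper name column_stats and max_workers=8 are illustrative assumptions, not part of the original code.

from concurrent.futures import ThreadPoolExecutor

total = df_my.count()  # row count computed once, shared by all threads

def column_stats(col_name):
    # Each call runs two Spark actions (counts) for a single column.
    notnull_cnt = df_my.filter(df_my[col_name].isNotNull()).count()
    zero_cnt = df_my.filter(df_my[col_name] == 0).count()
    return {'Feature_name': col_name,
            'Fillrate': str((notnull_cnt / total) * 100),
            'zero_percent': str((zero_cnt / notnull_cnt) * 100)}

# max_workers is an illustrative value; tune it to what the cluster can run concurrently.
with ThreadPoolExecutor(max_workers=8) as executor:
    d = list(executor.map(column_stats, df_my.columns))

final = spark.createDataFrame(d)
final.toPandas().to_excel("output_pandas.xlsx")

Threads are enough here because each task mostly waits on a Spark job; the actual counting still happens on the executors.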

You can use the .parallel_apply function from pandarallel to do multiprocessing. Take a look at this link.
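
A minimal sketch of what that could look like, assuming the data has first been pulled into a pandas DataFrame on the driver (so it must fit in memory); the transpose is just one way to hand each original column to parallel_apply as a row, since axis=1 is the documented use of that method.

from pandarallel import pandarallel
import pandas as pd

pandarallel.initialize()  # one worker process per CPU core by default

# Assumption: the Spark DataFrame is brought to the driver as pandas first.
df_pd = df_my.toPandas()
total = len(df_pd)

def column_stats(col):
    # After the transpose below, each "row" passed in here is one original column.
    notnull_cnt = col.notnull().sum()
    zero_cnt = (col == 0).sum()
    return pd.Series({'Fillrate': notnull_cnt / total * 100,
                      'zero_percent': zero_cnt / notnull_cnt * 100})

# Transpose so parallel_apply(axis=1) distributes one column per task across processes.
stats = df_pd.T.parallel_apply(column_stats, axis=1)
stats.index.name = 'Feature_name'
stats.to_excel("output_pandas.xlsx")

Because pandarallel uses worker processes rather than threads, it sidesteps the GIL, but it needs the data in pandas rather than Spark.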
