
How to implement this using multithreading in Python?

I am new to Python and I want to understand how I can perform this operation using multithreading, because the data is large and the operation is taking a lot of time.

I have around 2500+ columns in a Spark DataFrame df_my:

d = []
total = df_my.count()  # total number of rows, used as the fill-rate denominator

for x in df_my.columns:
    notnull_cnt = df_my.filter(df_my[x].isNotNull()).count()  # non-null values in this column
    zero_cnt = df_my.filter(df_my[x] == 0).count()            # values equal to zero
    fill_percent = str((notnull_cnt / total) * 100)
    zero_percent = str((zero_cnt / notnull_cnt) * 100)
    d.append({'Feature_name': x,
              'Fillrate': fill_percent,
              'zero_percent': zero_percent})

final = spark.createDataFrame(d)
f_pandas = final.toPandas()
f_pandas.to_excel("output_pandas.xlsx")

Can anyone please help me do this using multithreading?
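
For reference, a minimal sketch of one possible approach: submit the per-column counts to a concurrent.futures.ThreadPoolExecutor so several Spark jobs run at once. The helper name column_stats and max_workers=8 are illustrative assumptions, not part of the original code.

from concurrent.futures import ThreadPoolExecutor

total = df_my.count()  # row count computed once, shared by all threads

def column_stats(col_name):
    # Each call runs two Spark actions (counts) for a single column.
    notnull_cnt = df_my.filter(df_my[col_name].isNotNull()).count()
    zero_cnt = df_my.filter(df_my[col_name] == 0).count()
    return {'Feature_name': col_name,
            'Fillrate': str((notnull_cnt / total) * 100),
            'zero_percent': str((zero_cnt / notnull_cnt) * 100)}

# max_workers is an illustrative value; tune it to what the cluster can run concurrently.
with ThreadPoolExecutor(max_workers=8) as executor:
    d = list(executor.map(column_stats, df_my.columns))

final = spark.createDataFrame(d)
final.toPandas().to_excel("output_pandas.xlsx")

Threads are enough here because each task mostly waits on a Spark job; the actual counting still happens on the executors.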

You can use the .parallel_apply function from pandarallel to do multiprocessing. Take a look at this link.
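
A minimal sketch of what that could look like, assuming the data has first been pulled into a pandas DataFrame on the driver (so it must fit in memory); the transpose is just one way to hand each original column to parallel_apply as a row, since axis=1 is the documented use of that method.

from pandarallel import pandarallel
import pandas as pd

pandarallel.initialize()  # one worker process per CPU core by default

# Assumption: the Spark DataFrame is brought to the driver as pandas first.
df_pd = df_my.toPandas()
total = len(df_pd)

def column_stats(col):
    # After the transpose below, each "row" passed in here is one original column.
    notnull_cnt = col.notnull().sum()
    zero_cnt = (col == 0).sum()
    return pd.Series({'Fillrate': notnull_cnt / total * 100,
                      'zero_percent': zero_cnt / notnull_cnt * 100})

# Transpose so parallel_apply(axis=1) distributes one column per task across processes.
stats = df_pd.T.parallel_apply(column_stats, axis=1)
stats.index.name = 'Feature_name'
stats.to_excel("output_pandas.xlsx")

Because pandarallel uses worker processes rather than threads, it sidesteps the GIL, but it needs the data in pandas rather than Spark.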
