Parallel running multiple data frames within a function

Question

Say I have a function,

def corr_array(data):
    # placeholder for the actual (slow) computation
    return data

And the following data frames,

import numpy as np
import pandas as pd

np.random.seed([3,1415])
ind1 = ['A_PC','B_PC','C_PC','D_PC','E_PC','F_PC','N_PC','M_PC','O_PC','Q_PC']
col1 = ['sap1','luf','tur','sul','sul2','bmw','aud']
df1  = pd.DataFrame(np.random.randint(10, size=(10, 7)), columns=col1,index=ind1)
ind2 = ['G_lncRNAs','I_lncRNAs','J_lncRNAs','K_lncRNAs','L_lncRNAs','M_lncRNAs','R_lncRNAs','N_lncRNAs']
col2 = ['sap1','luf','tur','sul','sul2','bmw','aud']
df2  = pd.DataFrame(np.random.randint(20, size=(8, 7)), columns=col2,index=ind2)

Because the original data frames are huge, I then split each of them into chunks:

# {"pc_1" : split_1, "pc_2" : split_2}
pc = {f"pc_{i + 1}": v for i, v in enumerate(np.array_split(df1, 2))}
lc = {f"lc_{i + 1}": v for i, v in enumerate(np.array_split(df2, 2))}
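The same split can be wrapped in a small helper. This is only a sketch, and split_frame is a hypothetical name. Passing a DataFrame straight to np.array_split makes NumPy call DataFrame.swapaxes, which pandas 2.1 deprecated, so recent pandas versions may print a FutureWarning. Splitting the integer row positions instead and selecting each chunk with .iloc avoids that:

```python
import numpy as np
import pandas as pd


def split_frame(df, n_parts, prefix):
    """Split df row-wise into n_parts nearly equal chunks keyed f"{prefix}_{i}".

    np.array_split is applied to integer row positions and the chunks are taken
    with .iloc, so pandas never has to go through DataFrame.swapaxes.
    """
    positions = np.array_split(np.arange(len(df)), n_parts)
    return {f"{prefix}_{i + 1}": df.iloc[pos] for i, pos in enumerate(positions)}


# Demo data only: a 10 x 7 frame, like df1 in the question.
df1 = pd.DataFrame(np.arange(70).reshape(10, 7))
pc = split_frame(df1, 2, "pc")
print({k: v.shape for k, v in pc.items()})  # {'pc_1': (5, 7), 'pc_2': (5, 7)}
```

With an uneven split, the first chunks get the extra rows. For example, 10 rows in 3 parts gives chunks of 4, 3 and 3 rows.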

I then process every combination of the split data frames in the following loop:

for pc_k, pc_v in pc.items():
    for lc_k, lc_v in lc.items():
        # (pc_1, lc_1), (pc_1, lc_2), ...
        # run the above function for each combination and save the results
        corr_array(pd.concat([pc_v, lc_v])).to_csv(
            f"{pc_k}_{lc_k}.csv", sep="\t", index=False
        )
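The nested loop is really a flat list of independent jobs, one per (pc, lc) pair. Each job reads only its own two frames and writes only its own file, which is what makes it a good candidate for parallel execution. Here is a minimal sketch with itertools.product, using tiny made-up frames in place of the real splits:

```python
from itertools import product

import pandas as pd

# Tiny stand-ins for the split dictionaries from the question (made-up data).
pc = {"pc_1": pd.DataFrame({"a": [1, 2]}), "pc_2": pd.DataFrame({"a": [3, 4]})}
lc = {"lc_1": pd.DataFrame({"a": [5]}), "lc_2": pd.DataFrame({"a": [6, 7]})}

# product() walks the same (pc, lc) pairs, in the same order, as the nested loop.
jobs = [
    (f"{pc_k}_{lc_k}", pd.concat([pc_v, lc_v]))
    for (pc_k, pc_v), (lc_k, lc_v) in product(pc.items(), lc.items())
]
for name, frame in jobs:
    print(name, len(frame))
# pc_1_lc_1 3
# pc_1_lc_2 4
# pc_2_lc_1 3
# pc_2_lc_2 4
```

For the real, huge frames it is cheaper to build each concatenation inside the job that needs it. This toy list holds every concatenation in memory at once, which would not scale.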

As written, the loop processes the combinations from pc and lc one after another, so the whole job takes a very long time to finish.

Is there a way to run each combination of concatenated data frames in parallel instead? That should save a lot of time.

Appreciate any suggestions or help.

Answer 1

It looks like you could use the multiprocessing module; see its documentation (https://docs.python.org/2/library/multiprocessing.html). I would also recommend this video: https://youtu.be/oEYDqQ1pq9o?list=PLQVvvaa0QuDfju7ADVp5W1GF9jVhjbX-_
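Following the pointer to multiprocessing, here is a minimal sketch that hands each combination to a worker process. It requires Python 3.3 or later, for Pool.starmap and for using Pool in a with statement. It assumes corr_array is CPU-bound and that its inputs and outputs can be pickled. process_pair and run_parallel are hypothetical helper names, and corr_array is the question's placeholder:

```python
import os
import tempfile
from itertools import product
from multiprocessing import Pool

import numpy as np
import pandas as pd


def corr_array(data):
    # Placeholder for the question's real computation (assumed CPU-bound).
    return data


def process_pair(pc_k, pc_v, lc_k, lc_v, out_dir):
    """Run corr_array on one (pc, lc) combination and write it as a TSV file."""
    path = os.path.join(out_dir, f"{pc_k}_{lc_k}.csv")
    corr_array(pd.concat([pc_v, lc_v])).to_csv(path, sep="\t", index=False)
    return path


def run_parallel(pc, lc, out_dir, processes=None):
    """Process every (pc, lc) combination in a pool of worker processes."""
    args = [
        (pc_k, pc_v, lc_k, lc_v, out_dir)
        for (pc_k, pc_v), (lc_k, lc_v) in product(pc.items(), lc.items())
    ]
    # processes=None lets Pool start os.cpu_count() workers.
    # starmap unpacks each tuple into process_pair's arguments and returns
    # the results in the same order as args.
    with Pool(processes=processes) as pool:
        return pool.starmap(process_pair, args)


if __name__ == "__main__":
    # The guard is required where workers are started with "spawn"
    # (e.g. Windows and macOS), because each worker re-imports this module.
    rng = np.random.default_rng(0)
    df1 = pd.DataFrame(rng.integers(10, size=(10, 7)))
    df2 = pd.DataFrame(rng.integers(20, size=(8, 7)))
    pc = {"pc_1": df1.iloc[:5], "pc_2": df1.iloc[5:]}
    lc = {"lc_1": df2.iloc[:4], "lc_2": df2.iloc[4:]}
    with tempfile.TemporaryDirectory() as out_dir:
        for path in run_parallel(pc, lc, out_dir, processes=2):
            print(os.path.basename(path))
```

Each job's two frames are pickled and sent to a worker, so for very large chunks that transfer can eat into the speed-up. An alternative is to pass only the keys and let each worker load its own chunks from disk. For CPU-bound work, running more processes than there are cores usually adds overhead rather than speed.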

solution1
0 ACCPTED 2020-07-27 16:52:48