
Running a function on multiple data frames in parallel

Say I have a function:

def corr_array(data):
    # Placeholder: the real function (not shown) does the expensive computation
    return data

And the following data frames:

import numpy as np
import pandas as pd

np.random.seed([3, 1415])
ind1 = ['A_PC','B_PC','C_PC','D_PC','E_PC','F_PC','N_PC','M_PC','O_PC','Q_PC']
col1 = ['sap1','luf','tur','sul','sul2','bmw','aud']
df1  = pd.DataFrame(np.random.randint(10, size=(10, 7)), columns=col1, index=ind1)
ind2 = ['G_lncRNAs','I_lncRNAs','J_lncRNAs','K_lncRNAs','L_lncRNAs','M_lncRNAs','R_lncRNAs','N_lncRNAs']
col2 = ['sap1','luf','tur','sul','sul2','bmw','aud']
df2  = pd.DataFrame(np.random.randint(20, size=(8, 7)), columns=col2, index=ind2)

Because the original data frames are huge, I split each of them into chunks:

# Result: {"pc_1": first_half, "pc_2": second_half}, likewise for lc
pc = {f"pc_{i + 1}": v for i, v in enumerate(np.array_split(df1, 2))}
lc = {f"lc_{i + 1}": v for i, v in enumerate(np.array_split(df2, 2))}
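Calling np.array_split directly on a DataFrame works, but recent pandas releases may emit a FutureWarning for it. A minimal alternative sketch, assuming row-wise chunks are what is wanted: split the row *positions* with np.array_split and slice with df.iloc. The helper name split_rows is hypothetical, not part of the original code.

```python
import numpy as np
import pandas as pd


def split_rows(df, n_chunks, prefix):
    """Hypothetical helper: split df row-wise into n_chunks nearly equal pieces.

    Only an integer array of row positions is passed to NumPy; the DataFrame
    itself is sliced with iloc, so index labels are preserved.
    """
    positions = np.array_split(np.arange(len(df)), n_chunks)
    return {f"{prefix}_{i + 1}": df.iloc[idx] for i, idx in enumerate(positions)}


df = pd.DataFrame({"x": range(10)}, index=[f"r{i}" for i in range(10)])
chunks = split_rows(df, 3, "pc")
print({k: len(v) for k, v in chunks.items()})  # {'pc_1': 4, 'pc_2': 3, 'pc_3': 3}
```

As with np.array_split, when the rows do not divide evenly the first chunks get one extra row each.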

I then pass every combination of the split data frames to the function in the following loop:

for pc_k, pc_v in pc.items():
    for lc_k, lc_v in lc.items():
        # Combinations: (pc_1, lc_1), (pc_1, lc_2), ...
        # Run the function on each combination and save the result
        result = corr_array(pd.concat([pc_v, lc_v]))
        result.to_csv(f"{pc_k}_{lc_k}.csv", sep="\t", index=False)
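Before parallelizing, it helps to see that the nested loop is just the Cartesian product of the two dictionaries, and that every combination is independent of the others. A small sketch with itertools.product, using strings as stand-ins for the DataFrame chunks, turns the nested loop into one flat list of jobs that a pool can map over:

```python
from itertools import product

# Stand-ins for the pc / lc dictionaries (strings instead of DataFrames)
pc = {"pc_1": "A", "pc_2": "B"}
lc = {"lc_1": "x", "lc_2": "y"}

# product() yields the same (pc, lc) pairs, in the same order, as the nested loop
jobs = [(pc_k, pc_v, lc_k, lc_v)
        for (pc_k, pc_v), (lc_k, lc_v) in product(pc.items(), lc.items())]

print([f"{p}_{l}" for p, _, l, _ in jobs])
# ['pc_1_lc_1', 'pc_1_lc_2', 'pc_2_lc_1', 'pc_2_lc_2']
```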

This processes the combinations from the dictionaries pc and lc one after another, so the jobs take a very long time to finish.

Is there a way to run each combination of concatenated data frames in parallel instead? That would save a lot of time.

Any suggestions or help would be appreciated.

It looks like you could use the multiprocessing module; see the documentation at https://docs.python.org/2/library/multiprocessing.html . I would also recommend this video: https://youtu.be/oEYDqQ1pq9o?list=PLQVvvaa0QuDfju7ADVp5W1GF9jVhjbX- _
