Say I have a function:
```python
def corr_array(data):
    return data
```
And the following data frames:
```python
import numpy as np
import pandas as pd

np.random.seed([3, 1415])
ind1 = ['A_PC', 'B_PC', 'C_PC', 'D_PC', 'E_PC', 'F_PC', 'N_PC', 'M_PC', 'O_PC', 'Q_PC']
col1 = ['sap1', 'luf', 'tur', 'sul', 'sul2', 'bmw', 'aud']
df1 = pd.DataFrame(np.random.randint(10, size=(10, 7)), columns=col1, index=ind1)

ind2 = ['G_lncRNAs', 'I_lncRNAs', 'J_lncRNAs', 'K_lncRNAs', 'L_lncRNAs', 'M_lncRNAs', 'R_lncRNAs', 'N_lncRNAs']
col2 = ['sap1', 'luf', 'tur', 'sul', 'sul2', 'bmw', 'aud']
df2 = pd.DataFrame(np.random.randint(20, size=(8, 7)), columns=col2, index=ind2)
```
Because the original data frames are huge, I then split them into chunks:
```python
# {"pc_1": split_1, "pc_2": split_2}
pc = {f"pc_{i + 1}": v for i, v in enumerate(np.array_split(df1, 2))}
lc = {f"lc_{i + 1}": v for i, v in enumerate(np.array_split(df2, 2))}
```
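A side note on the split: `np.array_split` handles a DataFrame by calling its `swapaxes` method, which pandas deprecated in 2.1, so recent pandas versions typically emit a `FutureWarning` at this step. A minimal sketch of an equivalent positional split that avoids that path, by splitting an array of row positions and selecting rows with `iloc`; `split_frame` is a hypothetical helper name, not part of pandas or NumPy:

```python
from typing import List

import numpy as np
import pandas as pd


def split_frame(df: pd.DataFrame, n_parts: int) -> List[pd.DataFrame]:
    """Split df into n_parts row chunks whose sizes differ by at most one."""
    # Split a plain integer array of row positions (no DataFrame.swapaxes involved),
    # then let iloc pick out each chunk of rows by position.
    positions = np.array_split(np.arange(len(df)), n_parts)
    return [df.iloc[idx] for idx in positions]


df = pd.DataFrame({"x": range(5)}, index=list("abcde"))
parts = split_frame(df, 2)
print([list(p.index) for p in parts])  # [['a', 'b', 'c'], ['d', 'e']]
```

With the question's data this would be used as `pc = {f"pc_{i + 1}": v for i, v in enumerate(split_frame(df1, 2))}`, producing the same chunks as before.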
I then process every combination of the split data frames in a nested loop:
```python
for pc_k, pc_v in pc.items():
    for lc_k, lc_v in lc.items():
        # (pc_1, lc_1), (pc_1, lc_2), ...
        # run the above function for each combination and save the results
        corr_array(pd.concat([pc_v, lc_v])).to_csv(
            f"{pc_k}_{lc_k}.csv", sep="\t", index=False
        )
```
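The nested loop visits every (pc, lc) pair exactly once, i.e. the Cartesian product of the two dicts. A small sketch with `itertools.product`, which yields the same pairs in the same order; the dict values below are placeholder strings standing in for the real data frame chunks. A flat list of independent jobs like this is the natural input for a parallel map.

```python
from itertools import product

# Stand-ins for the pc / lc dicts above (values are placeholders, not data frames).
pc = {"pc_1": "split_1", "pc_2": "split_2"}
lc = {"lc_1": "split_1", "lc_2": "split_2"}

# product() yields the same (pc, lc) pairs, in the same order, as the nested loop.
jobs = [(pc_k, lc_k) for (pc_k, _), (lc_k, _) in product(pc.items(), lc.items())]
out_files = [f"{pc_k}_{lc_k}.csv" for pc_k, lc_k in jobs]
print(out_files)
# ['pc_1_lc_1.csv', 'pc_1_lc_2.csv', 'pc_2_lc_1.csv', 'pc_2_lc_2.csv']
```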
As written, this processes the combinations from the dicts `pc` and `lc` one after another, so the jobs take forever to finish.
Is there a way to run each combination of concatenated data frames in parallel? That would save a lot of time.
I'd appreciate any suggestions or help.
It looks like you could use the `multiprocessing` module; see its documentation (https://docs.python.org/2/library/multiprocessing.html). I would also recommend this video: https://youtu.be/oEYDqQ1pq9o?list=PLQVvvaa0QuDfju7ADVp5W1GF9jVhjbX- _
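A minimal sketch of that idea using `concurrent.futures.ProcessPoolExecutor`, the standard library's higher-level interface over process pools (`multiprocessing.Pool.starmap` would work along the same lines). It assumes each (pc, lc) job is independent and that the chunks are small enough to pickle and send to worker processes. `run_job` and `run_all` are hypothetical helper names, and `corr_array` is the placeholder from the question. The `if __name__ == "__main__":` guard matters on platforms that start workers with the "spawn" method (the default on Windows and macOS), because each worker re-imports the main module.

```python
import os
import tempfile
from concurrent.futures import ProcessPoolExecutor
from itertools import product
from typing import Dict, List, Optional

import numpy as np
import pandas as pd


def corr_array(data: pd.DataFrame) -> pd.DataFrame:
    # Placeholder copied from the question; swap in the real computation.
    return data


def run_job(pc_k: str, pc_v: pd.DataFrame, lc_k: str, lc_v: pd.DataFrame,
            out_dir: str) -> str:
    """Handle one (pc, lc) combination: concatenate, process, write a TSV file."""
    path = os.path.join(out_dir, f"{pc_k}_{lc_k}.csv")
    corr_array(pd.concat([pc_v, lc_v])).to_csv(path, sep="\t", index=False)
    return path


def run_all(pc: Dict[str, pd.DataFrame], lc: Dict[str, pd.DataFrame],
            out_dir: str, max_workers: Optional[int] = None) -> List[str]:
    """Submit every (pc, lc) pair to a process pool and wait for all of them."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run_job, pc_k, pc_v, lc_k, lc_v, out_dir)
            for (pc_k, pc_v), (lc_k, lc_v) in product(pc.items(), lc.items())
        ]
        # result() blocks until that job finishes and re-raises any worker error.
        return [f.result() for f in futures]


if __name__ == "__main__":
    np.random.seed([3, 1415])
    df1 = pd.DataFrame(np.random.randint(10, size=(10, 7)))
    df2 = pd.DataFrame(np.random.randint(20, size=(8, 7)))
    # Positional split into two chunks each, as in the question.
    pc = {f"pc_{i + 1}": df1.iloc[idx]
          for i, idx in enumerate(np.array_split(np.arange(len(df1)), 2))}
    lc = {f"lc_{i + 1}": df2.iloc[idx]
          for i, idx in enumerate(np.array_split(np.arange(len(df2)), 2))}
    with tempfile.TemporaryDirectory() as out_dir:
        for path in run_all(pc, lc, out_dir):
            print(os.path.basename(path))
```

Each submitted task pickles its two chunks and ships them to a worker, so with very large frames that transfer cost can eat into the speed-up; a common alternative is to pass only the keys and have each worker load its own chunk. `max_workers=None` lets the executor pick a default based on the CPU count.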