
Multiprocessing or Threading a for loop in Python

I'm currently working on a project that I need help with. I am working with some large graphs and need to compute some of their properties throughout the years. I was thinking of using the multiprocessing or threading package from Python. I have a for loop that goes through each year and produces a CSV. I'm not sure how to parallelize this; can you help me? Here is my code:

import gc

import networkx as nx
import pandas as pd
from networkx.algorithms.centrality import degree_centrality, in_degree_centrality
from tqdm import tqdm

for year in tqdm(years):

    # Build the directed graph from all edges labelled up to the current year
    temp_df = df[df.label <= year]
    processed_df = id_df.copy()
    G = nx.DiGraph()
    G.add_edges_from(temp_df.iloc[:, :2].values.tolist())
    
    # Degree Centrality
    DegreeCentrality = degree_centrality(G)
    DegreeCentrality_df = pd.DataFrame(DegreeCentrality.items(), columns=['id', 'DegreeCentrality'])
    processed_df = pd.merge(processed_df, DegreeCentrality_df, how='left', on='id').fillna(0)
    
    del DegreeCentrality
    del DegreeCentrality_df
    gc.collect()
    
    # In Degree Centrality
    InDegreeCentrality = in_degree_centrality(G)
    InDegreeCentrality_df = pd.DataFrame(InDegreeCentrality.items(), columns=['id', 'InDegreeCentrality'])
    processed_df = pd.merge(processed_df, InDegreeCentrality_df, how='left', on='id').fillna(0)
    
    del InDegreeCentrality
    del InDegreeCentrality_df
    gc.collect()

    processed_df.to_csv('properties_{}'.format(year), index=False)

My guess is that I should turn everything inside the for loop into a function and call it from different threads. Any help would be appreciated, thank you!

You can move all the code inside the for loop into a function and call that function using the multiprocessing library in Python. See the documentation here: https://docs.python.org/3/library/multiprocessing.html
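Since each year builds its own graph and writes its own file, the iterations are independent, so multiprocessing is a better fit than threading for this CPU-bound centrality computation. Below is a minimal sketch, assuming your `df`, `id_df` and `years` objects are loaded at module level (on Windows/macOS, where the default start method is "spawn", they must be available when the module is re-imported in each worker, or passed as arguments instead); the `process_year` function name and the pool size of 4 are arbitrary choices for illustration:

import gc
from multiprocessing import Pool

import networkx as nx
import pandas as pd
from networkx.algorithms.centrality import degree_centrality, in_degree_centrality


def process_year(year):
    """Build the graph for one year and write its properties to a CSV."""
    temp_df = df[df.label <= year]
    processed_df = id_df.copy()
    G = nx.DiGraph()
    G.add_edges_from(temp_df.iloc[:, :2].values.tolist())

    # Degree Centrality
    degree_df = pd.DataFrame(degree_centrality(G).items(),
                             columns=['id', 'DegreeCentrality'])
    processed_df = pd.merge(processed_df, degree_df, how='left', on='id').fillna(0)

    # In Degree Centrality
    in_degree_df = pd.DataFrame(in_degree_centrality(G).items(),
                                columns=['id', 'InDegreeCentrality'])
    processed_df = pd.merge(processed_df, in_degree_df, how='left', on='id').fillna(0)

    processed_df.to_csv('properties_{}'.format(year), index=False)
    del G
    gc.collect()
    return year


if __name__ == '__main__':
    # Each worker process handles one year; results arrive as workers finish.
    with Pool(processes=4) as pool:
        for finished_year in pool.imap_unordered(process_year, years):
            print('finished', finished_year)

Each worker writes to a different file, so there is no contention on the output; just keep the pool size small enough that several copies of the graph fit in memory at once.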
