Multiprocessing or Threading a for loop in Python
I'm currently working on a project that I need help with. I am working with some large graphs and need to compute some of their properties over the years. I was thinking of using the multiprocessing or threading package from Python. I have a for loop that goes through each year and produces a CSV. I'm not sure how to parallelize this. Can you help me? Here is my code:
import gc

import networkx as nx
import pandas as pd
from networkx import degree_centrality, in_degree_centrality
from tqdm import tqdm

# df, id_df and years are defined earlier in the project.
for year in tqdm(years):
    # Keep only the edges labelled with this year or earlier.
    temp_df = df[df.label <= year]
    processed_df = id_df.copy()
    G = nx.DiGraph()
    G.add_edges_from(temp_df.iloc[:, :2].values.tolist())

    # Degree Centrality
    DegreeCentrality = degree_centrality(G)
    DegreeCentrality_df = pd.DataFrame(DegreeCentrality.items(), columns=['id', 'DegreeCentrality'])
    processed_df = pd.merge(processed_df, DegreeCentrality_df, how='left', on='id').fillna(0)
    del DegreeCentrality
    del DegreeCentrality_df
    gc.collect()

    # In Degree Centrality
    InDegreeCentrality = in_degree_centrality(G)
    InDegreeCentrality_df = pd.DataFrame(InDegreeCentrality.items(), columns=['id', 'InDegreeCentrality'])
    processed_df = pd.merge(processed_df, InDegreeCentrality_df, how='left', on='id').fillna(0)
    del InDegreeCentrality
    del InDegreeCentrality_df
    gc.collect()

    # One output file per year.
    processed_df.to_csv('properties_{}'.format(year), index=False)
My guess is that I should turn everything inside the for loop into a function and call it from different threads. Any help would be appreciated, thank you!
You can move all the code inside the for loop into a function and call it using the multiprocessing library in Python. See: https://docs.python.org/3/library/multiprocessing.html
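As a rough sketch of that suggestion, the loop body could become a function that takes a year, with a multiprocessing.Pool handing the years out to worker processes. Everything named here (EDGES, NODE_IDS, degree_centralities, process_year, run_all) is hypothetical and only stands in for the question's df, id_df, and networkx/pandas code, so that the example runs with the standard library alone. Because the work is CPU-bound, processes will probably help more than threads, which are limited by the GIL in CPython.

```python
import csv
import os
import tempfile
from multiprocessing import Pool

# Hypothetical stand-in data: (source, target, year label) edges and known ids.
EDGES = [(1, 2, 2000), (2, 3, 2001), (1, 3, 2001), (3, 4, 2002)]
NODE_IDS = [1, 2, 3, 4, 5]


def degree_centralities(edges):
    """Return (degree, in-degree) centrality dicts, each scaled by 1 / (n - 1)."""
    nodes = {n for s, t in edges for n in (s, t)}
    scale = 1.0 / (len(nodes) - 1) if len(nodes) > 1 else 1.0
    degree = dict.fromkeys(nodes, 0)
    in_degree = dict.fromkeys(nodes, 0)
    for s, t in edges:
        degree[s] += 1
        degree[t] += 1
        in_degree[t] += 1
    return ({n: d * scale for n, d in degree.items()},
            {n: d * scale for n, d in in_degree.items()})


def process_year(year, out_dir):
    """Body of the original for loop: build one year's table and write its CSV."""
    # Keep edges labelled with this year or earlier, without duplicates.
    edges = {(s, t) for s, t, label in EDGES if label <= year}
    degree, in_degree = degree_centralities(edges)
    path = os.path.join(out_dir, "properties_{}.csv".format(year))
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["id", "DegreeCentrality", "InDegreeCentrality"])
        for node in NODE_IDS:
            # Ids missing from this year's graph get 0, like fillna(0).
            writer.writerow([node, degree.get(node, 0), in_degree.get(node, 0)])
    return path


def run_all(years, out_dir, processes=2):
    """Run process_year for every year in a pool of worker processes."""
    with Pool(processes=processes) as pool:
        return pool.starmap(process_year, [(y, out_dir) for y in years])


if __name__ == "__main__":
    # The guard is required on platforms that start workers by re-importing this file.
    out_dir = tempfile.mkdtemp()
    for path in run_all([2000, 2001, 2002], out_dir):
        print(path)
```

Each worker writes its own file, so the processes need no shared state. With large graphs, note that every worker holds its own copy of the data, so memory use may grow with the number of processes.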