Faster apply function in Python

Question

I am running the following code on about 6 million rows. It is so slow that it never finishes.

df['City'] = df['POSTAL_CODE'].apply(lambda x: nomi.query_postal_code(x).county_name)

It assigns a corresponding city to each postal code. When I run it on a slice of the dataset (e.g., 1000 rows), it works well. But running the code on the whole data never gives me any output.

Can anyone modify the code to make it faster?

Thank you!

Answer 1

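!pip3 install multiprocess

import numpy as np
import pandas as pd
from multiprocess import Pool

def parallelize_dataframe(data, func, n_cores=4):
    # Split the data into n_cores chunks; func is called once per chunk
    data_split = np.array_split(data, n_cores)
    pool = Pool(n_cores)
    # Process the chunks in parallel worker processes, then recombine them
    data = pd.concat(pool.map(func, data_split))
    pool.close()
    pool.join()
    return data


# func receives a whole chunk (a Series), so the per-code lookup is applied inside it
df['City'] = parallelize_dataframe(df['POSTAL_CODE'], lambda chunk: chunk.apply(lambda x: nomi.query_postal_code(x).county_name), 4)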

Solution 1
0 2020-06-05 16:03:41