A faster way of building a pandas.DataFrame from a list of dictionaries than a loop? [Python 3.9]

I have a list of 5000 dictionaries, where each dictionary has around 40 items. I have built a for loop that is extremely slow: it needs a couple of minutes.

    import pandas as pd

    # symbols_list_final is the list of dictionaries
    symbols_dataframe = pd.DataFrame([symbols_list_final[0]])

    for i in range(len(symbols_list_final) - 1):
        symbol_df_temp = pd.DataFrame([symbols_list_final[i + 1]])
        # stack rows with axis=0; axis=1 would glue each dict's columns side by side
        symbols_dataframe = pd.concat((symbols_dataframe, symbol_df_temp), axis=0)
        print(i)

Is there any way of doing it faster?

EDIT: It's even slower than that. My program is running right now, and it only manages 4-5 iterations per second (roughly 17 to 21 minutes for all 5000 dictionaries).

It seems like you are building many single-dict DataFrames and concatenating them into a single variable that holds your end_df. The right approach is not to concatenate on every iteration, but to run that command only once. So I would recommend collecting the DataFrame objects in a list and then concatenating them:

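    import pandas as pd
    list_of_dfs = []
    for d in list_dict:  # list_dict: your list of dictionaries
        # wrap the dict in a list so a dict of scalars becomes one row
        list_of_dfs.append(pd.DataFrame([d]))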

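So calling pd.concat(list_of_dfs) once, after the loop, is wiser than reassigning your variable on every iteration. Each concat inside the loop copies all the rows accumulated so far, so the total work grows roughly quadratically with the number of dictionaries, whereas a single concat copies each row only once.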
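Put together, the collect-then-concatenate pattern looks roughly like this. It is a minimal sketch: `records` is a hypothetical stand-in for the question's `symbols_list_final`, and `ignore_index=True` assumes you want a fresh 0..n-1 row index instead of the repeated 0 that every one-row frame carries.

```python
import pandas as pd

# Hypothetical stand-in for symbols_list_final: one dict per row
records = [
    {"symbol": "AAA", "price": 1.0, "volume": 100},
    {"symbol": "BBB", "price": 2.5, "volume": 200},
    {"symbol": "CCC", "price": 3.75, "volume": 300},
]

# Build the small frames in a plain Python list (appending is cheap)...
list_of_dfs = []
for rec in records:
    # Wrap each dict in a list so pandas reads it as a single row
    list_of_dfs.append(pd.DataFrame([rec]))

# ...then concatenate exactly once, stacking rows (axis=0).
# ignore_index=True replaces the repeated 0 index of each one-row frame.
symbols_dataframe = pd.concat(list_of_dfs, axis=0, ignore_index=True)
print(symbols_dataframe)
```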

If creating the individual DataFrame objects is still slow, tell us how long it takes. There are other ways of approaching this issue, such as the pyarrow library (which may be faster, depending on your CPU).
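For comparison, pandas can also build the frame from the whole list in a single constructor call, which avoids creating thousands of intermediate DataFrames at all. This is an alternative to the answer's approach, not something the answer itself proposes. The sketch uses a hypothetical `records` list; keys missing from some dicts simply become NaN in that row. A pyarrow route (`pyarrow.Table.from_pylist(records).to_pandas()` in recent pyarrow releases) is omitted so the example only needs pandas.

```python
import pandas as pd

# Hypothetical records; the second dict has no "volume" key
records = [
    {"symbol": "AAA", "price": 1.0, "volume": 100},
    {"symbol": "BBB", "price": 2.5},
    {"symbol": "CCC", "price": 3.75, "volume": 300},
]

# One constructor call: each dict becomes a row, keys become columns
# (in order of first appearance), and missing keys are filled with NaN
symbols_dataframe = pd.DataFrame(records)
print(symbols_dataframe)
```

With ~5000 dicts of ~40 keys, this single call should be far cheaper than any per-row loop, because pandas builds each column once.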

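Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.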

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM