
Pandas append is very slow, having problems using from_dict

I need to append pandas DataFrames, but I found that append is very slow. The linked post, "Improve Row Append Performance On Pandas DataFrames", suggests using from_dict instead of append. I tried to do the same, but I am having a problem converting a DataFrame to a dict with to_dict and then back from the dict to a DataFrame.

I have this DataFrame df:

        date   open   high    low  close  volume  average  barCount simb
0  3/31/2020  81.43  81.49  78.56  78.91  183417  80.0940     86742  xdt
1   4/1/2020  77.00  77.38  75.35  76.57   91420  76.4395     49399  xdt
2   4/2/2020  76.12  79.66  76.00  79.44   75298  78.4080     40614  xdt
3   4/3/2020  78.79  79.99  78.18  79.45   64965  79.0490     37140  xdt
4   4/6/2020  81.08  83.12  79.60  82.73   89395  81.3605     46247  xdt
5   4/7/2020  83.45  84.48  81.76  81.93   77722  83.3980     43947  xdt
6   4/8/2020  82.50  85.39  81.05  84.95   66202  83.4955     40256  xdt
7   4/9/2020  85.00  86.50  82.95  86.04   80298  85.1100     46184  xdt
8  4/13/2020  86.32  86.48  83.52  85.85   48114  85.1790     27280  xdt
9  4/14/2020  87.00  89.54  86.50  89.14   75528  88.4410     42810  xdt

I have several thousand DataFrames like this. I need to convert each of them to a dict and then convert all of them back into a single DataFrame, as shown in the linked post.

My code:

d = {}
i = 0
d[i] = df.to_dict('index')
pd.DataFrame.from_dict(d, 'index')

I cannot get a proper DataFrame with this code. I tried other options in place of 'index', but that did not help. I would appreciate it if someone could help me with the code.
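For reference, a likely reason the round trip above fails is that to_dict('index') already returns a dict keyed by row label, and storing it under d[i] adds one more level of nesting. The sketch below illustrates this on a two-row sample built from the table above (only some columns are kept, for brevity); the variable names rows, nested, and restored are illustrative and not from the original post.

```python
import pandas as pd

# Small sample with a subset of the columns shown in the question.
df = pd.DataFrame({
    "date": ["3/31/2020", "4/1/2020"],
    "open": [81.43, 77.00],
    "close": [78.91, 76.57],
    "simb": ["xdt", "xdt"],
})

# to_dict("index") returns {row_label: {column: value}}.
rows = df.to_dict("index")

# Wrapping that dict under another key (as d[i] = ... does) adds a third
# level, so from_dict builds a single row whose cells are whole dicts.
nested = pd.DataFrame.from_dict({0: rows}, orient="index")

# Passing the two-level dict directly gives back the original rows and columns.
restored = pd.DataFrame.from_dict(rows, orient="index")
```

Here nested has one row with one dict-valued cell per original row, while restored has the same columns and values as df.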

Read all the DataFrames into a list:

from glob import glob

import pandas as pd

# Read every matching CSV file into its own DataFrame
dfs = [pd.read_csv(file) for file in glob('*.csv')]

Then use pd.concat:

big_df = pd.concat(dfs)

NOTE: glob('*.csv') matches all CSV files in the current working directory.
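If the DataFrames are already in memory rather than in CSV files, the same idea applies: collect them in a list and call pd.concat once, instead of appending inside a loop. The following is a minimal sketch under that assumption; make_frame and the symbol "abc" are hypothetical stand-ins for however the frames are actually produced. Passing ignore_index=True is optional and renumbers the rows, since by default each frame keeps its own 0-based index.

```python
import pandas as pd


def make_frame(symbol, closes):
    """Hypothetical helper: build a small frame for one symbol."""
    return pd.DataFrame({"close": closes, "simb": symbol})


# Collect all frames first; appending to a list is cheap.
frames = [
    make_frame("xdt", [78.91, 76.57]),
    make_frame("abc", [10.0, 11.0, 12.0]),
]

# Concatenate once at the end; ignore_index=True renumbers rows 0..n-1.
big_df = pd.concat(frames, ignore_index=True)
```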
