
Efficiently writing large Pandas data frames to disk

I am trying to find the best way to efficiently write large data frames (250MB+) to and from disk using Python/Pandas. I've tried all of the methods in *Python for Data Analysis*, but the performance has been very disappointing.

This is part of a larger project exploring migrating our current analytic/data management environment from Stata to Python. When I compare the read/write times in my tests to those that I get with Stata, Python and Pandas typically take more than 20 times as long.

I strongly suspect that I am the problem, not Python or Pandas.

Any suggestions?

Using HDFStore is your best bet (it is not covered very much in the book, and it has changed quite a lot). You will find its performance is MUCH better than any other serialization method.
