
What are the options for computing statistical models on out-of-memory data sets in Python?

I'm referring to sub-Hadoop-size data, i.e. data that is bigger than RAM but small enough not to need a cluster.

Do such models have to be coded by hand?

I'd try PyTables. It's built on HDF5 and NumPy, so you can keep using the usual statistical packages in Python (most of which are based on NumPy in some way) without having to load everything into memory; see the sketch after the feature list below.

http://www.pytables.org/moin/MainFeatures

* Unlimited dataset size: allows working with tables and/or arrays with a very large number of rows (up to 2**63), i.e. that don't fit in memory.
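As a minimal sketch (not from the original answer) of how this works in practice: you can iterate over a PyTables table in chunks and accumulate summary statistics with NumPy, so only one chunk is ever resident in memory. The file name, table path, column name, and chunk size below are hypothetical placeholders.

```python
import numpy as np
import tables

CHUNK = 100_000  # rows per chunk (hypothetical tuning parameter)

# Assumes an existing HDF5 file "data.h5" containing a table at
# /measurements with a numeric column named "value".
with tables.open_file("data.h5", mode="r") as h5:
    table = h5.root.measurements
    n = table.nrows

    total = 0.0
    total_sq = 0.0
    for start in range(0, n, CHUNK):
        # table.read returns a NumPy structured array for just this slice
        chunk = table.read(start=start, stop=min(start + CHUNK, n))
        x = chunk["value"].astype(np.float64)
        total += x.sum()
        total_sq += (x * x).sum()

    # Population mean and variance from the running sums
    mean = total / n
    var = total_sq / n - mean ** 2
    print(f"rows={n} mean={mean:.4f} var={var:.4f}")
```

The same chunked-read pattern extends to anything that can be expressed as an accumulation over rows (sums, counts, covariance matrices for regression, etc.); only statistics that truly need all rows at once would force you back to an in-memory approach.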

