I'm referring to sub-Hadoop-sized data that is still bigger than RAM.
Must these algorithms be coded by hand?
I'd try PyTables. It's built on HDF5 and NumPy, so you can keep using the good statistical packages in Python (most of which are based on NumPy in some manner) without having to load everything into memory.
http://www.pytables.org/moin/MainFeatures
* Unlimited datasets size
  Allows working with tables and/or arrays with a very large number of rows (up to 2**63), i.e. datasets that don't fit in memory.
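For a concrete sense of the workflow, here is a minimal sketch (the file path and chunk sizes are arbitrary choices for illustration): it appends a million random samples to an on-disk extendable array, then streams the data back in slices to compute the mean, so the full array never has to sit in RAM.

```python
import os
import tempfile

import numpy as np
import tables  # PyTables

path = os.path.join(tempfile.mkdtemp(), "samples.h5")
rng = np.random.default_rng(0)

# Write: an EArray is extendable along its first axis, so data can be
# appended incrementally instead of being built in memory first.
with tables.open_file(path, mode="w") as f:
    earr = f.create_earray(f.root, "samples", tables.Float64Atom(),
                           shape=(0,), expectedrows=1_000_000)
    for _ in range(100):                       # append in 10k-row chunks
        earr.append(rng.standard_normal(10_000))

# Read: iterate over slices so only one chunk is in memory at a time.
with tables.open_file(path, mode="r") as f:
    earr = f.root.samples
    total, count = 0.0, 0
    for start in range(0, earr.nrows, 10_000):
        chunk = earr[start:start + 10_000]     # NumPy array for this slice
        total += chunk.sum()
        count += chunk.size
    mean = total / count

print(abs(mean) < 0.01)  # sample mean of N(0, 1) data is near zero
```

The chunks come back as ordinary NumPy arrays, so anything that works on NumPy (SciPy, scikit-learn partial-fit estimators, etc.) can be applied per chunk.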