
What are the options for computing statistical models on out-of-memory data sets in Python?

I'm referring to sub-Hadoop-size data, i.e. data that is bigger than RAM but small enough not to need a cluster.

Do such models have to be coded by hand?

I'd try PyTables. It's built on HDF5 and NumPy, so you can keep using the usual statistical packages in Python (most of which are based on NumPy in some way) without having to load everything into memory; see the sketch after the feature list below.

http://www.pytables.org/moin/MainFeatures

* Unlimited dataset size: allows working with tables and/or arrays with a very large number of rows (up to 2**63), i.e. that don't fit in memory.
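As a minimal sketch (not from the original answer) of how this works in practice: you can iterate over a PyTables table in chunks and accumulate summary statistics with NumPy, so only one chunk is ever resident in memory. The file name, table path, column name, and chunk size below are hypothetical placeholders.

```python
import numpy as np
import tables

CHUNK = 100_000  # rows per chunk (hypothetical tuning parameter)

# Assumes an existing HDF5 file "data.h5" containing a table at
# /measurements with a numeric column named "value".
with tables.open_file("data.h5", mode="r") as h5:
    table = h5.root.measurements
    n = table.nrows

    total = 0.0
    total_sq = 0.0
    for start in range(0, n, CHUNK):
        # table.read returns a NumPy structured array for just this slice
        chunk = table.read(start=start, stop=min(start + CHUNK, n))
        x = chunk["value"].astype(np.float64)
        total += x.sum()
        total_sq += (x * x).sum()

    # Population mean and variance from the running sums
    mean = total / n
    var = total_sq / n - mean ** 2
    print(f"rows={n} mean={mean:.4f} var={var:.4f}")
```

The same chunked-read pattern extends to anything that can be expressed as an accumulation over rows (sums, counts, covariance matrices for regression, etc.); only statistics that truly need all rows at once would force you back to an in-memory approach.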

