numpy.ndarray vs pandas.DataFrame

I need to make a strategic decision about the data structure my program will use to hold statistical data frames.

I store hundreds of thousands of records in one big table. The fields have different types, including short strings. I'd perform multiple regression analyses and manipulations on the data that need to be done quickly, in real time. I also need something that is relatively popular and well supported.

I know about the following candidates:

list of array.array

This is the most basic option. Unfortunately, array.array doesn't support strings, and I need numpy anyway for the statistical part, so this one is out of the question.
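A quick sketch of that limitation: array.array typecodes cover only fixed-width numeric types (plus single characters), so there is no way to store variable-length strings.

    from array import array

    grades = array('d', [88.5, 92.0, 79.5])  # doubles work fine
    print(grades[0])                         # 88.5

    # There is no typecode for strings, so this raises ValueError:
    # names = array('s', ['Alice', 'Bob'])   # bad typecode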

numpy.ndarray

A structured ndarray can hold a different type in each column (e.g. np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])). It seems like the natural winner, but...
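For reference, a minimal sketch of that structured dtype in use (the field values are made up): each record carries a fixed-width string field and a sub-array of floats, and individual fields can be accessed by name as plain numpy arrays.

    import numpy as np

    dt = np.dtype([('name', np.str_, 16), ('grades', np.float64, (2,))])
    x = np.array([('Sarah', (8.0, 7.0)), ('John', (6.0, 7.0))], dtype=dt)

    print(x['name'])           # ['Sarah' 'John'] -- column access by field name
    print(x['grades'].mean())  # vectorized stats work on the numeric field

Note that the string field is fixed-width: anything longer than 16 characters gets truncated silently.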

pandas.DataFrame

This one is built with statistical use in mind, but is it efficient enough?

I read that pandas.DataFrame is no longer based on numpy.ndarray (although it shares the same interface). Can anyone shed some light on this? Or is there an even better data structure out there?

pandas.DataFrame is awesome, and interacts very well with much of numpy. Much of the DataFrame is written in Cython and is quite optimized. I suspect the ease of use and the richness of the Pandas API will greatly outweigh any potential benefit you could obtain by rolling your own interfaces around numpy.
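To make that concrete, here is a minimal sketch (the column names are invented for illustration) showing mixed types living side by side and the hand-off to numpy for the numeric work:

    import numpy as np
    import pandas as pd

    # String and float columns coexist, with no fixed string width.
    df = pd.DataFrame({
        'name':   ['Sarah', 'John', 'Ada'],
        'grade1': [8.0, 6.0, 9.5],
        'grade2': [7.0, 7.0, 9.0],
    })

    # Numeric columns drop straight into numpy for the statistical part:
    X = df[['grade1', 'grade2']].to_numpy()
    print(X.mean(axis=0))

    # Vectorized manipulations stay fast and readable:
    df['total'] = df['grade1'] + df['grade2']
    print(df.groupby('name')['total'].sum())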
