Put a 2d Array into a Pandas Series
I have a 2D Numpy array that I want to put into a pandas Series (not a DataFrame):
>>> import pandas as pd
>>> import numpy as np
>>> a = np.zeros((5, 2))
>>> a
array([[ 0., 0.],
       [ 0., 0.],
       [ 0., 0.],
       [ 0., 0.],
       [ 0., 0.]])
But this raises an error:
>>> s = pd.Series(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/miniconda/envs/pyspark/lib/python3.4/site-packages/pandas/core/series.py", line 227, in __init__
    raise_cast_failure=True)
  File "/miniconda/envs/pyspark/lib/python3.4/site-packages/pandas/core/series.py", line 2920, in _sanitize_array
    raise Exception('Data must be 1-dimensional')
Exception: Data must be 1-dimensional
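The error comes from the Series constructor rejecting any input whose ndim is not 1. A minimal sketch that reproduces the check is below; it assumes a reasonably current pandas, where the same condition is reported as a ValueError rather than the bare Exception in the traceback above, so it catches Exception to cover both.

```python
import numpy as np
import pandas as pd

a = np.zeros((5, 2))
print(a.ndim)  # 2, but pd.Series only accepts 1-dimensional data

raised = False
try:
    pd.Series(a)
except Exception as exc:
    # Older pandas raises a bare Exception, newer versions a ValueError;
    # catching Exception handles either one.
    raised = True
    print(type(exc).__name__, exc)
```

Both workarounds in this thread avoid the check by handing pandas a 1-D sequence (one item per row) instead of the 2-D array itself.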
It is possible with a hack:
>>> s = pd.Series(map(lambda x:[x], a)).apply(lambda x:x[0])
>>> s
0 [0.0, 0.0]
1 [0.0, 0.0]
2 [0.0, 0.0]
3 [0.0, 0.0]
4 [0.0, 0.0]
Is there a better way?
Well, you can use the numpy.ndarray.tolist function, like this:
>>> a = np.zeros((5,2))
>>> a
array([[ 0., 0.],
       [ 0., 0.],
       [ 0., 0.],
       [ 0., 0.],
       [ 0., 0.]])
>>> a.tolist()
[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
>>> pd.Series(a.tolist())
0 [0.0, 0.0]
1 [0.0, 0.0]
2 [0.0, 0.0]
3 [0.0, 0.0]
4 [0.0, 0.0]
dtype: object
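A small self-contained sketch of the tolist approach, checking what each element of the resulting Series actually is. The final line, turning the Series back into a 2-D array with np.array, is an illustrative addition not taken from the answer.

```python
import numpy as np
import pandas as pd

a = np.zeros((5, 2))

# a.tolist() gives a list of 5 Python lists; each inner list becomes one element
s = pd.Series(a.tolist())

print(s.dtype)            # object
print(type(s.iloc[0]))    # <class 'list'>

# Rebuilding the 2-D array from the Series of lists
back = np.array(s.tolist())
print(back.shape)         # (5, 2)
```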
Edit:
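A faster way to get a similar result is simply pd.Series(list(a)). This produces a Series of numpy arrays instead of Python lists, so it should be faster than a.tolist, which returns a list of Python lists.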
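To see the difference between the two variants, the sketch below builds the Series from list(a) and inspects an element. Iterating over a 2-D array walks its first axis, so each element should be a 1-D row array of shape (2,) rather than a Python list. Stacking the rows back with np.stack is an illustrative addition, not part of the answer.

```python
import numpy as np
import pandas as pd

a = np.zeros((5, 2))

# list(a) iterates over the first axis, yielding one 1-D row array per element
s = pd.Series(list(a))

print(s.dtype)             # object
print(type(s.iloc[0]))     # <class 'numpy.ndarray'>
print(s.iloc[0].shape)     # (2,)

# Rebuilding the 2-D array by stacking the row arrays again
back = np.stack(s.tolist())
print(back.shape)          # (5, 2)
```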
However, pd.Series(list(a)) was consistently slower than pd.Series(a.tolist()) when tested with arrays of 500,000 to 20,000,000 rows:
a = np.ones((500000,2))
Showing only the 1,000,000-row results:
%timeit pd.Series(list(a))
1 loop, best of 3: 301 ms per loop
%timeit pd.Series(a.tolist())
1 loop, best of 3: 261 ms per loop
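The %timeit comparison above relies on IPython. A rough plain-Python equivalent using the standard timeit module is sketched below; it mirrors "1 loop, best of 3" by taking the minimum of three single runs. Absolute numbers, and possibly which variant wins, will depend on the machine and on the numpy and pandas versions, so no expected timings are given here.

```python
import timeit

import numpy as np
import pandas as pd

a = np.ones((500_000, 2))

candidates = {
    "pd.Series(list(a))": lambda: pd.Series(list(a)),
    "pd.Series(a.tolist())": lambda: pd.Series(a.tolist()),
}

timings = {}
for label, build in candidates.items():
    # Best of 3 single runs, similar to "1 loop, best of 3" from %timeit
    timings[label] = min(timeit.repeat(build, number=1, repeat=3))
    print(f"{label}: {timings[label] * 1000:.0f} ms")
```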