Converting a numpy matrix to a set of pandas Series

Question

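Q: Is there a quick way to convert a 2-D numpy matrix into a set of pandas Series? For example, a (100 x 5) ndarray into 5 Series with 100 rows each.

Background: I need to create a pandas DataFrame from randomly generated data of different types (floats, strings, etc.). Currently, I create a numpy matrix for the floats and an array of strings for the strings. I then combine all of these along axis=1 to form the DataFrame. This does not preserve the data type of each individual column.

To preserve the data types, I plan to use pandas Series. Since creating multiple Series of floats would probably be slower than creating a single numpy matrix of floats, I would like to know whether there is a way to convert a numpy matrix into a set of Series.

This question is different from mine: it asks about converting a numpy matrix into a single Series. I need multiple Series.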

Answer 1

You can convert the matrix for each data type directly into a DataFrame and then concatenate the resulting DataFrames.

import numpy as np
import pandas as pd

float_df = pd.DataFrame(np.random.rand(500).reshape((-1,5)))
#           0         1         2         3         4
#0   0.561765  0.177957  0.279419  0.332973  0.967186
#1   0.761327  0.323747  0.707742  0.555475  0.680662
#..       ...       ...       ...       ...       ...
#98  0.741207  0.061200  0.142316  0.381168  0.591554
#99  0.417697  0.723469  0.730677  0.538261  0.281296
#
#[100 rows x 5 columns]

pd.concat([float_df, int_df, ...], axis=1)
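
As a minimal sketch of the approach above, here is one way the pieces could fit together. The column names, the integer block, and the string block are hypothetical stand-ins for illustration (the answer does not say how `int_df` is built); only `pd.DataFrame` and `pd.concat` come from the answer itself.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# 100 x 5 block of floats, converted to a DataFrame in one call
float_df = pd.DataFrame(rng.random((100, 5)),
                        columns=[f"f{i}" for i in range(5)])

# 100 x 2 block of integers (hypothetical stand-in for int_df)
int_df = pd.DataFrame(rng.integers(0, 10, size=(100, 2)),
                      columns=["i0", "i1"])

# 100 x 1 block of strings (hypothetical)
str_df = pd.DataFrame({"s0": rng.choice(["a", "b", "c"], size=100)})

# Concatenating column-wise keeps the dtype of each block
combined = pd.concat([float_df, int_df, str_df], axis=1)
print(combined.dtypes)
```

Because each block is converted separately, the floats are never forced into a common object array with the strings, which is what loses the dtypes in the original setup.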

Answer 2

Make a dataframe from a dictionary of arrays:

In [571]: df = pd.DataFrame({'a':['one','two','three'], 'b':np.arange(3), 'c':np.ones(3)})
In [572]: df
Out[572]: 
       a  b    c
0    one  0  1.0
1    two  1  1.0
2  three  2  1.0

Note the mixed column dtypes:

In [579]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   a       3 non-null      object 
 1   b       3 non-null      int64  
 2   c       3 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes
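
Applied to the situation in the question, the same idea might look like the sketch below. It assumes a 100 x 5 float matrix and a 1-D string array; the names `floats`, `labels`, and the `f0`..`f4` column names are made up for illustration. Iterating over the transposed matrix yields one column at a time, so each column can be wrapped in its own Series.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
floats = rng.random((100, 5))                    # 100 x 5 float matrix
labels = rng.choice(["x", "y", "z"], size=100)   # 1-D string array

# One Series per matrix column; floats.T iterates over the columns
float_series = [pd.Series(col, name=f"f{i}") for i, col in enumerate(floats.T)]

# Dictionary of columns -> DataFrame; each column keeps its own dtype
data = {s.name: s for s in float_series}
data["label"] = labels
mixed = pd.DataFrame(data)
print(mixed.dtypes)
```

If the individual Series are not needed, passing the float columns straight into the dictionary (for example `floats[:, i]`) should work the same way.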

If we request a numpy array from it, we get a 2d object dtype array:

In [580]: df.values
Out[580]: 
array([['one', 0, 1.0],
       ['two', 1, 1.0],
       ['three', 2, 1.0]], dtype=object)

Recreating the dataframe from that array looks the same, but the column dtypes are different:

In [581]: pd.DataFrame(df.values, columns=['a','b','c'])
Out[581]: 
       a  b    c
0    one  0  1.0
1    two  1  1.0
2  three  2  1.0
In [582]: _.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   a       3 non-null      object
 1   b       3 non-null      object
 2   c       3 non-null      object
dtypes: object(3)
memory usage: 200.0+ bytes

But a structured array does preserve the column dtypes:

In [587]: df.to_records(index=False)
Out[587]: 
rec.array([('one', 0, 1.), ('two', 1, 1.), ('three', 2, 1.)],
          dtype=[('a', 'O'), ('b', '<i8'), ('c', '<f8')])
In [588]: pd.DataFrame(_)
Out[588]: 
       a  b    c
0    one  0  1.0
1    two  1  1.0
2  three  2  1.0
In [589]: _.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   a       3 non-null      object 
 1   b       3 non-null      int64  
 2   c       3 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes
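
A structured array can also be built directly in numpy rather than obtained from `to_records`. The following is only a sketch under assumed field names and widths (`'U5'` is a fixed-width string field of up to 5 characters, chosen here to fit the sample values):

```python
import numpy as np
import pandas as pd

# Hypothetical field layout: string, 64-bit integer, 64-bit float
dtype = [("a", "U5"), ("b", "i8"), ("c", "f8")]
rec = np.zeros(3, dtype=dtype)

# Fill each field separately; every field has its own dtype
rec["a"] = ["one", "two", "three"]
rec["b"] = np.arange(3)
rec["c"] = np.ones(3)

# The field names become column names, and numeric dtypes carry over
df2 = pd.DataFrame(rec)
print(df2.dtypes)
```

This keeps everything in a single numpy object while still giving each column its own type, unlike a plain 2d array, which has one dtype for all elements.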

2 answers

Answer 1
0 accepted 2021-04-29 04:37:43

Answer 2
0 2021-04-29 06:34:37
