Convert a numpy matrix to a set of pandas Series
Question: Is there a quick way to convert a 2D NumPy matrix to a set of pandas Series? For example, a (100 x 5) ndarray converted to 5 Series with 100 rows each.
Background: I need to create a pandas DataFrame from randomly generated data of different types (float, string, etc.). Currently, I create a NumPy matrix for the floats and an array of strings for the strings, then combine all of these along axis=1 to form a DataFrame. This does not preserve the datatype of each individual column.
To preserve the datatypes, I plan to use pandas Series. Since creating multiple Series of floats will likely be slower than creating one NumPy matrix of floats, I was wondering whether there is a way to convert a NumPy matrix to a set of Series.
This question is different from mine in that it asks about converting a NumPy matrix into a single Series. I require multiple Series.
You can convert the matrix of each data type directly to a DataFrame and then concatenate the resulting DataFrames.
import numpy as np
import pandas as pd

float_df = pd.DataFrame(np.random.rand(500).reshape((-1, 5)))
#           0         1         2         3         4
#0   0.561765  0.177957  0.279419  0.332973  0.967186
#1   0.761327  0.323747  0.707742  0.555475  0.680662
#..       ...       ...       ...       ...       ...
#98  0.741207  0.061200  0.142316  0.381168  0.591554
#99  0.417697  0.723469  0.730677  0.538261  0.281296
#
#[100 rows x 5 columns]
pd.concat([float_df, int_df, ...], axis=1)
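To make the concatenation step concrete, here is a minimal sketch. The `int_df` and `str_df` frames, the column names, the shapes and the seed are assumptions for illustration; the answer itself only builds `float_df`. It also shows that each column of the combined frame is already a Series, which is what the question asks for.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
n_rows = 100

# One 2-D array per data type (shapes and names are illustrative).
float_df = pd.DataFrame(rng.random((n_rows, 5)),
                        columns=[f"f{i}" for i in range(5)])
int_df = pd.DataFrame(rng.integers(0, 10, size=(n_rows, 2)),
                      columns=["i0", "i1"])
str_df = pd.DataFrame(rng.choice(["a", "b", "c"], size=(n_rows, 1)),
                      columns=["s0"])

# Concatenating column-wise keeps the dtypes of each input frame.
df = pd.concat([float_df, int_df, str_df], axis=1)

# Every column of the result is a pandas Series with n_rows entries.
series_list = [df[name] for name in df.columns]
```

Because the frames are built separately, the float and integer columns never pass through a shared object array, so their dtypes survive.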
Making a DataFrame from a dict of arrays:
In [571]: df = pd.DataFrame({'a':['one','two','three'], 'b':np.arange(3), 'c':np.ones(3)})
In [572]: df
Out[572]:
       a  b    c
0    one  0  1.0
1    two  1  1.0
2  three  2  1.0
Note the mixed column dtypes:
In [579]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       3 non-null      object
 1   b       3 non-null      int64
 2   c       3 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes
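Applied to the question's setting, the dict approach might look like the sketch below: the float matrix is sliced column by column so that each column enters the dict as its own 1-D array. The names `floats`, `labels` and the column labels are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
floats = rng.random((100, 5))                   # 2-D float matrix
labels = rng.choice(["x", "y", "z"], size=100)  # 1-D array of strings

# Slice the matrix per column; each slice is a 1-D float64 array.
columns = {f"f{i}": floats[:, i] for i in range(floats.shape[1])}
columns["label"] = labels

# Each dict entry becomes one column (one Series) with its own dtype.
df = pd.DataFrame(columns)
```

The float columns stay `float64` because they are never merged with the strings into a single array.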
If we ask for a NumPy array from that DataFrame, we get a 2D object-dtype array:
In [580]: df.values
Out[580]:
array([['one', 0, 1.0],
       ['two', 1, 1.0],
       ['three', 2, 1.0]], dtype=object)
Recreating a DataFrame from that array looks the same, but the column dtypes are different:
In [581]: pd.DataFrame(df.values, columns=['a','b','c'])
Out[581]:
       a  b    c
0    one  0  1.0
1    two  1  1.0
2  three  2  1.0
In [582]: _.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       3 non-null      object
 1   b       3 non-null      object
 2   c       3 non-null      object
dtypes: object(3)
memory usage: 200.0+ bytes
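If the data has already passed through an object array, the numeric dtypes can usually be recovered afterwards. The sketch below uses `DataFrame.infer_objects()`, which is not part of the answer above; it is shown only as one possible follow-up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["one", "two", "three"],
                   "b": np.arange(3),
                   "c": np.ones(3)})

# Round trip through a 2-D object array: the numeric columns come back as object.
roundtrip = pd.DataFrame(df.values, columns=df.columns)

# infer_objects() attempts a soft conversion of object columns to better dtypes.
recovered = roundtrip.infer_objects()
print(recovered.dtypes)
```

This costs an extra pass over the data, so avoiding the object array in the first place is likely the cheaper route.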
But a structured array does preserve column dtypes:
In [587]: df.to_records(index=False)
Out[587]:
rec.array([('one', 0, 1.), ('two', 1, 1.), ('three', 2, 1.)],
          dtype=[('a', 'O'), ('b', '<i8'), ('c', '<f8')])
In [588]: pd.DataFrame(_)
Out[588]:
       a  b    c
0    one  0  1.0
1    two  1  1.0
2  three  2  1.0
In [589]: _.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   a       3 non-null      object
 1   b       3 non-null      int64
 2   c       3 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes
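A structured array can also be filled with random data directly, without starting from a DataFrame. This is a minimal sketch; the field names, field types and seed are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows = 100

# One named field per column, each with its own dtype (illustrative names).
rec = np.empty(n_rows, dtype=[("x", "f8"), ("k", "i8"), ("label", "U1")])
rec["x"] = rng.random(n_rows)
rec["k"] = rng.integers(0, 10, size=n_rows)
rec["label"] = rng.choice(["a", "b", "c"], size=n_rows)

# pandas maps each field of the structured array to one column.
df = pd.DataFrame(rec)
print(df.dtypes)
```

Each column of `df` is then a Series whose dtype follows the corresponding field of `rec`.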