Convert pandas dataframe to array of series
The now-deprecated as_matrix and values would provide arrays from a dataframe. However, I want to work with the "features" of a dataframe, which means working with the columns as Series. How can a list of Series be extracted from the dataframe?
I think you just need to transpose the return from .values:
df.values.T.tolist()
Out[1321]:
[['a1', 'a3', 'a1', 'a1', 'a2', 'a2', 'a3', 'a4', 'a4', 'a1'],
['c1', 'c1', 'c1', 'c2', 'c2', 'c2', 'c3', 'c4', 'c5', 'c5']]
Or just:
df.values.T
Out[1322]:
array([['a1', 'a3', 'a1', 'a1', 'a2', 'a2', 'a3', 'a4', 'a4', 'a1'],
['c1', 'c1', 'c1', 'c2', 'c2', 'c2', 'c3', 'c4', 'c5', 'c5']],
dtype=object)
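For reference, here is a minimal sketch of the transpose approach on the sample data from this answer. The frame is rebuilt by hand here, so the construction itself is an assumption. Each row of the transposed array holds one column of the frame.

```python
import pandas as pd

# Sample data rebuilt by hand from the frame shown in this answer
df = pd.DataFrame({
    "airport": ["a1", "a3", "a1", "a1", "a2", "a2", "a3", "a4", "a4", "a1"],
    "carrier": ["c1", "c1", "c1", "c2", "c2", "c2", "c3", "c4", "c5", "c5"],
})

# df.values has shape (rows, columns); transposing gives one row per column
arr = df.values.T
print(arr.shape)        # (2, 10)

# Plain nested lists, one inner list per column
as_lists = arr.tolist()
print(as_lists[0][:3])  # ['a1', 'a3', 'a1']
```

Note that this drops the column names and the index, so the result is positional only.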
If you need a list with one labelled object per column, you can also use groupby (each group comes back as a one-column DataFrame, as the output shows):
[y for _,y in df.groupby(level=0,axis=1)]
Out[1328]:
[ airport
0 a1
1 a3
2 a1
3 a1
4 a2
5 a2
6 a3
7 a4
8 a4
9 a1, carrier
0 c1
1 c1
2 c1
3 c2
4 c2
5 c2
6 c3
7 c4
8 c5
9 c5]
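If actual Series objects are wanted (with the column name and the index preserved), one option is to iterate over df.items(). This is a minimal sketch, not necessarily what the answer author had in mind, and it assumes the same sample data rebuilt by hand.

```python
import pandas as pd

# Sample data rebuilt by hand from the frame shown in this answer
df = pd.DataFrame({
    "airport": ["a1", "a3", "a1", "a1", "a2", "a2", "a3", "a4", "a4", "a1"],
    "carrier": ["c1", "c1", "c1", "c2", "c2", "c2", "c3", "c4", "c5", "c5"],
})

# DataFrame.items() yields (column name, Series) pairs in column order
series_list = [col for _, col in df.items()]

print([s.name for s in series_list])   # ['airport', 'carrier']
print(type(series_list[0]).__name__)   # Series
```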
Data input:
df
Out[1329]:
airport carrier
0 a1 c1
1 a3 c1
2 a1 c1
3 a1 c2
4 a2 c2
5 a2 c2
6 a3 c3
7 a4 c4
8 a4 c5
9 a1 c5
You could do this with a list comprehension:
import pandas as pd
df = pd.DataFrame(some_data)
mat = [df[col].values for col in df.columns]
where df[col].values returns a NumPy array (not a Series) of the values from a given column. Drop .values and use df[col] on its own to keep each column as a Series.
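A quick check of what each form of the comprehension returns, as a sketch only. Here some_data is a hypothetical stand-in with made-up numeric columns, not data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for some_data (made-up values)
some_data = {"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]}
df = pd.DataFrame(some_data)

# With .values: a list of NumPy arrays, names and index are lost
mat = [df[col].values for col in df.columns]

# Without .values: a list of Series, names and index are kept
series_list = [df[col] for col in df.columns]

print(type(mat[0]).__name__)           # ndarray
print(type(series_list[0]).__name__)   # Series
print(series_list[1].name)             # y
```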
You can get a list of Series with .to_dict('Series'), just taking the values:
list(df.to_dict('Series').values())
[0 a1
1 a3
2 a1
3 a1
4 a2
5 a2
6 a3
7 a4
8 a4
9 a1
Name: airport, dtype: object, 0 c1
1 c1
2 c1
3 c2
4 c2
5 c2
6 c3
7 c4
8 c5
9 c5
Name: carrier, dtype: object]
Each element of the list is a Series:
type(list(df.to_dict('Series').values())[0])
#pandas.core.series.Series
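The intermediate dict can be useful in its own right, because it maps each column name to its Series. A minimal sketch, assuming the sample data from the earlier answer rebuilt by hand, and using the lowercase orient name 'series':

```python
import pandas as pd

# Sample data rebuilt by hand (same values as the earlier answer's frame)
df = pd.DataFrame({
    "airport": ["a1", "a3", "a1", "a1", "a2", "a2", "a3", "a4", "a4", "a1"],
    "carrier": ["c1", "c1", "c1", "c2", "c2", "c2", "c3", "c4", "c5", "c5"],
})

# Mapping of column name -> Series
series_by_name = df.to_dict("series")
print(list(series_by_name))            # ['airport', 'carrier']

# Taking the dict values gives the list of Series in column order
series_list = list(series_by_name.values())
print(series_list[1].name)             # carrier
```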
You can track much of the same information in a numpy structured array that you can in a DataFrame (different dtypes between Series, names of Series). Pandas has a convenient way of doing this. I am using @Wen's sample data.
u = df.to_records(index=False)
rec.array([('a1', 'c1'), ('a3', 'c1'), ('a1', 'c1'), ('a1', 'c2'),
('a2', 'c2'), ('a2', 'c2'), ('a3', 'c3'), ('a4', 'c4'),
('a4', 'c5'), ('a1', 'c5')],
dtype=[('airport', 'O'), ('carrier', 'O')])
u['airport']
array(['a1', 'a3', 'a1', 'a1', 'a2', 'a2', 'a3', 'a4', 'a4', 'a1'],
dtype=object)
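To illustrate the point about mixed dtypes, here is a small sketch. The delay column is a hypothetical addition that is not in the sample data, and the frame is shortened to three rows.

```python
import pandas as pd

df = pd.DataFrame({
    "airport": ["a1", "a3", "a1"],
    "carrier": ["c1", "c1", "c2"],
    "delay": [5, 0, 12],          # hypothetical numeric column
})

# One record per row; each field keeps its own name and dtype
u = df.to_records(index=False)
print(u.dtype.names)              # ('airport', 'carrier', 'delay')

# Fields are accessed by name and behave like ordinary arrays
print(u["delay"].sum())           # 17

# The record array can be turned back into a DataFrame
back = pd.DataFrame.from_records(u)
print(list(back.columns))         # ['airport', 'carrier', 'delay']
```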