Create a numpy array from columns of a pandas dataframe
I have a DataFrame that looks like this:
A B C
1 2 3
1 5 3
4 8 2
4 2 1
I would like to create a NumPy array from this data, using column A as the index, column B as the column headers, and column C as the fill data.
Ultimately, it should look like this:
   2  5  8
1  3  3
4  1     2
Is there a good way to do this?
I have tried df.pivot_table, but I'm worried I have messed up the data, and I would rather do it in another, more intuitive way.
Manipulate the DataFrame like this:
df.set_index(['A', 'B']).C.unstack()
Or:
df.set_index(['A', 'B']).C.unstack(fill_value='')
Get the NumPy array like this:
df.set_index(['A', 'B']).C.unstack().values
array([[ 3., 3., nan],
[ 1., nan, 2.]])
Or:
df.set_index(['A', 'B']).C.unstack(fill_value='').values
array([[3, 3, ''],
[1, '', 2]], dtype=object)
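The steps above can be put together as a minimal, self-contained sketch. It rebuilds the sample data from the question; the names `wide`, `arr`, `arr_zero`, `row_labels` and `col_labels` are only illustrative, and `fill_value=0` is an assumed alternative to the empty string that keeps the result numeric.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"A": [1, 1, 4, 4], "B": [2, 5, 8, 2], "C": [3, 3, 2, 1]})

# A becomes the row index, B the column headers, C the cell values
wide = df.set_index(["A", "B"])["C"].unstack()

# Plain NumPy array; (A, B) pairs that never occur are NaN
arr = wide.to_numpy()

# The labels are kept separately, since the array itself has none
row_labels = wide.index.to_numpy()
col_labels = wide.columns.to_numpy()

# Variant with a numeric fill value instead of NaN (assumed choice: 0)
arr_zero = df.set_index(["A", "B"])["C"].unstack(fill_value=0).to_numpy()
```

Keeping `row_labels` and `col_labels` next to the array makes it possible to map a position such as `arr[1, 2]` back to A = 4 and B = 8.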
Pandas unstack looked nice! So I thought I would try to replicate the same behavior with NumPy, so that it works directly on arrays, and ended up with something like this:
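import numpy as np

def numpy_unstack(a, fillval=0):
    # Map each row/column label to its 0-based position in sorted order
    r = np.unique(a[:,0], return_inverse=True)[1]
    c = np.unique(a[:,1], return_inverse=True)[1]
    # Start from a grid of fill values, then scatter column 2 into it
    out = np.full((r.max()+1, c.max()+1), fillval)
    out[r,c] = a[:,2]
    return out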
Sample run:
In [81]: df
Out[81]:
0 1 2
0 1 2 3
1 1 5 3
2 4 8 2
3 4 2 1
In [82]: numpy_unstack(df.values,0)
Out[82]:
array([[ 3., 3., 0.],
[ 1., 0., 2.]])
In [83]: numpy_unstack(df.values,np.nan)
Out[83]:
array([[ 3., 3., nan],
[ 1., nan, 2.]])
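To make the role of `return_inverse` more concrete, here is a small sketch that spells out the intermediate values on the same sample data. It is a variation on the function above, not a replacement: it also keeps the sorted unique labels, and the variable names are only illustrative.

```python
import numpy as np

# Same sample data as a plain array: columns are A, B, C
a = np.array([[1, 2, 3],
              [1, 5, 3],
              [4, 8, 2],
              [4, 2, 1]])

# np.unique returns the sorted unique labels and, with return_inverse=True,
# the position of each original entry within those labels
row_labels, r = np.unique(a[:, 0], return_inverse=True)
col_labels, c = np.unique(a[:, 1], return_inverse=True)

# One row per unique A, one column per unique B, NaN where no pair exists
out = np.full((row_labels.size, col_labels.size), np.nan)

# Fancy-index assignment places each C value at its (row, column) position
out[r, c] = a[:, 2]
```

Here `r` is `[0, 0, 1, 1]` and `c` is `[0, 1, 2, 0]`, so the four C values land at positions (0, 0), (0, 1), (1, 2) and (1, 0). If the same (A, B) pair appeared more than once, only one of its values would survive the assignment.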
As mentioned above, you can use pd.pivot_table like this:
In [1655]: df.pivot_table(index='A', columns='B', values='C', fill_value='')
Out[1655]:
B  2  5  8
A
1  3  3
4  1     2
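Regarding the worry about messing up the data: `pivot_table` aggregates duplicate (A, B) pairs, using the mean by default, whereas `pivot` refuses to reshape them. The sketch below illustrates both on the sample data and on a small made-up frame `dup`; the frame `dup`, the variable names, and `fill_value=0` are assumptions for illustration only.

```python
import pandas as pd

# Sample data from the question: every (A, B) pair is unique
df = pd.DataFrame({"A": [1, 1, 4, 4], "B": [2, 5, 8, 2], "C": [3, 3, 2, 1]})

# With unique pairs, pivot_table simply rearranges the values
table = df.pivot_table(index="A", columns="B", values="C", fill_value=0)
arr = table.to_numpy()

# Hypothetical frame in which the pair (A=1, B=2) occurs twice
dup = pd.DataFrame({"A": [1, 1], "B": [2, 2], "C": [3, 5]})

# pivot_table silently averages the duplicates (default aggfunc is the mean)
averaged = dup.pivot_table(index="A", columns="B", values="C")

# pivot raises instead of aggregating, which makes duplicates visible
try:
    dup.pivot(index="A", columns="B", values="C")
    pivot_raised = False
except ValueError:
    pivot_raised = True
```

If the data is expected to have exactly one C per (A, B) pair, `pivot` or `set_index(...).unstack()` may be the safer choice, because they fail loudly on duplicates rather than averaging them.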