Create a NumPy array from columns of a pandas DataFrame

I have a DataFrame that looks like this:

A    B    C
1    2    3
1    5    3
4    8    2
4    2    1
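
For anyone who wants to reproduce the examples on this page, the frame above could be rebuilt roughly as follows. This is only a sketch: the integer dtype is assumed from the printed values.

```python
import pandas as pd

# Rebuild the sample frame from the question (integer columns assumed)
df = pd.DataFrame({
    "A": [1, 1, 4, 4],
    "B": [2, 5, 8, 2],
    "C": [3, 3, 2, 1],
})
print(df)
```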

I would like to create a NumPy array from this data, using column A as the index, column B as the column headers, and column C as the fill data.

Ultimately, it should look like this:

     2    5    8
1    3    3    
4    1         2

Is there a good way to do this?

I have tried df.pivot_table, but I'm worried I have messed up the data, and I would rather do it in another, more intuitive way.

Manipulate the DataFrame like this:

df.set_index(['A', 'B']).C.unstack()

Or:

df.set_index(['A', 'B']).C.unstack(fill_value='')

Get the NumPy array like this:

df.set_index(['A', 'B']).C.unstack().values

array([[  3.,   3.,  nan],
       [  1.,  nan,   2.]])

Or:

df.set_index(['A', 'B']).C.unstack(fill_value='').values

array([[3, 3, ''],
       [1, '', 2]], dtype=object)
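
The plain array drops the row and column labels. If they are needed alongside the data, one possible sketch is to keep the unstacked frame in a variable and pull out its index and columns as well. The names wide, values, row_labels and col_labels are only illustrative, and to_numpy() is used here as the newer equivalent of .values, which assumes a reasonably recent pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 4, 4], "B": [2, 5, 8, 2], "C": [3, 3, 2, 1]})

# Same reshaping as above, kept in a variable so the labels stay reachable
wide = df.set_index(["A", "B"])["C"].unstack()

values = wide.to_numpy()              # 2-D float array, NaN where no (A, B) pair exists
row_labels = wide.index.to_numpy()    # unique values of A
col_labels = wide.columns.to_numpy()  # unique values of B
```

Filling with '' forces an object array, so leaving the gaps as NaN is probably the better choice if the result is going to be used for arithmetic.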

The pandas unstack looked nice! So I tried to replicate the same behavior in NumPy, working directly on arrays, and ended up with something like this:

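import numpy as np

def numpy_unstack(a, fillval=0):
    # Map each row/column label to its position among the sorted unique labels
    r = np.unique(a[:, 0], return_inverse=True)[1]
    c = np.unique(a[:, 1], return_inverse=True)[1]
    # Start from an array of fill values, then scatter column 2 into place
    out = np.full((r.max() + 1, c.max() + 1), fillval)
    out[r, c] = a[:, 2]
    return out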

Sample run:

In [81]: df
Out[81]: 
   0  1  2
0  1  2  3
1  1  5  3
2  4  8  2
3  4  2  1

In [82]: numpy_unstack(df.values,0)
Out[82]: 
array([[ 3.,  3.,  0.],
       [ 1.,  0.,  2.]])

In [83]: numpy_unstack(df.values,np.nan)
Out[83]: 
array([[  3.,   3.,  nan],
       [  1.,  nan,   2.]])
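
A possible variant, sketched under a couple of assumptions: recent NumPy versions infer the dtype of np.full from the fill value, so passing fillval=0 may give an integer array rather than the float output shown above. The hypothetical numpy_unstack_with_labels below (not part of the original answer) takes an explicit dtype and also returns the unique row and column labels.

```python
import numpy as np

def numpy_unstack_with_labels(a, fillval=np.nan, dtype=float):
    # Hypothetical variant of numpy_unstack: explicit dtype, labels returned too
    rows, r = np.unique(a[:, 0], return_inverse=True)
    cols, c = np.unique(a[:, 1], return_inverse=True)
    out = np.full((rows.size, cols.size), fillval, dtype=dtype)
    # Scatter the values from column 2 into their (row, column) slots
    out[r, c] = a[:, 2]
    return rows, cols, out

a = np.array([[1, 2, 3], [1, 5, 3], [4, 8, 2], [4, 2, 1]])
rows, cols, out = numpy_unstack_with_labels(a)
print(rows, cols)
print(out)
```

Note that if the same (row, column) pair occurs more than once, the indexed assignment keeps only one of the values without any warning, so it may be worth checking for duplicates first.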

As mentioned above, you can use df.pivot_table like this:

In [1655]: df.pivot_table(index='A', columns='B', values='C', fill_value='')
Out[1655]:
B  2  5  8
A
1  3  3
4  1     2
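
On the worry about pivot_table messing up the data: by default pivot_table aggregates duplicate (A, B) pairs with the mean, while df.pivot refuses to reshape them. The sketch below illustrates the difference. The extra row with C = 7 is made up for the demonstration and is not part of the question's data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 4, 4], "B": [2, 5, 8, 2], "C": [3, 3, 2, 1]})

# No duplicate (A, B) pairs here, so pivot and pivot_table give the same values
wide = df.pivot(index="A", columns="B", values="C")
table = df.pivot_table(index="A", columns="B", values="C")
same = np.allclose(wide.to_numpy(), table.to_numpy(), equal_nan=True)

# Add a made-up duplicate (A=1, B=2) row to see where the two differ
extra = pd.DataFrame({"A": [1], "B": [2], "C": [7]})
dup = pd.concat([df, extra], ignore_index=True)

# pivot_table silently averages the duplicates (3 and 7)
averaged = dup.pivot_table(index="A", columns="B", values="C")

# pivot raises instead of aggregating
try:
    dup.pivot(index="A", columns="B", values="C")
    pivot_raised = False
except ValueError:
    pivot_raised = True
```

If every (A, B) pair is unique, as in the sample data, both calls agree, so pivot_table has not altered anything.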
