
Create Pandas dataframe from numpy array and use first column of the array as index

I have a numpy array (a):

array([[ 1. ,  5.1,  3.5,  1.4,  0.2],
       [ 1. ,  4.9,  3. ,  1.4,  0.2],
       [ 2. ,  4.7,  3.2,  1.3,  0.2],
       [ 2. ,  4.6,  3.1,  1.5,  0.2]])

I would like to make a pandas dataframe with values taken from a, columns A, B, C, D, and the first column of my numpy array as the index. In the end it should look like this:

       A    B    C    D
  1  5.1  3.5  1.4  0.2
  1  4.9  3.0  1.4  0.2
  2  4.7  3.2  1.3  0.2
  2  4.6  3.1  1.5  0.2

I am trying this:

    df = pd.DataFrame(a, index=a[:,0], columns=['A', 'B','C','D'])

and I get the following error:

ValueError: Shape of passed values is (5, 4), indices imply (4, 4)

Any help? Thanks

You passed the complete array as the data param. If you want just 4 columns from the array as the data, you also need to slice your array:

In [158]:
df = pd.DataFrame(a[:,1:], index=a[:,0], columns=['A', 'B','C','D'])
df

Out[158]:
     A    B    C    D
1  5.1  3.5  1.4  0.2
1  4.9  3.0  1.4  0.2
2  4.7  3.2  1.3  0.2
2  4.6  3.1  1.5  0.2
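For anyone who wants to run this outside the session above, here is a minimal self-contained sketch of the same call. The array literal is copied from the question. The final cast of the index to integers is not part of the original answer; it is an optional assumption, added only because the desired output shows integer-looking labels while the array holds floats.

```python
import numpy as np
import pandas as pd

# The array from the question.
a = np.array([
    [1.0, 5.1, 3.5, 1.4, 0.2],
    [1.0, 4.9, 3.0, 1.4, 0.2],
    [2.0, 4.7, 3.2, 1.3, 0.2],
    [2.0, 4.6, 3.1, 1.5, 0.2],
])

# Column 0 supplies the index labels; columns 1..4 supply the data.
df = pd.DataFrame(a[:, 1:], index=a[:, 0], columns=["A", "B", "C", "D"])

# Optional (an assumption, not in the original answer): the index is float
# because the whole array is float, so cast it if integer labels are wanted.
df.index = df.index.astype(int)

print(df)
```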

Also, having duplicate values in the index will make filtering/indexing problematic.
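To illustrate the point about duplicates, the sketch below shows that a label lookup on this frame returns several rows rather than one. The second half is only one possible workaround and not something the answer prescribes: it keeps the first column as an ordinary column, under the hypothetical name `key`, and filters on it instead.

```python
import numpy as np
import pandas as pd

a = np.array([
    [1.0, 5.1, 3.5, 1.4, 0.2],
    [1.0, 4.9, 3.0, 1.4, 0.2],
    [2.0, 4.7, 3.2, 1.3, 0.2],
    [2.0, 4.6, 3.1, 1.5, 0.2],
])
df = pd.DataFrame(a[:, 1:], index=a[:, 0], columns=["A", "B", "C", "D"])

# The index labels repeat, so the index is not unique.
print(df.index.is_unique)  # False

# Looking up a repeated label returns every matching row as a DataFrame,
# not a single row.
rows = df.loc[1.0]
print(rows.shape)  # (2, 4)

# Possible alternative (hypothetical column name "key"): keep the first
# column as regular data and let pandas create a unique default index.
df2 = pd.DataFrame(a, columns=["key", "A", "B", "C", "D"])
subset = df2[df2["key"] == 1.0]
print(subset)
```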

So here, with a[:,1:], I take all the rows but only the columns from column 1 onwards, as desired; see the docs.
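A quick way to see why the slice matters is to compare shapes. This is a small sketch using the array from the question: the full array has five columns, which cannot be matched against four column names, while the slice has exactly four.

```python
import numpy as np

a = np.array([
    [1.0, 5.1, 3.5, 1.4, 0.2],
    [1.0, 4.9, 3.0, 1.4, 0.2],
    [2.0, 4.7, 3.2, 1.3, 0.2],
    [2.0, 4.6, 3.1, 1.5, 0.2],
])

full_shape = a.shape          # all rows, all five columns
index_shape = a[:, 0].shape   # all rows, first column only (1-D)
data_shape = a[:, 1:].shape   # all rows, columns 1 through 4

print(full_shape)   # (4, 5)
print(index_shape)  # (4,)
print(data_shape)   # (4, 4)
```

Passing the full array together with four column names is what triggers the ValueError in the question; the sliced data lines up with both the four names and the four index labels.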
