Creating a Pandas DataFrame from a numpy array, using the array's first column as the index

Question

I have a numpy array (a):

array([[ 1. ,  5.1,  3.5,  1.4,  0.2],
[ 1. ,  4.9,  3. ,  1.4,  0.2],
[ 2. ,  4.7,  3.2,  1.3,  0.2],
[ 2. ,  4.6,  3.1,  1.5,  0.2]])

I would like to make a pandas DataFrame (df) with values=a, columns=A, B, C, D, and the index set to the first column of my numpy array. In the end it should look like this:

       A    B    C    D
  1  5.1  3.5  1.4  0.2
  1  4.9  3.0  1.4  0.2
  2  4.7  3.2  1.3  0.2
  2  4.6  3.1  1.5  0.2

I am trying this:

    df = pd.DataFrame(a, index=a[:,0], columns=['A', 'B','C','D'])

and I get the following error:

ValueError: Shape of passed values is (5, 4), indices imply (4, 4)

Any help? Thanks

Answer 1

You passed the complete array as the data param. If you want just 4 columns from the array as the data, you need to slice the array as well:

In [158]:
df = pd.DataFrame(a[:,1:], index=a[:,0], columns=['A', 'B','C','D'])
df

Out[158]:
     A    B    C    D
1  5.1  3.5  1.4  0.2
1  4.9  3.0  1.4  0.2
2  4.7  3.2  1.3  0.2
2  4.6  3.1  1.5  0.2

Also, having duplicate values in the index will make filtering/indexing problematic.
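To make the duplicate-index caveat concrete, here is a minimal sketch using the array from the question. The column name `key` and the variable names are only illustrative assumptions, and keeping the first column as ordinary data is just one possible workaround, not something the question asked for.

```python
import numpy as np
import pandas as pd

a = np.array([[1., 5.1, 3.5, 1.4, 0.2],
              [1., 4.9, 3.0, 1.4, 0.2],
              [2., 4.7, 3.2, 1.3, 0.2],
              [2., 4.6, 3.1, 1.5, 0.2]])

df = pd.DataFrame(a[:, 1:], index=a[:, 0], columns=['A', 'B', 'C', 'D'])

# The index holds the labels 1.0, 1.0, 2.0, 2.0, so it is not unique.
index_is_unique = df.index.is_unique

# A label lookup returns every row carrying that label (a DataFrame here),
# not a single row.
rows_for_label_1 = df.loc[1.0]

# One possible workaround (an assumption, not part of the question):
# keep the first column as regular data under a hypothetical name 'key'
# and let pandas assign the default unique integer index.
df_flat = pd.DataFrame(a, columns=['key', 'A', 'B', 'C', 'D'])
```

With the default index, the rows can still be selected by value, for example with `df_flat[df_flat['key'] == 1]`.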

So here, with a[:,1:], I take all the rows but only the columns from index 1 onwards, as desired; see the docs.
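The slicing can be checked by looking at the shapes involved. The following is a small sketch under the assumption that `a` is the 4x5 array from the question; the names `data`, `index` and `raised` are only for illustration.

```python
import numpy as np
import pandas as pd

a = np.array([[1., 5.1, 3.5, 1.4, 0.2],
              [1., 4.9, 3.0, 1.4, 0.2],
              [2., 4.7, 3.2, 1.3, 0.2],
              [2., 4.6, 3.1, 1.5, 0.2]])

data = a[:, 1:]   # all rows, columns 1 to the end -> shape (4, 4)
index = a[:, 0]   # all rows, column 0 only        -> shape (4,)

# Passing the full 4x5 array together with only four column names
# does not match, so pandas raises a ValueError.
raised = False
try:
    pd.DataFrame(a, index=index, columns=['A', 'B', 'C', 'D'])
except ValueError:
    raised = True

# With the sliced data, the four columns line up with the four names.
df = pd.DataFrame(data, index=index, columns=['A', 'B', 'C', 'D'])
```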

Answer 1: 8, accepted, 2015-10-13 09:17:57
