How to create a (correct) NumPy array from a Pandas DF

Question

I'm trying to create a NumPy array for the "label" column of a pandas DataFrame.

My df:

      label                                             vector
0         0   1:0.044509422 2:-0.03092437 3:0.054365806 4:-...
1         0   1:-0.007471546 2:-0.062329583 3:0.012314787 4...
2         0   1:-0.009525825 2:0.0028720177 3:0.0029517233 ...
3         1   1:-0.0040618754 2:-0.03754585 3:0.008025528 4...
4         0   1:0.039150625 2:-0.08689039 3:0.09603256 4:0....
...     ...                                                ...
59996     1   1:0.01846487 2:-0.012882819 3:0.035375785 4:-...
59997     1   1:0.01435293 2:-0.00683616 3:0.009475072 4:-0...
59998     1   1:0.018322088 2:-0.017116712 3:0.013021051 4:...
59999     0   1:0.014471473 2:-0.023652712 3:0.031210974 4:...
60000     1   1:0.00888336 2:-0.006902163 3:0.022569133 4:0...

As you can see, I have two columns: label and vector. For the label column I'm using this solution:

y = pd.DataFrame([df.label])

print(y.astype(float).to_numpy())

print(y)

As a result I get this:


   0     1     2     3     4     5     6     7     8     9     10    11    12    13    14    15     ... 59985 59986 59987 59988 59989 59990 59991 59992 59993 59994 59995 59996 59997 59998 59999 60000
label     0     0     0     1     0     0     0     0     0     0     0     1     0     1     0     1  ...     1     1     1     0     1     0     0     1     1     1     1     1     1     1     0     1

[1 rows x 60001 columns]

However, the expected output should be:

     0         
0    0
1    0
2    0
3    1

... ...

[60001 rows x 1 columns]

Instead of an array with [1 rows x 60001 columns] I would like to have an array with [60001 rows x 1 columns].

Thanks for your time.

Answer 1

"Instead of an array with [1 rows x 60001 columns] I would like to have an array with [60001 rows x 1 columns]": if I understand your issue correctly and you need to reshape your array (the result of to_numpy(), not the DataFrame itself), use:

y = y.reshape(-1, 1)

This converts your array into a shape with one column and works out the number of rows for you (the dimension given as -1 is calculated automatically from the array's size and the other dimension). So you can do either of these:

Your proposed way + reshape:

y = pd.DataFrame([df.label]).astype(float).to_numpy().reshape(-1, 1)

Or @cs95's suggested answer (which results in the same array):

y = df[['label']].astype(float).to_numpy()
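
As a quick check of the two options above, here is a minimal sketch on a small made-up frame (the five label values are invented for illustration and are not the asker's data). Under that assumption, both expressions should produce a column array of shape (n, 1):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's frame; the values are made up.
df = pd.DataFrame({"label": [0, 0, 0, 1, 0]})

# Option 1: the original expression builds a (1, n) frame, so reshape the array.
y1 = pd.DataFrame([df.label]).astype(float).to_numpy().reshape(-1, 1)

# Option 2: select the column with a list so the result is 2-D from the start.
y2 = df[["label"]].astype(float).to_numpy()

print(y1.shape, y2.shape)
print(np.array_equal(y1, y2))
```

The second form is shorter and avoids building the intermediate one-row frame.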

Answer 2

If you start with a dataframe

In [98]: df                                                                                            
Out[98]: 
   a  b   c   d
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

and select a column by name, you get a Series:

In [99]: df.a                            # df['a']                                                              
Out[99]: 
0    0
1    4
2    8
Name: a, dtype: int64
In [100]: type(_)                                                                                      
Out[100]: pandas.core.series.Series

The to_numpy of the Series is a 1d array:

In [101]: df.a.to_numpy()                                                                              
Out[101]: array([0, 4, 8])
In [102]: _.shape                                                                                      
Out[102]: (3,)

But you've taken the Series and turned it back into a dataframe:

In [103]: y = pd.DataFrame([df.a])                                                                     
In [104]: y                                                                                            
Out[104]: 
   0  1  2
a  0  4  8

Was that your intention? In any case, the extracted array is 2d:

In [105]: y.to_numpy()                                                                                 
Out[105]: array([[0, 4, 8]])
In [106]: _.shape                                                                                      
Out[106]: (1, 3)

We can reshape it, or take its 'transpose':

In [107]: __.T                # reshape(3,1)                                                                         
Out[107]: 
array([[0],
       [4],
       [8]])

If we omit the [] from the y expression, we get a different dataframe and the desired 'column' array:

In [109]: pd.DataFrame(df.a)                                                                           
Out[109]: 
   a
0  0
1  4
2  8
In [110]: pd.DataFrame(df.a).to_numpy()                                                                
Out[110]: 
array([[0],
       [4],
       [8]])

Another option is to select the column with a list:

In [111]: df[['a']]                                                                                    
Out[111]: 
   a
0  0
1  4
2  8
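
To confirm that the list selection gives the same column array as the previous step, here is a minimal sketch that rebuilds the example frame (constructing it with np.arange is an assumption; the answer does not show how df was made). It also includes the direct route of reshaping the 1d array from the Series:

```python
import numpy as np
import pandas as pd

# Rebuild the 3x4 example frame; the np.arange construction is assumed.
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("abcd"))

col_from_list = df[["a"]].to_numpy()              # list selection keeps a DataFrame
col_from_ctor = pd.DataFrame(df.a).to_numpy()     # Series wrapped without []
col_from_1d = df["a"].to_numpy().reshape(-1, 1)   # 1d array reshaped by hand

print(col_from_list.shape, col_from_ctor.shape, col_from_1d.shape)
```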

A Series is the pandas version of a 1d numpy array. It has row indices, but no column ones. A DataFrame is 2d, with rows and columns.

Keep in mind that a numpy array can have shapes (3,), (1,3) and (3,1), all with the same 3 elements.
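
A small sketch of that point, using the same three values as the example column; the variable names are only illustrative:

```python
import numpy as np

a = np.array([0, 4, 8])    # 1d, shape (3,)
row = a.reshape(1, -1)     # one row, shape (1, 3)
col = a.reshape(-1, 1)     # one column, shape (3, 1)

# Same elements in every case; only the shape differs.
for arr in (a, row, col):
    print(arr.shape, arr.size, arr.ravel().tolist())
```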
