
How to create a DataFrame from a NumPy array

I have the following NumPy array:

numpy_x.shape

(9982, 26)

numpy_x has 9982 records/observations and 26 columns. Is that right?

numpy_x[:]
array([[0.00000000e+00, 9.60000000e-01, 1.00000000e+00, ...,
        1.20000000e+00, 6.90000000e-01, 1.17000000e+00],
       [1.00000000e+00, 9.60000000e-01, 1.00000000e+00, ...,
        1.20000000e+00, 7.00000000e-01, 1.17000000e+00],
       [2.00000000e+00, 9.60000000e-01, 1.00000000e+00, ...,
        1.20000000e+00, 7.00000000e-01, 1.17000000e+00],
       ...,
       [9.97900000e+03, 6.10920994e-01, 7.58135980e-01, ...,
        1.08704204e+00, 7.88187535e-01, 1.23021669e+00],
       [9.98000000e+03, 6.10920994e-01, 7.58135980e-01, ...,
        1.08704204e+00, 7.88187535e-01, 1.23021669e+00],
       [9.98100000e+03, 6.10920994e-01, 7.58135980e-01, ...,
        1.08704204e+00, 7.88187535e-01, 1.23021669e+00]])

I want to generate a DataFrame from the numpy_x data, with an index and columns (are index and columns really the same thing?), so I tried the following:

import pandas as pd
pd.DataFrame(data=numpy_x[:], # I want pass the entire numpy array content
            index=numpy_x[1:26],
            columns=numpy_x[9982:26])

But I get the following error:

/.conda/envs/x/lib/python3.6/site-packages/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
   4606         raise ValueError("Empty data passed with indices specified.")
   4607     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4608         passed, implied))
   4609 
   4610 

ValueError: Shape of passed values is (26, 9982), indices imply (0, 25)

How can I work out which values to pass to the index and columns parameters?

Use:

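import numpy as np
import pandas as pd
numpy_x = np.random.random((100, 10))  # example array: 100 rows, 10 columns
df = pd.DataFrame(numpy_x)  # no index/columns given, so default integer labels are used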

Output (first five rows):

          0         1         2         3         4         5         6  \
0  0.204839  0.837503  0.696896  0.235414  0.594766  0.521302  0.841167
1  0.041490  0.679537  0.657314  0.656672  0.524983  0.936918  0.482802
2  0.318928  0.423196  0.218037  0.515017  0.107851  0.564404  0.218297
3  0.644913  0.433771  0.297033  0.011239  0.346021  0.353749  0.587631
4  0.127949  0.517230  0.969399  0.743442  0.268566  0.415327  0.567572

          7         8         9
0  0.882685  0.211414  0.659820
1  0.752496  0.047198  0.775250
2  0.521580  0.655942  0.178753
3  0.123761  0.483601  0.157191
4  0.849218  0.098588  0.754402

I want to generate a DataFrame from the numpy_x data, with an index and columns (are index and columns really the same thing?)

Yes and no. An Index is simply the axis labelling information in pandas. Depending on the axis, an Index can label either the rows or the columns.

The axis labeling information in pandas objects serves many purposes:

  • Identifies data (i.e. provides metadata) using known indicators, important for analysis, visualization, and interactive console display
  • Enables automatic and explicit data alignment
  • Allows intuitive getting and setting of subsets of the data set

An Index can be a simple single-level integer index, or it can be a MultiIndex.
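As a rough illustration of the points above, the sketch below builds a small frame and looks at both of its axes. The array contents and all label names ("a", "x", "g1", and so on) are made up for the example and are not from the question.

```python
import numpy as np
import pandas as pd

# Small made-up array: 4 rows, 3 columns.
data = np.arange(12, dtype=float).reshape(4, 3)
df = pd.DataFrame(data, index=["a", "b", "c", "d"], columns=["x", "y", "z"])

# Both axes carry their labels in pandas Index objects.
row_labels = df.index
col_labels = df.columns

# Labels allow intuitive selection of subsets of the data.
value = df.loc["b", "y"]

# Labels drive automatic alignment: values are matched by label, not position.
other = pd.Series([10.0, 20.0], index=["d", "a"])
aligned = df["x"] + other  # rows "b" and "c" have no partner and become NaN

# The row axis can also be a MultiIndex instead of a flat index.
multi = pd.MultiIndex.from_product([["g1", "g2"], [0, 1]], names=["group", "item"])
df_multi = pd.DataFrame(data, index=multi, columns=["x", "y", "z"])
```

Here df.index and df.columns are the same kind of object; they differ only in which axis they label.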

Index and Columns Parameters

The columns parameter is simply the set of column labels you want to give your dataset; in this case you would pass 26 names for the 26 columns of your NumPy array. If no column labels are provided, they default to the integers 0 to n-1 (the equivalent of np.arange(n)).

The index parameter is simply the Index to use for the resulting frame. If the input data carries no indexing information and no index is provided, it defaults to the integers 0 to n-1, the equivalent of np.arange(n) (which is the case in my example).
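Applied to an array with the shape from the question, that could look like the minimal sketch below. The random data is only a stand-in for the real numpy_x, and the "col_0" ... "col_25" names are hypothetical placeholders. The key point is that index needs one label per row and columns needs one label per column.

```python
import numpy as np
import pandas as pd

# Stand-in for the array in the question, with the same shape (9982, 26).
numpy_x = np.random.random((9982, 26))
n_rows, n_cols = numpy_x.shape

# Hypothetical labels: one name per column, one label per row.
column_names = [f"col_{i}" for i in range(n_cols)]
row_labels = np.arange(n_rows)

df = pd.DataFrame(data=numpy_x, index=row_labels, columns=column_names)

# Leaving out index and columns gives default integer labels on both axes.
df_default = pd.DataFrame(numpy_x)

# A label list whose length does not match the array shape raises ValueError.
try:
    pd.DataFrame(numpy_x, columns=column_names[:25])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc
```

In the question, index=numpy_x[1:26] and columns=numpy_x[9982:26] pass slices of the data itself rather than label lists of length 9982 and 26, which is consistent with the shape mismatch pandas reports.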


 