
What is an effective way to build a pivot table that has a pandas dataset's columns as its rows?

Let's take the following dataset as an example:

    make    address all     3d   our    over    length_total    y
0   0.0     0.64    0.64    0.0  0.32   0.0     278             1
1   0.21    0.28    0.5     0.0  0.14   0.28    1028            1
2   0.06    0.0     0.71    0.0  1.23   0.19    2259            1
3   0.15    0.0     0.46    0.1  0.61   0.0     1257            1
4   0.06    0.12    0.77    0.0  0.19   0.32    749             1
5   0.0     0.0     0.0     0.0  0.0    0.0     21              1
6   0.0     0.0     0.25    0.0  0.38   0.25    184             1
7   0.0     0.69    0.34    0.0  0.34   0.0     261             1
8   0.0     0.0     0.0     0.0  0.9    0.0     25              1
9   0.0     0.0     1.42    0.0  0.71   0.35    205             1
10  0.0     0.0     0.0     0.0  0.0    0.0     23              0
11  0.48    0.0     0.0     0.0  0.48   0.0     37              0
12  0.12    0.0     0.25    0.0  0.0    0.0     491             0
13  0.08    0.08    0.25    0.2  0.0    0.25    807             0
14  0.0     0.0     0.0     0.0  0.0    0.0     38              0
15  0.24    0.0     0.12    0.0  0.0    0.12    227             0   
16  0.0     0.0     0.0     0.0  0.75   0.0     77              0
17  0.1     0.0     0.21    0.0  0.0    0.0     571             0
18  0.51    0.0     0.0     0.0  0.0    0.0     74              0
19  0.3     0.0     0.15    0.0  0.0    0.15    155             0

I want to get a pivot table from the dataset above in which the mean of each column (make, address, all, 3d, our, over, length_total) is computed for each value of column y. The following table is the expected result:

                    y   
                    1   0
make            0.048   0.183
address         0.173   0.008
all             0.509   0.098
3d              0.01    0.02
our             0.482   0.123
over            0.139   0.052
length_total    626.7   250

Is it possible to get the desired result with the pivot_table method of a pandas DataFrame? If so, how?

Is there a more effective way to do this?

Some people like using stack or unstack, but I prefer good ol' pd.melt to "flatten" or "unpivot" a frame:

>>> df_m = pd.melt(df, id_vars="y")
>>> df_m.pivot_table(index="variable", columns="y")
                value         
y                   0        1
variable                      
3d              0.020    0.010
address         0.008    0.173
all             0.098    0.509
length_total  250.000  626.700
make            0.183    0.048
our             0.123    0.482
over            0.052    0.139
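
For readers who want to run this end to end, here is a minimal sketch. It assumes a reduced frame built from the first three rows of each class and only three of the question's columns, so the means differ from the full table above. Passing values="value" is an optional addition that yields flat 0/1 columns instead of the ("value", y) header shown in the output.

```python
import pandas as pd

# Subset of the question's data: first three rows of each class, three columns.
df = pd.DataFrame({
    "make":         [0.0, 0.21, 0.06, 0.0, 0.48, 0.12],
    "address":      [0.64, 0.28, 0.0, 0.0, 0.0, 0.0],
    "length_total": [278, 1028, 2259, 23, 37, 491],
    "y":            [1, 1, 1, 0, 0, 0],
})

# Unpivot: every column except "y" becomes (variable, value) rows.
df_m = pd.melt(df, id_vars="y")

# Old column names become rows, classes of y become columns.
# aggfunc="mean" is pivot_table's default; it is spelled out here for clarity.
result = df_m.pivot_table(index="variable", columns="y",
                          values="value", aggfunc="mean")
print(result)
```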

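(If you want to preserve the original column order as the new row order, you can index the pivoted result, called df2 here, by the original column names, something like df2.loc[df.columns].dropna(). Recent pandas versions raise a KeyError when .loc is given labels missing from the index, such as "y" here, so df2.reindex(df.columns).dropna() is the safer spelling.)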
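
A small sketch of that reordering, assuming a toy four-row frame and the hypothetical names pivoted and ordered. It drops "y" from the column list up front and uses reindex, so no dropna() is needed.

```python
import pandas as pd

# Toy frame; column order is make, address, all, then the class column y.
df = pd.DataFrame({
    "make":    [0.0, 0.21, 0.0, 0.48],
    "address": [0.64, 0.28, 0.0, 0.0],
    "all":     [0.64, 0.5, 0.0, 0.0],
    "y":       [1, 1, 0, 0],
})

# pivot_table sorts the row labels alphabetically: address, all, make.
pivoted = pd.melt(df, id_vars="y").pivot_table(
    index="variable", columns="y", values="value"
)

# Restore the original column order, leaving out the grouping column "y".
original_order = df.columns.drop("y")
ordered = pivoted.reindex(original_order)
print(ordered)
```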


Melting does the flattening: it preserves y as a column and puts the old column names into a new column called "variable" (which can be changed if you like):

>>> pd.melt(df, id_vars="y").head()
   y variable  value
0  1     make   0.00
1  1     make   0.21
2  1     make   0.06
3  1     make   0.15
4  1     make   0.06
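
The column names can be changed through melt's var_name and value_name parameters. A minimal sketch, where "feature" and "frequency" are hypothetical names chosen only for illustration:

```python
import pandas as pd

# Two-row toy frame, just enough to show the renamed columns.
df = pd.DataFrame({"make": [0.0, 0.21], "our": [0.32, 0.14], "y": [1, 0]})

# var_name replaces the default "variable", value_name the default "value".
long_df = pd.melt(df, id_vars="y", var_name="feature", value_name="frequency")
print(long_df)
```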

After that we can call pivot_table as we ordinarily would.

