
Pandas pivot_table: cannot achieve correct format

I have the following data as setup:

    import pandas as pd

    df = pd.DataFrame({
        'index_1' : ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
        'index_2' : ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
        'month' : ['jan', 'jan', 'feb', 'feb', 'jan', 'jan', 'feb', 'feb'],
        'value_1' : range(0, 8),
        'value_2' : range(8, 16)
    })
    print(df)
    #     index_1 index_2 month  value_1  value_2
    # 0       A       A   jan        0        8
    # 1       A       B   jan        1        9
    # 2       A       C   feb        2       10
    # 3       A       D   feb        3       11
    # 4       B       A   jan        4       12
    # 5       B       B   jan        5       13
    # 6       B       C   feb        6       14
    # 7       B       D   feb        7       15

My expected output would look like this (I built it by hand):

    print(expected_output)
    #                   jan             feb      
    # month             value_1 value_2 value_1 value_2
    # index_1 index_2                           
    # A       A         0.0     8.0     NaN     NaN
    #         B         1.0     9.0     NaN     NaN
    #         C         NaN     NaN     2.0     10.0
    #         D         NaN     NaN     3.0     11.0
    # B       A         4.0     12.0    NaN     NaN
    #         B         5.0     13.0    NaN     NaN
    #         C         NaN     NaN     6.0     14.0
    #         D         NaN     NaN     7.0     15.0

There must be something I cannot wrap my mind around. I achieved the following, which contains the right data but in the wrong format.

    df = pd.pivot_table(
            df,
            index=['index_1', 'index_2'],
            columns=['month'],
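            #   The 2 following lines don't change the output here: values
            #   defaults to the remaining numeric columns, and the default
            #   aggfunc ('mean') equals 'sum' when each cell holds a single row.
            #   values=['value_1', 'value_2'],
            #   aggfunc='sum'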
    )
    print(df)
    #                   value_1      value_2      
    # month             feb  jan     feb   jan
    # index_1 index_2                           
    # A       A         NaN  0.0     NaN   8.0
    #         B         NaN  1.0     NaN   9.0
    #         C         2.0  NaN    10.0   NaN
    #         D         3.0  NaN    11.0   NaN
    # B       A         NaN  4.0     NaN  12.0
    #         B         NaN  5.0     NaN  13.0
    #         C         6.0  NaN    14.0   NaN
    #         D         7.0  NaN    15.0   NaN

I also tried using .groupby() along with .transpose(), but I have a hard time correctly formatting this DataFrame. I have already read the pivot_table documentation, the guide on reshaping DataFrames, and the canonical pivot question by PiRSquared.

Use an ordered Categorical so the months sort correctly, together with DataFrame.swaplevel and DataFrame.sort_index:

    months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
    df['month'] = pd.Categorical(df['month'], categories=months, ordered=True)

    df = pd.pivot_table(
        df,
        index=['index_1', 'index_2'],
        columns=['month'],
    ).swaplevel(0, 1, axis=1).sort_index(axis=1)
    print(df)
    # month               jan             feb
    #                 value_1 value_2 value_1 value_2
    # index_1 index_2
    # A       A           0.0     8.0     NaN     NaN
    #         B           1.0     9.0     NaN     NaN
    #         C           NaN     NaN     2.0    10.0
    #         D           NaN     NaN     3.0    11.0
    # B       A           4.0    12.0     NaN     NaN
    #         B           5.0    13.0     NaN     NaN
    #         C           NaN     NaN     6.0    14.0
    #         D           NaN     NaN     7.0    15.0
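
To illustrate why the ordered Categorical matters, here is a minimal sketch that compares the column order with plain string months against the order with categorical months. The helper name `months_first` is hypothetical and only used for this illustration; it uses the set_index/unstack route rather than pivot_table to keep the example short.

```python
import pandas as pd

df = pd.DataFrame({
    'index_1': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'index_2': ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
    'month': ['jan', 'jan', 'feb', 'feb', 'jan', 'jan', 'feb', 'feb'],
    'value_1': range(0, 8),
    'value_2': range(8, 16),
})


def months_first(frame):
    """Reshape to wide form, move the month level on top, sort the columns."""
    wide = frame.set_index(['index_1', 'index_2', 'month']).unstack()
    return wide.swaplevel(0, 1, axis=1).sort_index(axis=1)


# Plain strings sort alphabetically, so 'feb' lands before 'jan'.
plain_order = list(months_first(df).columns.get_level_values(0).unique())

# An ordered categorical sorts by the position in `months` instead.
months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
          'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
cat_df = df.assign(
    month=pd.Categorical(df['month'], categories=months, ordered=True)
)
cat_order = list(months_first(cat_df).columns.get_level_values(0).unique())

print(plain_order)
print(cat_order)
```

The only difference between the two runs is the dtype of the month column, so the change in column order comes from the category order and not from swaplevel or sort_index themselves.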

Or:

    months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']
    df['month'] = pd.Categorical(df['month'], categories=months, ordered=True)

    df = (df.set_index(['index_1', 'index_2', 'month'])
            .unstack()
            .swaplevel(0, 1, axis=1)
            .sort_index(axis=1))
    print(df)
    # month               jan             feb
    #                 value_1 value_2 value_1 value_2
    # index_1 index_2
    # A       A           0.0     8.0     NaN     NaN
    #         B           1.0     9.0     NaN     NaN
    #         C           NaN     NaN     2.0    10.0
    #         D           NaN     NaN     3.0    11.0
    # B       A           4.0    12.0     NaN     NaN
    #         B           5.0    13.0     NaN     NaN
    #         C           NaN     NaN     6.0    14.0
    #         D           NaN     NaN     7.0    15.0

How about stack and unstack:

    df.set_index(['index_1','index_2','month']).stack().unstack([2,3])

    # month               jan             feb
    #                 value_1 value_2 value_1 value_2
    # index_1 index_2
    # A       A           0.0     8.0     NaN     NaN
    #         B           1.0     9.0     NaN     NaN
    #         C           NaN     NaN     2.0    10.0
    #         D           NaN     NaN     3.0    11.0
    # B       A           4.0    12.0     NaN     NaN
    #         B           5.0    13.0     NaN     NaN
    #         C           NaN     NaN     6.0    14.0
    #         D           NaN     NaN     7.0    15.0
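
The one-liner above can be easier to follow when split into its two steps. The sketch below rebuilds the question's data and inspects the intermediate result; the variable names `stacked` and `wide` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'index_1': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'index_2': ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
    'month': ['jan', 'jan', 'feb', 'feb', 'jan', 'jan', 'feb', 'feb'],
    'value_1': range(0, 8),
    'value_2': range(8, 16),
})

# Step 1: stack() moves the column labels (value_1, value_2) into the row
# index, giving a long Series with four index levels:
# index_1, index_2, month, and an unnamed level holding the value names.
stacked = df.set_index(['index_1', 'index_2', 'month']).stack()
print(stacked.index.nlevels)
print(len(stacked))

# Step 2: unstack([2, 3]) moves levels 2 (month) and 3 (value name) into
# the columns, in that order, so month ends up as the outer column level.
wide = stacked.unstack([2, 3])
print(wide.shape)
print(wide.columns.names)
```

Because the levels are listed as [2, 3], month becomes the top column level directly and no swaplevel call is needed.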

    import pandas as pd

    df = pd.DataFrame({
        'index_1' : ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
        'index_2' : ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
        'month' : ['jan', 'jan', 'feb', 'feb', 'jan', 'jan', 'feb', 'feb'],
        'value_1' : range(0, 8),
        'value_2' : range(8, 16)
    })

    print(df)
    df = df.set_index(['index_1', 'index_2', 'month']).unstack(level=-1)
    print(df)

Instead of a pivot table, this is something called a hierarchical index.
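
As a rough illustration of what a hierarchical index gives you, the sketch below reshapes the question's data with set_index and unstack, then selects from the two-level columns. The variable names (`wide`, `value_1`, `jan`, `row`) are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'index_1': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'index_2': ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D'],
    'month': ['jan', 'jan', 'feb', 'feb', 'jan', 'jan', 'feb', 'feb'],
    'value_1': range(0, 8),
    'value_2': range(8, 16),
})

wide = df.set_index(['index_1', 'index_2', 'month']).unstack(level=-1)

# The columns are a two-level MultiIndex: (value name, month).
print(wide.columns.nlevels)

# Selecting a top-level label returns the sub-frame below it.
value_1 = wide['value_1']

# A cross-section on the inner level picks one month across all values.
jan = wide.xs('jan', axis=1, level='month')

# The rows are hierarchical as well: a tuple addresses one (index_1, index_2).
row = wide.loc[('A', 'C')]

print(list(value_1.columns))
print(list(jan.columns))
print(row[('value_1', 'feb')])
```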
