
Pandas boolean value pivot on multiindex dataframe

Hi, I'm having trouble pivoting a table based on the values in a column.

Suppose we have a multi-index dataframe grade

with index levels Country, Date, Group and a single column Status:

                        Status
Country Date    Group   
US  2019-12-31  Group A Absent
                Group B Not Pass
                Group C Absent
    2020-01-02  Group A Pass
                Group B Pass
                Group C Pass
...     ...     ...     ...
ID  2020-04-14  Group A Pass
                Group B Pass
                Group C Pass
    2020-04-15  Group A Pass
                Group B Pass
                Group C Pass

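I want to unstack Group and Status, and build a checklist based on the Status column.

So in the end we get a new dataframe checklist_grade with the columns Absent, Not Pass and Pass for each Group, and the value v in the column that matches the row's Status.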

To make it easier to understand, here is an illustration of what we want:

                    Status                              
                    Group A                     Group B                     Group C     
Country Date        Absent  Not Pass    Pass    Absent  Not Pass    Pass    Absent  Not Pass    Pass
US      2019-12-31  v                                               v                           v       
        2020-01-02                      v                           v                           v
...     ...         ...     ...         ...     ...     ...         ...     ...     ...         ...
ID      2020-04-14              v                                   v                           v
        2020-04-15              v                                   v                           v

I tried unstacking the grade dataframe, but it only breaks the data out down to Group:

                    Status
                    Group A     Group B     Group C
Country Date            
US      2019-12-31  Absent      Not Pass    Absent
        2020-01-02  Pass        Pass        Pass
...     ...         ...         ...         ...
ID      2020-04-14  Pass        Pass        Pass
        2020-04-15  Pass        Pass        Pass
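For reference, the attempt above can be reproduced with a small sample. This is only a sketch: the rows are rebuilt from the ones visible in the question (the elided "..." rows are left out), and the names rows and wide are made up for the example.

```python
import pandas as pd

# Hypothetical sample: only the rows that are visible in the question
rows = [
    ("US", "2019-12-31", "Group A", "Absent"),
    ("US", "2019-12-31", "Group B", "Not Pass"),
    ("US", "2019-12-31", "Group C", "Absent"),
    ("US", "2020-01-02", "Group A", "Pass"),
    ("US", "2020-01-02", "Group B", "Pass"),
    ("US", "2020-01-02", "Group C", "Pass"),
    ("ID", "2020-04-14", "Group A", "Pass"),
    ("ID", "2020-04-14", "Group B", "Pass"),
    ("ID", "2020-04-14", "Group C", "Pass"),
    ("ID", "2020-04-15", "Group A", "Pass"),
    ("ID", "2020-04-15", "Group B", "Pass"),
    ("ID", "2020-04-15", "Group C", "Pass"),
]
grade = (pd.DataFrame(rows, columns=["Country", "Date", "Group", "Status"])
           .set_index(["Country", "Date", "Group"]))

# Unstacking the Group index level moves it into the columns, but the
# Status values stay as cell contents instead of becoming column labels.
wide = grade.unstack("Group")
print(wide)
```

The columns of wide have only two levels (Status, then Group), which is why the Absent / Not Pass / Pass breakdown never appears.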

Create a new column, move Status into the MultiIndex and reshape with DataFrame.unstack:

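import pandas as pd

# df is the MultiIndex dataframe from the question (called grade there)
df = (df.assign(New='v')                      # marker column: 'v' for every row
       .set_index('Status', append=True)     # Status becomes the 4th index level
       .unstack([2,3])                       # move Group and Status into the columns
       .rename(columns={'New':'Status'}))    # relabel the top column level
print(df)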
                    Status                                         
Group              Group A  Group B Group C Group A Group B Group C
Status              Absent Not Pass  Absent    Pass    Pass    Pass
Country Date                                                       
ID      2020-04-14     NaN      NaN     NaN       v       v       v
        2020-04-15     NaN      NaN     NaN       v       v       v
US      2019-12-31       v        v       v     NaN     NaN     NaN
        2020-01-02     NaN      NaN     NaN       v       v       v

Finally, if needed, add all combinations of the MultiIndex column levels with DataFrame.reindex and MultiIndex.from_product:

df = df.reindex(pd.MultiIndex.from_product(df.columns.levels), axis=1)
print (df)
                    Status                                              \
                   Group A               Group B               Group C   
                    Absent Not Pass Pass  Absent Not Pass Pass  Absent   
Country Date                                                             
ID      2020-04-14     NaN      NaN    v     NaN      NaN    v     NaN   
        2020-04-15     NaN      NaN    v     NaN      NaN    v     NaN   
US      2019-12-31       v      NaN  NaN     NaN        v  NaN       v   
        2020-01-02     NaN      NaN    v     NaN      NaN    v     NaN   



                   Not Pass Pass  
Country Date                      
ID      2020-04-14      NaN    v  
        2020-04-15      NaN    v  
US      2019-12-31      NaN  NaN  
        2020-01-02      NaN    v  
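Putting both steps together, a self-contained sketch might look like the following. The sample rows are rebuilt from the ones visible in the question, the index levels are referenced by name instead of by position, and the final fillna('') is an extra step that is not part of the answer: it is added here only on the assumption that blank cells, as in the question's illustration, are preferred over NaN.

```python
import pandas as pd

# Hypothetical sample: only the rows that are visible in the question
rows = [
    ("US", "2019-12-31", "Group A", "Absent"),
    ("US", "2019-12-31", "Group B", "Not Pass"),
    ("US", "2019-12-31", "Group C", "Absent"),
    ("US", "2020-01-02", "Group A", "Pass"),
    ("US", "2020-01-02", "Group B", "Pass"),
    ("US", "2020-01-02", "Group C", "Pass"),
    ("ID", "2020-04-14", "Group A", "Pass"),
    ("ID", "2020-04-14", "Group B", "Pass"),
    ("ID", "2020-04-14", "Group C", "Pass"),
    ("ID", "2020-04-15", "Group A", "Pass"),
    ("ID", "2020-04-15", "Group B", "Pass"),
    ("ID", "2020-04-15", "Group C", "Pass"),
]
grade = (pd.DataFrame(rows, columns=["Country", "Date", "Group", "Status"])
           .set_index(["Country", "Date", "Group"]))

checklist_grade = (
    grade.assign(New="v")                     # marker value for every existing row
         .set_index("Status", append=True)    # Status becomes an index level
         .unstack(["Group", "Status"])        # by name; same levels as unstack([2, 3])
         .rename(columns={"New": "Status"})   # relabel the top column level
)

# Add the Group/Status combinations that never occur in the data
all_columns = pd.MultiIndex.from_product(checklist_grade.columns.levels)
checklist_grade = checklist_grade.reindex(all_columns, axis=1)

# Assumption: show blanks instead of NaN, as in the question's illustration
checklist_grade = checklist_grade.fillna("")
print(checklist_grade)
```

Note that from_product only sees the level values already present after unstacking, so a Status that never appears anywhere in the data would still get no column.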
