
Pandas DataFrame MultiIndex: transform one level of the MultiIndex to another axis while keeping the other level in the original axis

I have a Pandas DataFrame with a MultiIndex on the rows, like this:

[image]

This DataFrame is the result of a groupby operation followed by slicing from a 3-level MultiIndex. I would like the 'date' row level to remain, but move the 'SlabType' row level into the columns, with unavailable values shown as NaN.

This is what I would like to get to:

[image]

What operations do I need to do to achieve this? Also, if the title of the question can be improved, please suggest how.

Select the column SlabLT and use unstack:

print (df['SlabLT'].unstack())
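
As a minimal sketch of what unstack does when every (date, SlabType) pair is unique, the following uses a few made-up rows (illustrative values, not the asker's actual data). The last row level moves to the columns, and combinations that are absent should come out as NaN.

```python
import pandas as pd

# Hypothetical data with a unique (date, SlabType) MultiIndex
df = pd.DataFrame({
    'date': ['2017-10-01', '2017-10-08', '2017-10-08'],
    'SlabType': ['UOM2', 'AMOUNT', 'UOM2'],
    'SlabLT': [1, 6000, 1],
}).set_index(['date', 'SlabType'])

# unstack() moves the last index level (SlabType) to the columns;
# 'date' stays on the rows
wide = df['SlabLT'].unstack()
print(wide)
```

Passing the level name explicitly, as in unstack('SlabType'), is equivalent here and may read more clearly when the index has more than two levels.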

But if the MultiIndex may contain duplicates, you need to aggregate the column first, e.g. by mean:

print (df.groupby(level=[0,1])['SlabLT'].mean().unstack())

Sample:

import pandas as pd

df = pd.DataFrame({'date':['2017-10-01','2017-10-08','2017-10-08','2017-10-15', '2017-10-15'],
                   'SlabType':['UOM2','AMOUNT','UOM2','AMOUNT','AMOUNT'],
                   'SlabLT':[1,6000,1,6000,5000]}).set_index(['date','SlabType'])

print (df)
                     SlabLT
date       SlabType        
2017-10-01 UOM2           1
2017-10-08 AMOUNT      6000
           UOM2           1
2017-10-15 AMOUNT      6000 <-duplicated MultiIndex '2017-10-15', 'AMOUNT'
           AMOUNT      5000 <-duplicated MultiIndex '2017-10-15', 'AMOUNT'

print (df['SlabLT'].unstack())

ValueError: Index contains duplicate entries, cannot reshape
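
Before deciding whether aggregation is needed, it may help to check whether the index actually contains repeated pairs. The sketch below rebuilds the sample data and uses Index.is_unique and Index.duplicated to find them; it is only one way to inspect the index, not a required step.

```python
import pandas as pd

# Same sample data as above, rebuilt so the snippet runs on its own
df = pd.DataFrame({
    'date': ['2017-10-01', '2017-10-08', '2017-10-08', '2017-10-15', '2017-10-15'],
    'SlabType': ['UOM2', 'AMOUNT', 'UOM2', 'AMOUNT', 'AMOUNT'],
    'SlabLT': [1, 6000, 1, 6000, 5000],
}).set_index(['date', 'SlabType'])

# False here, because ('2017-10-15', 'AMOUNT') occurs twice
print(df.index.is_unique)

# keep=False marks every copy of a repeated index pair, not just the later ones
dupes = df[df.index.duplicated(keep=False)]
print(dupes)
```

If is_unique is True, a plain unstack is enough; otherwise the repeated rows have to be aggregated first.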


print (df.groupby(level=[0,1])['SlabLT'].mean())
date        SlabType
2017-10-01  UOM2           1
2017-10-08  AMOUNT      6000
            UOM2           1
2017-10-15  AMOUNT      5500
Name: SlabLT, dtype: int64

print (df.groupby(level=[0,1])['SlabLT'].mean().unstack())
SlabType    AMOUNT  UOM2
date                    
2017-10-01     NaN   1.0
2017-10-08  6000.0   1.0
2017-10-15  5500.0   NaN

Since some entries will be NaN, you may want to consider a pivot table, which aggregates duplicate index pairs and so avoids the "duplicate entries" ValueError when unstacking one of the index levels.

Suppose you have a DataFrame df with the column 'SlabLT' and the index levels date and SlabType; try:

df.reset_index().pivot_table(values='SlabLT', index='date', columns='SlabType')
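
To make the pivot table approach concrete, here is a small sketch on the sample data from the first answer, with 'date' as the rows and 'SlabType' as the columns. It relies on pivot_table's default aggregation being the mean, so the two rows for ('2017-10-15', 'AMOUNT') collapse into their average; the comparison with groupby plus unstack is included only as a sanity check.

```python
import pandas as pd

# Sample data with a duplicated ('2017-10-15', 'AMOUNT') index pair
df = pd.DataFrame({
    'date': ['2017-10-01', '2017-10-08', '2017-10-08', '2017-10-15', '2017-10-15'],
    'SlabType': ['UOM2', 'AMOUNT', 'UOM2', 'AMOUNT', 'AMOUNT'],
    'SlabLT': [1, 6000, 1, 6000, 5000],
}).set_index(['date', 'SlabType'])

# pivot_table aggregates duplicates (default aggfunc is the mean),
# so no "duplicate entries" error is raised
wide = df.reset_index().pivot_table(values='SlabLT', index='date', columns='SlabType')
print(wide)

# Aggregating first and then unstacking should give the same values
alt = df.groupby(level=[0, 1])['SlabLT'].mean().unstack()
print(alt)
```

A different aggregation can be requested through the aggfunc parameter, for example aggfunc='sum', if the mean is not the right way to combine duplicates.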
