Pandas unstack with multiindex columns

Question

I have a pandas dataframe, which can be created with:

import pandas as pd

df = pd.DataFrame([[1, 'a', 'green'], [2, 'b', 'blue'], [2, 'b', 'green'], [1, 'e', 'green'], [2, 'b', 'blue']],
                  columns=['sales', 'product', 'color'],
                  index=['01-01-2020', '01-01-2020', '01-02-2020', '01-03-2020', '01-04-2020'])

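and looks like:

           sales product  color
01-01-2020     1       a  green
01-01-2020     2       b   blue
01-02-2020     2       b  green
01-03-2020     1       e  green
01-04-2020     2       b   blue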

I would like to unstack the dataframe by the 'color' feature and build a column MultiIndex from the product of [green, blue] and [sales, product], so that the existing columns become the second level of the column MultiIndex. The index of the dataframe is a date. The resulting dataframe I would like can be created with the code:

import numpy as np
import pandas as pd

pd.DataFrame([[1, 'a', 2, 'b'], [2, 'b', np.nan, np.nan], [1, 'e', np.nan, np.nan], [np.nan, np.nan, 2, 'b']],
             columns=pd.MultiIndex.from_product([['green', 'blue'], ['sales', 'product']]),
             index=['01-01-2020', '01-02-2020', '01-03-2020', '01-04-2020'])

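and looks like:

           green          blue
           sales product sales product
01-01-2020   1.0       a   2.0       b
01-02-2020   2.0       b   NaN     NaN
01-03-2020   1.0       e   NaN     NaN
01-04-2020   NaN     NaN   2.0       b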

Please note that the resulting dataframe will be shorter than the original, because rows that share a date are combined into a single row.
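A quick check of the row count, using the df from the question (the names stacked_index and wide are just mine, for illustration). This reshaping assumes each (date, color) pair appears at most once; if a pair repeats, unstack raises a ValueError ("Index contains duplicate entries, cannot reshape"), so on a large frame it may be worth checking is_unique first.

```python
import pandas as pd

# df copied from the question
df = pd.DataFrame([[1, 'a', 'green'], [2, 'b', 'blue'], [2, 'b', 'green'], [1, 'e', 'green'], [2, 'b', 'blue']],
                  columns=['sales', 'product', 'color'],
                  index=['01-01-2020', '01-01-2020', '01-02-2020', '01-03-2020', '01-04-2020'])

# unstack needs every (date, color) pair in the row index to be unique
stacked_index = df.set_index('color', append=True).index
print(stacked_index.is_unique)  # True

# one row per distinct date after 'color' moves into the columns
wide = df.set_index('color', append=True).unstack()
print(len(df), df.index.nunique(), len(wide))  # 5 4 4
```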

For the life of me, I have been unable to figure out how to pivot/unstack correctly to get this. I am trying to apply this to a very large dataframe, so performance will be key for me. Many thanks for any and all help!

Answer 1

Try this:

df.set_index('color', append=True).unstack().swaplevel(0, 1, axis=1).sort_index(axis=1)

Output:

color         blue         green      
           product sales product sales
01-01-2020       b   2.0       a   1.0
01-02-2020     NaN   NaN       b   2.0
01-03-2020     NaN   NaN       e   1.0
01-04-2020       b   2.0     NaN   NaN
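To confirm that this is the frame the question asks for, here is a small sketch comparing the two. The names df, expected, answer1 and aligned are my own labels; the only real differences are column order and level names, so the columns are reordered first and names/dtypes are not compared strictly.

```python
import numpy as np
import pandas as pd

# Input and target frames, copied from the question
df = pd.DataFrame([[1, 'a', 'green'], [2, 'b', 'blue'], [2, 'b', 'green'], [1, 'e', 'green'], [2, 'b', 'blue']],
                  columns=['sales', 'product', 'color'],
                  index=['01-01-2020', '01-01-2020', '01-02-2020', '01-03-2020', '01-04-2020'])
expected = pd.DataFrame(
    [[1, 'a', 2, 'b'], [2, 'b', np.nan, np.nan], [1, 'e', np.nan, np.nan], [np.nan, np.nan, 2, 'b']],
    columns=pd.MultiIndex.from_product([['green', 'blue'], ['sales', 'product']]),
    index=['01-01-2020', '01-02-2020', '01-03-2020', '01-04-2020'])

answer1 = (df.set_index('color', append=True)
             .unstack()
             .swaplevel(0, 1, axis=1)
             .sort_index(axis=1))

# sort_index puts 'blue' before 'green' and 'product' before 'sales';
# reorder to the question's column order before comparing
aligned = answer1.reindex(columns=expected.columns)

# check_names=False: the answer's column levels may still carry the name 'color'
# check_dtype=False: compare values only, in case string dtypes differ between pandas versions
pd.testing.assert_frame_equal(aligned, expected, check_names=False, check_dtype=False)
print('matches the target frame')
```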

Details:

Add 'color' to your existing index with append=True
Unstack the innermost index level, 'color', to move it into the columns
Swap the two levels of the column MultiIndex and sort them
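The three steps above, run one at a time on the question's df so the intermediate index and column levels can be inspected (step1/step2/step3 are illustrative names):

```python
import pandas as pd

# df copied from the question
df = pd.DataFrame([[1, 'a', 'green'], [2, 'b', 'blue'], [2, 'b', 'green'], [1, 'e', 'green'], [2, 'b', 'blue']],
                  columns=['sales', 'product', 'color'],
                  index=['01-01-2020', '01-01-2020', '01-02-2020', '01-03-2020', '01-04-2020'])

# Step 1: append 'color' to the date index -> two-level row index (date, color)
step1 = df.set_index('color', append=True)

# Step 2: unstack the innermost row level ('color'); it becomes the inner
# column level, so the columns are (original column, color)
step2 = step1.unstack()

# Step 3: swap the column levels so color is on top, then sort them
step3 = step2.swaplevel(0, 1, axis=1).sort_index(axis=1)

print(step1.index.nlevels, list(step2.columns.names), list(step3.columns.names))
# 2 [None, 'color'] ['color', None]
print(list(step3.columns))
# [('blue', 'product'), ('blue', 'sales'), ('green', 'product'), ('green', 'sales')]
```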

As @QuangHoang suggests:

df.set_index('color', append=True).stack().unstack([1,2])
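A sketch checking this variant against the question's target frame (df, expected, quang and aligned are my own names). One difference worth knowing, assuming pandas 2.x behaviour: stack() first puts the integer sales values and the string product values into a single object-dtype Series, so every column of the unstacked result comes back as object dtype. infer_objects() is used here to turn the sales columns back into floats before comparing.

```python
import numpy as np
import pandas as pd

# Input and target frames, copied from the question
df = pd.DataFrame([[1, 'a', 'green'], [2, 'b', 'blue'], [2, 'b', 'green'], [1, 'e', 'green'], [2, 'b', 'blue']],
                  columns=['sales', 'product', 'color'],
                  index=['01-01-2020', '01-01-2020', '01-02-2020', '01-03-2020', '01-04-2020'])
expected = pd.DataFrame(
    [[1, 'a', 2, 'b'], [2, 'b', np.nan, np.nan], [1, 'e', np.nan, np.nan], [np.nan, np.nan, 2, 'b']],
    columns=pd.MultiIndex.from_product([['green', 'blue'], ['sales', 'product']]),
    index=['01-01-2020', '01-02-2020', '01-03-2020', '01-04-2020'])

# stack() -> Series indexed by (date, color, original column); unstack levels 1 and 2 into columns
quang = df.set_index('color', append=True).stack().unstack([1, 2])

# mixing int 'sales' with str 'product' in one Series makes every result column object dtype
print(quang.dtypes.unique())

# reorder to the question's columns and convert the numeric object columns back to float
aligned = quang.reindex(columns=expected.columns).infer_objects()
pd.testing.assert_frame_equal(aligned, expected, check_names=False, check_dtype=False)
print('matches the target frame')
```

If the float sales columns matter downstream, the infer_objects() call (or an explicit astype) is the extra cost of this variant compared with the swaplevel version.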

Which is faster (roughly 1.5x). The first approach takes

4.13 ms ± 274 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

while the stack/unstack version takes

2.78 ms ± 44.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
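Since the question is about a very large dataframe, here is a rough timeit sketch for repeating the comparison on bigger data. Everything about the data is hypothetical: make_frame, the 5,000-date size, the 25% of (date, color) pairs dropped and the random products are assumptions, and swap_and_sort / stack_unstack are just names for the two approaches above. The ratio you see will depend on your data and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n_dates=5_000, seed=0):
    """Hypothetical test data: at most one row per (date, color) pair, ~25% of pairs dropped."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range('2020-01-01', periods=n_dates, freq='D').strftime('%m-%d-%Y')
    idx = pd.MultiIndex.from_product([dates, ['green', 'blue']], names=['date', 'color'])
    frame = pd.DataFrame({'sales': rng.integers(1, 5, len(idx)),
                          'product': rng.choice(list('abcde'), len(idx))}, index=idx)
    frame = frame[rng.random(len(idx)) < 0.75]   # drop some pairs; the rest stay unique
    frame = frame.reset_index(level='color')     # 'color' back to a column, date stays the index
    frame.index.name = None
    return frame[['sales', 'product', 'color']]


def swap_and_sort(d):
    # first approach: unstack, then swap the column levels and sort
    return d.set_index('color', append=True).unstack().swaplevel(0, 1, axis=1).sort_index(axis=1)


def stack_unstack(d):
    # @QuangHoang's approach: stack everything, then unstack color and column together
    return d.set_index('color', append=True).stack().unstack([1, 2])


big = make_frame()
for fn in (swap_and_sort, stack_unstack):
    per_call = timeit.timeit(lambda: fn(big), number=10) / 10
    print(f'{fn.__name__}: {per_call * 1e3:.2f} ms per call')
```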

Answer 1 (score 4, accepted), 2021-01-26 15:29:53
