pandas: convert the first MultiIndex level to the row index and the second level to the column index

Question

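I have the following code, which groups a given DataFrame by two columns and computes the size of each group. This creates a DataFrame with a single column and a two-level MultiIndex. I then reshape the result so that the second index level becomes the column index and the first level becomes the row index.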

import pandas as pd
from random import choice

products=['car', 'pen', 'kitchen', 'bed']
df=pd.DataFrame([{'product': choice(products), 'Changing': choice([True, False])} for _ in range(10000)])

df_grp=df.groupby(['product', 'Changing']).size().to_frame()
print(df_grp)
print('-'*50)

d=[]
for i in df_grp.reset_index()['product'].unique():
    d.append({'product':i, 'not changing': df_grp.loc[(i,False)][0], 'changing': df_grp.loc[(i,True)][0]})
print(pd.DataFrame(d).set_index('product'))

This produces the following output:

                     0
product Changing      
bed     False     1253
        True      1269
car     False     1244
        True      1275
kitchen False     1236
        True      1271
pen     False     1206
        True      1246
--------------------------------------------------
         not changing  changing
product                        
bed              1253      1269
car              1244      1275
kitchen          1236      1271
pen              1206      1246

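This is exactly what I want to achieve. However, my approach seems a bit complicated. Is there a more elegant way to do this, perhaps with a built-in pandas function or a lambda function? My approach also does not generalize well, because it always assumes that the second index level is boolean.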

Answer 1

Try unstack():

df_grp=df.groupby(['product', 'Changing']).size().unstack()
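
To make the effect of unstack() easier to follow, here is a minimal sketch on a small fixed dataset instead of the random data from the question. The sample rows and the names counts and wide are assumptions made for illustration; fill_value=0 is an optional extra that only matters when some (product, Changing) combination never occurs.

```python
import pandas as pd

# Small deterministic stand-in for the random data in the question.
df = pd.DataFrame({
    'product':  ['car', 'car', 'car', 'pen', 'pen', 'bed'],
    'Changing': [True, False, True, False, False, True],
})

# size() returns a Series indexed by the (product, Changing) MultiIndex.
counts = df.groupby(['product', 'Changing']).size()

# unstack() moves the innermost index level (Changing) into the columns.
# fill_value=0 covers combinations that never occur, e.g. ('bed', False).
wide = counts.unstack(fill_value=0)
print(wide)
```

Without fill_value, the missing combinations would show up as NaN and the counts would be converted to floats.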

Finally, set the columns attribute:

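# Rename the columns without hard-coding the names (suggested by @Cyttorak).
# "not " * col is "" for False and "not " for True, so the list is reversed
# to line up with the column order [False, True]. The result is
# ['not Changing', 'Changing'], capitalised like the original column name.
df_grp.columns=[f'{"not "*col}{df_grp.columns.name}' for col in df_grp][::-1]
# OR write the names manually:
df_grp.columns=['not changing','changing']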

Output of df_grp:

         not changing  changing
product                        
bed              1210      1226
car              1220      1272
kitchen          1238      1277
pen              1267      1290
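
Since the question also asks for something that does not depend on the second level being boolean, the following sketch shows one possible variation, not the answerer's method: pd.crosstab builds the same count table in one call, and DataFrame.rename with an explicit mapping relabels the columns. The sample rows and the labels dictionary are assumptions for illustration; the mapping could hold any column labels, not only True and False.

```python
import pandas as pd

# Small deterministic stand-in for the random data in the question.
df = pd.DataFrame({
    'product':  ['car', 'car', 'car', 'pen', 'pen', 'bed'],
    'Changing': [True, False, True, False, False, True],
})

# crosstab counts each (product, Changing) pair and fills gaps with 0.
table = pd.crosstab(df['product'], df['Changing'])

# Hypothetical label mapping; keys can be any column labels, not just booleans.
labels = {False: 'not changing', True: 'changing'}
table = table.rename(columns=labels)

# Drop the leftover 'Changing' name on the column axis for a cleaner printout.
table.columns.name = None
print(table)
```

Because the mapping is keyed by label, the column order does not matter and no list reversal is needed.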

Answer 1 (2, accepted, 2021-06-11 07:23:14)
