
How to pivot a dataframe with pandas to display values with aggregation and without aggregation

I want to pivot my DataFrame using pandas. My DataFrame looks like this:

Dataframe

For each date_block_num, I want the shop_id with the maximum item_cnt_day, along with the item_id that sold the most, sorted by date_block_num in descending order.

I have tried this

pivot = pd.pivot_table(
    sales1, index=["date_block_num", "shop_id"], values=["item_cnt_day", "item_id"],
    aggfunc={"item_id": lambda x: x.value_counts().idxmax(), "item_cnt_day": "sum"},
)
pivot.sort_values(by=["date_block_num", "item_cnt_day"], ascending=False).reset_index().head(10)

Result dataframe (Not allowed to embed images as per stackoverflow)

I want only one row per date_block_num: the shop_id with the maximum item_cnt_day, along with the item_id that sold the most.

You can do that in two aggregation steps like:

# first group by all three attributes to get one row per
# combination of these three columns (sales1 and date_block_num
# are the names used in the question)
grouped = sales1.groupby(['date_block_num', 'shop_id', 'item_id'])

# and aggregate just the item_cnt_day you want to have listed
aggregated = grouped.aggregate({'item_cnt_day': 'sum'})

# turn the index levels back into regular columns and re-sort
# so the highest sales come first (btw. I think you could remove
# date_block_num from the sort if you like, but it doesn't hurt)
aggregated = aggregated.reset_index()
aggregated = aggregated.sort_values(['date_block_num', 'item_cnt_day'], ascending=False)

# now aggregate the intermediate result again, but this time
# only by date_block_num, and keep only the first row per
# group, which is the one with the highest sales, because we
# sorted it this way above
result = aggregated.groupby(['date_block_num']).aggregate('first')
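
To see how the two steps behave, here is a minimal, self-contained sketch on a small made-up frame. The column names come from the question; the values and the names per_item and top are only illustrative assumptions. It uses drop_duplicates after sorting as an alternative way to keep the first row per date block.

```python
import pandas as pd

# Hypothetical toy data: column names follow the question, values are invented.
sales1 = pd.DataFrame({
    "date_block_num": [0, 0, 0, 0, 1, 1, 1],
    "shop_id":        [1, 1, 2, 2, 1, 2, 2],
    "item_id":        [10, 11, 10, 10, 10, 12, 12],
    "item_cnt_day":   [5, 1, 2, 2, 3, 4, 4],
})

# Step 1: total units sold per (date_block_num, shop_id, item_id).
per_item = sales1.groupby(
    ["date_block_num", "shop_id", "item_id"], as_index=False
)["item_cnt_day"].sum()

# Step 2: sort so the best-selling combination comes first within each
# date block, then keep only that first row per date block.
top = (
    per_item.sort_values(["date_block_num", "item_cnt_day"], ascending=False)
    .drop_duplicates("date_block_num")
    .reset_index(drop=True)
)

print(top)
```

Note that this keeps the single best-selling (shop_id, item_id) combination per date block. If the shop should instead be chosen by its total across all items, the first grouping would need to be done on date_block_num and shop_id only.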

