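I want to pivot my DataFrame using pandas. The relevant columns are date_block_num, shop_id, item_id and item_cnt_day.
For each date_block_num, I want the shop_id with the maximum item_cnt_day, together with the item_id that sold the most, sorted by date_block_num in descending order.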
I have tried this:
(
    pd.pivot_table(
        sales1,
        index=['date_block_num', 'shop_id'],
        values=['item_cnt_day', 'item_id'],
        aggfunc={'item_id': lambda x: x.value_counts().idxmax(), 'item_cnt_day': 'sum'},
    )
    .sort_values(by=['date_block_num', 'item_cnt_day'], ascending=False)
    .reset_index()
    .head(10)
)
The result (I am not allowed to embed images on Stack Overflow) has one row per date_block_num and shop_id combination.
I want only one row per date_block_num: the shop_id with the maximum item_cnt_day, along with the item_id that sold the most.
You can do that in two aggregation steps, like this:
# First group by all three attributes to get one row per
# combination of these three columns
grouped = sales1.groupby(['date_block_num', 'shop_id', 'item_id'])
# and aggregate only the item_cnt_day you want to have listed
aggregated = grouped.aggregate({'item_cnt_day': 'sum'})
# Make the index levels regular columns again and re-sort
# so the highest sales come first (btw, I think you could remove
# date_block_num from the sort if you like, but it doesn't hurt)
aggregated.reset_index(inplace=True)
aggregated.sort_values(['date_block_num', 'item_cnt_day'], ascending=False, inplace=True)
# Now aggregate the intermediate result again, but this time
# only by date_block_num, and keep only the first row per
# group, which is the one with the highest sales because we
# sorted it that way above. groupby sorts its keys in ascending
# order, so sort the index afterwards to get date_block_num descending
aggregated.groupby(['date_block_num']).aggregate('first').sort_index(ascending=False)
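To see the two-step idea end to end, here is a minimal sketch on made-up data. The toy values and the names totals, best_rows and result are assumptions for illustration only. It also swaps the sort-then-first step for idxmax, which likewise picks a row with the highest total per date_block_num, although the row chosen on ties may differ.

```python
import pandas as pd

# Toy data: column names follow the question, the values are invented.
sales1 = pd.DataFrame({
    "date_block_num": [0, 0, 0, 0, 1, 1, 1],
    "shop_id":        [1, 1, 2, 2, 1, 2, 2],
    "item_id":        [10, 10, 20, 30, 10, 20, 20],
    "item_cnt_day":   [2, 3, 4, 1, 1, 5, 2],
})

# Step 1: total units sold per (date_block_num, shop_id, item_id).
totals = (
    sales1
    .groupby(["date_block_num", "shop_id", "item_id"], as_index=False)["item_cnt_day"]
    .sum()
)

# Step 2: within each date_block_num, keep the row with the largest total.
# idxmax returns the index label of the first maximum in each group.
best_rows = totals.loc[totals.groupby("date_block_num")["item_cnt_day"].idxmax()]

# One row per date_block_num, newest block first.
result = best_rows.sort_values("date_block_num", ascending=False).reset_index(drop=True)
print(result)
```

Note that this ranks shop and item pairs, as the answer above does. If the goal is instead to pick the shop with the largest overall total first and then its best-selling item, that needs a separate aggregation per shop before the item lookup.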