Pandas groupby/pivot dataframe of customer orders
Right now I have a DataFrame that looks something like this:
SAMPLE INPUT
   name           email      product  qty_ordered
0  jane  jane@email.com    Red Shirt            2
1  john  john@email.com  Green Shirt            2
2  john  john@email.com    Red Shirt            1
3   jim   jim@email.com  Green Shirt            1
4   jim   jim@email.com   Blue Shirt            2
5  bill  bill@email.com  Green Shirt            1
6   jim   jim@email.com   Blue Shirt            1
7  jane  jane@email.com   Blue Shirt            2
8  john  john@email.com   Blue Shirt            1
9   jim   jim@email.com  Green Shirt            2
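For anyone who wants to reproduce this, here is a minimal sketch that builds the same frame. It assumes `orders` is a list of dicts, one per order line; the real source of `orders` is not shown, so that shape is a guess.

```python
import pandas as pd

# Assumed shape of `orders`: one dict per order line (hypothetical reconstruction).
names = ["jane", "john", "john", "jim", "jim", "bill", "jim", "jane", "john", "jim"]
products = ["Red Shirt", "Green Shirt", "Red Shirt", "Green Shirt", "Blue Shirt",
            "Green Shirt", "Blue Shirt", "Blue Shirt", "Blue Shirt", "Green Shirt"]
quantities = [2, 2, 1, 1, 2, 1, 1, 2, 1, 2]

orders = [
    {"name": n, "email": f"{n}@email.com", "product": p, "qty_ordered": q}
    for n, p, q in zip(names, products, quantities)
]

# One row per order line, with a default 0..9 integer index.
df = pd.DataFrame(orders)
print(df)
```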
And I am trying to figure out how to get something that looks like this:
EXPECTED OUTPUT
name  email           products     qty_ordered
jane  jane@email.com  Red Shirt    2
                      Blue Shirt   2
john  john@email.com  Green Shirt  2
                      Blue Shirt   1
                      Red Shirt    1
etc...
The data here isn't really what's important, just the overall index/column format.
I've tried
pd.DataFrame(orders).groupby(['name', 'email']).apply(lambda x: x['product'])
which seems to get close:
name  email
bill  bill@email.com  5    Green Shirt
jane  jane@email.com  0      Red Shirt
                      7     Blue Shirt
jim   jim@email.com   3    Green Shirt
                      4     Blue Shirt
                      6     Blue Shirt
                      9    Green Shirt
john  john@email.com  1    Green Shirt
                      2      Red Shirt
                      8     Blue Shirt
But then modifying that slightly like this:
pd.DataFrame(orders).groupby(['name', 'email']).apply(lambda x: x[['product', 'qty_ordered']])
I get this, which I don't understand.
       product  qty_ordered
0    Red Shirt            2
1  Green Shirt            2
2    Red Shirt            1
3  Green Shirt            1
4   Blue Shirt            2
5  Green Shirt            1
6   Blue Shirt            1
7   Blue Shirt            2
8   Blue Shirt            1
9  Green Shirt            2
I have also tried different variations of df.melt(), df.pivot(), df.agg(), running the gamut of all the usual suspects.
I think I'm missing some fundamental understanding of how groupby() actually works. Any insight is greatly appreciated.
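As a rough illustration of what groupby() does before any function is applied, the sketch below iterates over the groups directly. It uses a shortened, made-up subset of the sample data (an assumption for brevity), and `group_sizes` is a hypothetical helper name. Each iteration yields the group key and a sub-DataFrame that keeps its original row labels, which is roughly what a function passed to apply() gets to work with.

```python
import pandas as pd

# Shortened, hypothetical subset of the sample orders.
df = pd.DataFrame({
    "name": ["jane", "john", "john", "jane"],
    "email": ["jane@email.com", "john@email.com", "john@email.com", "jane@email.com"],
    "product": ["Red Shirt", "Green Shirt", "Red Shirt", "Blue Shirt"],
    "qty_ordered": [2, 2, 1, 2],
})

group_sizes = {}
# groupby() splits the frame into one sub-DataFrame per unique (name, email) pair.
for key, group in df.groupby(["name", "email"]):
    # `key` is the (name, email) tuple; `group` holds only that customer's rows.
    print(key)
    print(group[["product", "qty_ordered"]])
    group_sizes[key] = len(group)
```

How the per-group results are stitched back together depends on what the applied function returns and on the pandas version, so it can help to inspect the groups this way first.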
Is this what you are looking for?
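# Sort so each customer's rows are adjacent; mergesort is stable, so the original row order is kept within a customer
df = df.sort_values('name', kind='mergesort')
# Blank out name and email on every row after a customer's first row
df.loc[df.duplicated('name'), ['name', 'email']] = ''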
O/P:
   name           email      product  qty_ordered
5  bill  bill@email.com  Green Shirt            1
0  jane  jane@email.com    Red Shirt            2
7                         Blue Shirt            2
3   jim   jim@email.com  Green Shirt            1
4                         Blue Shirt            2
6                         Blue Shirt            1
9                        Green Shirt            2
1  john  john@email.com  Green Shirt            2
2                          Red Shirt            1
8                         Blue Shirt            1
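An alternative sketch, assuming a hierarchical index is acceptable rather than blanked-out cell values: moving name and email into the row index keeps the underlying data intact, and pandas by default hides repeated outer index labels when printing a MultiIndex, so the display comes out close to the requested layout. The variable name `result` and the way the sample frame is rebuilt here are for illustration only.

```python
import pandas as pd

# Rebuild the sample frame from the question (hypothetical reconstruction).
names = ["jane", "john", "john", "jim", "jim", "bill", "jim", "jane", "john", "jim"]
df = pd.DataFrame({
    "name": names,
    "email": [f"{n}@email.com" for n in names],
    "product": ["Red Shirt", "Green Shirt", "Red Shirt", "Green Shirt", "Blue Shirt",
                "Green Shirt", "Blue Shirt", "Blue Shirt", "Blue Shirt", "Green Shirt"],
    "qty_ordered": [2, 2, 1, 1, 2, 1, 1, 2, 1, 2],
})

# Move name/email into a two-level row index, then sort so that
# each customer's rows sit next to each other.
result = df.set_index(["name", "email"]).sort_index()
print(result)

# All rows for one customer can still be selected by key.
print(result.loc[("jim", "jim@email.com")])
```

Unlike overwriting name and email with empty strings, this keeps every row fully labeled, so later filtering or aggregation by customer still works.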