
Pandas groupby/pivot dataframe of customer orders

Right now I have a DataFrame that looks something like this:

SAMPLE INPUT

    name    email           product     qty_ordered
0   jane    jane@email.com  Red Shirt   2
1   john    john@email.com  Green Shirt 2
2   john    john@email.com  Red Shirt   1
3   jim     jim@email.com   Green Shirt 1
4   jim     jim@email.com   Blue Shirt  2
5   bill    bill@email.com  Green Shirt 1
6   jim     jim@email.com   Blue Shirt  1
7   jane    jane@email.com  Blue Shirt  2
8   john    john@email.com  Blue Shirt  1
9   jim     jim@email.com   Green Shirt 2

I'm trying to figure out how to get something that looks like this:

EXPECTED OUTPUT

name  email          products    qty_ordered
jane  jane@email.com Red Shirt   2
                     Blue Shirt  2
john  john@email.com Green Shirt 2
                     Blue Shirt  1
                     Red Shirt   1
etc...

The data here isn't really what's important, just the overall index/column format.
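Since only the layout matters, one way to get it (a sketch, not the only option) is to move `name` and `email` into a MultiIndex. Pandas prints repeated outer index labels as blanks by default (the `display.multi_sparse` option), which matches the expected output. The sample data is rebuilt from the table above; the variable names `names` and `out` are just for illustration.

```python
import pandas as pd

# Sample orders from the question
names = ["jane", "john", "john", "jim", "jim", "bill", "jim", "jane", "john", "jim"]
df = pd.DataFrame({
    "name": names,
    "email": [f"{n}@email.com" for n in names],
    "product": ["Red Shirt", "Green Shirt", "Red Shirt", "Green Shirt", "Blue Shirt",
                "Green Shirt", "Blue Shirt", "Blue Shirt", "Blue Shirt", "Green Shirt"],
    "qty_ordered": [2, 2, 1, 1, 2, 1, 1, 2, 1, 2],
})

# A stable sort puts each customer's rows together while keeping their
# original relative order; set_index then moves name/email into a MultiIndex.
out = df.sort_values("name", kind="stable").set_index(["name", "email"])

# With the default display.multi_sparse=True, repeated outer labels print blank.
print(out)
```

Because name and email stay in the index instead of being overwritten, a lookup such as `out.loc["jim"]` still returns all four of jim's rows.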

I've tried

pd.DataFrame(orders).groupby(['name', 'email']).apply(lambda x: x['product'])

which seems to get close:

name  email            
bill  bill@email.com  5    Green Shirt
jane  jane@email.com  0      Red Shirt
                      7     Blue Shirt
jim   jim@email.com   3    Green Shirt
                      4     Blue Shirt
                      6     Blue Shirt
                      9    Green Shirt
john  john@email.com  1    Green Shirt
                      2      Red Shirt
                      8     Blue Shirt

But then modifying it slightly like this:

pd.DataFrame(orders).groupby(['name', 'email']).apply(lambda x: x[['product','qty_ordered']])

I get this, which I don't understand.

    product qty_ordered
0   Red Shirt   2
1   Green Shirt 2
2   Red Shirt   1
3   Green Shirt 1
4   Blue Shirt  2
5   Green Shirt 1
6   Blue Shirt  1
7   Blue Shirt  2
8   Blue Shirt  1
9   Green Shirt 2
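Why the second attempt loses name and email: the function returns a frame with exactly the same index as the group it received, so pandas treats the result like a transform. In pandas versions before 1.5 the `group_keys` argument was ignored for such results and the keys were never added, so the original 0 to 9 row labels come back on their own. A minimal sketch, assuming pandas 2.0 or later (where `group_keys=True` is respected for these results): select the value columns before `apply` so the grouping columns are not handed to the function, keep the group keys, then drop the leftover row-label level. The identity-style `lambda x: x.copy()` only stands in for whatever per-group function you actually need.

```python
import pandas as pd

# Sample orders from the question
names = ["jane", "john", "john", "jim", "jim", "bill", "jim", "jane", "john", "jim"]
df = pd.DataFrame({
    "name": names,
    "email": [f"{n}@email.com" for n in names],
    "product": ["Red Shirt", "Green Shirt", "Red Shirt", "Green Shirt", "Blue Shirt",
                "Green Shirt", "Blue Shirt", "Blue Shirt", "Blue Shirt", "Green Shirt"],
    "qty_ordered": [2, 2, 1, 1, 2, 1, 1, 2, 1, 2],
})

# group_keys=True prepends (name, email) to each returned piece, even though
# every piece keeps the original row labels of its group.
pieces = (
    df.groupby(["name", "email"], group_keys=True)[["product", "qty_ordered"]]
    .apply(lambda x: x.copy())
)

# pieces is indexed by (name, email, original row label); drop the last level.
res = pieces.droplevel(-1)
print(res)
```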

I have also tried different variations of df.melt(), df.pivot(), and df.agg(), running the gamut of all the usual suspects.

I think I'm missing some fundamental understanding of how groupby() actually works. Any insight is greatly appreciated.
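A way to see what groupby() does: it splits the frame into one sub-frame per unique (name, email) key. `apply` then calls your function on each sub-frame and stitches the results back together, while aggregations such as `size()` reduce each sub-frame to a single row. Iterating over the groupby object shows the split step directly; this is an illustrative sketch, and the names `grouped` and `sizes` are made up for it.

```python
import pandas as pd

# Sample orders from the question
names = ["jane", "john", "john", "jim", "jim", "bill", "jim", "jane", "john", "jim"]
df = pd.DataFrame({
    "name": names,
    "email": [f"{n}@email.com" for n in names],
    "product": ["Red Shirt", "Green Shirt", "Red Shirt", "Green Shirt", "Blue Shirt",
                "Green Shirt", "Blue Shirt", "Blue Shirt", "Blue Shirt", "Green Shirt"],
    "qty_ordered": [2, 2, 1, 1, 2, 1, 1, 2, 1, 2],
})

grouped = df.groupby(["name", "email"])

# Iterating yields (key, sub-frame) pairs; with two grouping columns the key
# is a tuple. Each sub-frame keeps its original row labels (0..9), which is
# why row-returning apply results carry those labels along.
for (name, email), group in grouped:
    print(name, email)
    print(group[["product", "qty_ordered"]].to_string())
    print()

# An aggregation collapses each sub-frame to one value per key.
sizes = grouped.size()
print(sizes)
```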

Is this what you are looking for?

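# Sort so each customer's rows are adjacent; kind='stable' keeps their original order
df = df.sort_values('name', kind='stable')
# Blank out name and email on every row after a customer's first one
df.loc[df.duplicated('name'), ['name', 'email']] = ''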

Output:

   name           email      product  qty_ordered
5  bill  bill@email.com  Green Shirt            1
0  jane  jane@email.com    Red Shirt            2
7                         Blue Shirt            2
3   jim  jim@email.com   Green Shirt            1
4                         Blue Shirt            2
6                         Blue Shirt            1
9                        Green Shirt            2
1  john  john@email.com  Green Shirt            2
2                          Red Shirt            1
8                         Blue Shirt            1
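The answer above keeps one line per order row, so jim's two Blue Shirt orders stay on separate lines. If the target is instead one line per product per customer (an assumption, since the expected output in the question trails off with "etc..."), grouping by the product as well and summing `qty_ordered` gives that, and the resulting MultiIndex prints with repeated labels blanked. A sketch; `totals` is an illustrative name.

```python
import pandas as pd

# Sample orders from the question
names = ["jane", "john", "john", "jim", "jim", "bill", "jim", "jane", "john", "jim"]
df = pd.DataFrame({
    "name": names,
    "email": [f"{n}@email.com" for n in names],
    "product": ["Red Shirt", "Green Shirt", "Red Shirt", "Green Shirt", "Blue Shirt",
                "Green Shirt", "Blue Shirt", "Blue Shirt", "Blue Shirt", "Green Shirt"],
    "qty_ordered": [2, 2, 1, 1, 2, 1, 1, 2, 1, 2],
})

# One row per (customer, product); quantities of repeated products are summed.
# Groups come out sorted by name, email, then product (groupby's default sort=True).
totals = df.groupby(["name", "email", "product"])["qty_ordered"].sum().to_frame()
print(totals)
```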

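Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.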

© 2020-2024 STACKOOM.COM