Pandas groupby/pivot dataframe of customer orders

Right now I have a DataFrame that looks something like this:

SAMPLE INPUT

    name    email           product     qty_ordered
0   jane    jane@email.com  Red Shirt   2
1   john    john@email.com  Green Shirt 2
2   john    john@email.com  Red Shirt   1
3   jim     jim@email.com   Green Shirt 1
4   jim     jim@email.com   Blue Shirt  2
5   bill    bill@email.com  Green Shirt 1
6   jim     jim@email.com   Blue Shirt  1
7   jane    jane@email.com  Blue Shirt  2
8   john    john@email.com  Blue Shirt  1
9   jim     jim@email.com   Green Shirt 2

And I am trying to figure out how to get something that looks like this:

EXPECTED OUTPUT

name  email          products    qty_ordered
jane  jane@email.com Red Shirt   2
                     Blue Shirt  2
john  john@email.com Green Shirt 2
                     Blue Shirt  1
                     Red Shirt   1
etc...

The data itself isn't really what's important here, just the overall index/column format.

I've tried

pd.DataFrame(orders).groupby(['name', 'email']).apply(lambda x: x['product'])

which seems to get close:

name  email            
bill  bill@email.com  5    Green Shirt
jane  jane@email.com  0      Red Shirt
                      7     Blue Shirt
jim   jim@email.com   3    Green Shirt
                      4     Blue Shirt
                      6     Blue Shirt
                      9    Green Shirt
john  john@email.com  1    Green Shirt
                      2      Red Shirt
                      8     Blue Shirt

But then modifying that slightly, like this:

pd.DataFrame(orders).groupby(['name', 'email']).apply(lambda x: x[['product', 'qty_ordered']])

I get this, which I don't understand.

    product qty_ordered
0   Red Shirt   2
1   Green Shirt 2
2   Red Shirt   1
3   Green Shirt 1
4   Blue Shirt  2
5   Green Shirt 1
6   Blue Shirt  1
7   Blue Shirt  2
8   Blue Shirt  1
9   Green Shirt 2

I have also tried different variations of df.melt(), df.pivot(), and df.agg(), running the gamut of all the usual suspects.

I think I'm missing some fundamental understanding of how groupby() actually works. Any insight is greatly appreciated.
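
One way to see what groupby() is doing: it splits the frame into one sub-DataFrame per (name, email) pair, and apply() calls the lambda once per piece and then glues the results back together. How the pieces are glued, including whether the group keys show up in the index, depends on the shape of what the lambda returns and on the pandas version, which is probably why the two attempts above look so different. A minimal sketch that just iterates over the groups, assuming the sample data above (the name `groups` and the column-by-column construction are only for illustration):

```python
import pandas as pd

# Sample orders from the question, built column by column.
df = pd.DataFrame({
    "name": ["jane", "john", "john", "jim", "jim",
             "bill", "jim", "jane", "john", "jim"],
    "product": ["Red Shirt", "Green Shirt", "Red Shirt", "Green Shirt", "Blue Shirt",
                "Green Shirt", "Blue Shirt", "Blue Shirt", "Blue Shirt", "Green Shirt"],
    "qty_ordered": [2, 2, 1, 1, 2, 1, 1, 2, 1, 2],
})
df["email"] = df["name"] + "@email.com"
df = df[["name", "email", "product", "qty_ordered"]]

# Iterating a GroupBy yields (key, sub-DataFrame) pairs; with two grouping
# columns the key is a (name, email) tuple.
groups = {key: group for key, group in df.groupby(["name", "email"])}

for key, group in groups.items():
    print(key)
    print(group[["product", "qty_ordered"]])
```

Each printed piece keeps its original row labels, so nothing is aggregated or reshaped until a function is applied to the groups.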

Is this what you are looking for?

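# Sort so each customer's rows are adjacent (a stable sort keeps their original order)
df = df.sort_values('name', kind='stable')
# Blank out name/email on every row after a customer's first
df.loc[df.duplicated('name'), ['name', 'email']] = ''

Output: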

   name           email      product  qty_ordered
5  bill  bill@email.com  Green Shirt            1
0  jane  jane@email.com    Red Shirt            2
7                         Blue Shirt            2
3   jim   jim@email.com  Green Shirt            1
4                         Blue Shirt            2
6                         Blue Shirt            1
9                        Green Shirt            2
1  john  john@email.com  Green Shirt            2
2                          Red Shirt            1
8                         Blue Shirt            1
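
If the goal is only the grouped look, an alternative that does not overwrite any data is to move the identifying columns into the index: pandas then hides repeated index labels when printing, while the values stay available for later filtering or joins. A minimal sketch, assuming the sample data from the question; the names `per_order` and `totals` are made up here, and summing quantities per product is an assumption about what is wanted, not something the question states:

```python
import pandas as pd

# Sample orders from the question, built column by column.
df = pd.DataFrame({
    "name": ["jane", "john", "john", "jim", "jim",
             "bill", "jim", "jane", "john", "jim"],
    "product": ["Red Shirt", "Green Shirt", "Red Shirt", "Green Shirt", "Blue Shirt",
                "Green Shirt", "Blue Shirt", "Blue Shirt", "Blue Shirt", "Green Shirt"],
    "qty_ordered": [2, 2, 1, 1, 2, 1, 1, 2, 1, 2],
})
df["email"] = df["name"] + "@email.com"

# Option 1: keep every order line, with name/email as a two-level index.
per_order = df.set_index(["name", "email"]).sort_index()

# Option 2: one row per customer and product, with quantities summed.
totals = df.groupby(["name", "email", "product"])["qty_ordered"].sum()

print(per_order)
print(totals)
```

Option 1 keeps all ten order rows; Option 2 collapses repeat orders of the same product, so jim's two Blue Shirt orders become a single row.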
