简体   繁体   中英

pandas groupby and convert rows to columns

I have sales dataset as below: df1

district      item       Year    salesAmount
Arba          pen        2019    10
Arba          pen        2019    20
Arba          pencil     2018    30
Arba          pencil     2018    30
Arba          pencil     2019    30
Cebu          pen        2019    100
Cebu          pen        2019    300
Cebu          pen        2018    100
Cebu          pen        2018    100
Cebu          pen        2019    100
Cebu          laptop     2019    20000
Cebu          laptop     2018    20000
Cebu          fruit      2019    200
Cebu          fruit      2018    800
Cebu          fruit      2019    800
Cebu          fruit      2018    100

I could complete groupby and get the aggregated sum for each prodcut for 2018 and 2019 respectively.

result = df1.groupby(['district', 'item', 'Year'], as_index=False)['salesAmount'].sum()

But I want to have to a tabular column as below inroder to do further transofrmation of columns

Expected output:

district      item       2018_sales    2019_sales
Arba          pen        0             30
Arba          pencil     60            30
Cebu          pen        500           200
Cebu          laptop     20000         20000
Cebu          fuit       900           1000

Thanks.

Given your previous calculation of result , just extend it with a pivot table and a few extra formatting steps. Note that the default aggfunc for pivot_table is 'mean' , but given that these values have already been aggregated in the previous step it won't make a difference if you prefer to use np.sum .

result = (
    result
    .pivot_table(index=['district', 'item'], columns=['Year'], values='salesAmount', fill_value=0)
    .add_suffix('_sales')
    .reset_index()
)
result.columns.name = None
>>> result
  district    item  2018_sales  2019_sales
0     Arba     pen         0.0        30.0
1     Arba  pencil        60.0        30.0
2     Cebu   fruit       900.0      1000.0
3     Cebu  laptop     20000.0     20000.0
4     Cebu     pen       200.0       500.0

Of course, you could also forget the initial groupby and just do the pivot_table directly:

result = (
    df1
    .pivot_table(
        index=['district', 'item'], 
        columns=['Year'], 
        values='salesAmount', 
        fill_value=0,
        aggfunc='sum'
    )
    .add_suffix('_sales')
    .reset_index()
)

You can use Pivote Table

df.pivot_table(values='salesAmount', index=['district', 'item'], columns=['Year'], aggfunc=np.sum)


Year                2018     2019
district item                    
Arba     pen         NaN     30.0
         pencil     60.0     30.0
Cebu     fruit     900.0   1000.0
         laptop  20000.0  20000.0
         pen       200.0    500.0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM