
How do I manipulate a DataFrame with pivot_table in Python?

I have spent a lot of time on this, but I am no closer to a solution.

I have a DataFrame that prints as:

   RegionID  AreaID    Year     Jan     Feb     Mar     Apr     May     Jun
0       20.0     1.0  2020.0  1174.0  1056.0  1051.0  1107.0  1097.0  1118.0   
1       19.0     2.0  2020.0   460.0   451.0   421.0   421.0   420.0   457.0   
2       20.0     3.0  2020.0  2723.0  2594.0  2590.0  2399.0  2377.0  2331.0   
3       21.0     4.0  2020.0   863.0   859.0   813.0   785.0   757.0   765.0   
4       19.0     5.0  2020.0  4037.0  3942.0  4069.0  3844.0  3567.0  3721.0   
5       19.0     6.0  2020.0  1695.0  1577.0  1531.0  1614.0  1671.0  1693.0   
6       18.0     7.0  2020.0  1757.0  1505.0  1445.0  1514.0  1406.0  1444.0   
7       18.0     8.0  2020.0   832.0   721.0   747.0   852.0   885.0   872.0   
8       18.0     9.0  2020.0  2538.0  2000.0  2026.0  1981.0  1987.0  1949.0   
9       21.0    10.0  2020.0  1145.0  1235.0  1114.0  1161.0  1150.0  1189.0   
10      20.0    11.0  2020.0   551.0   497.0   503.0   472.0   505.0   532.0   
11      19.0    12.0  2020.0  1664.0  1526.0  1389.0  1373.0  1384.0  1404.0   
12      21.0    13.0  2020.0   381.0   351.0   299.0   286.0   297.0   319.0   
13      21.0    14.0  2020.0  1733.0  1627.0  1567.0  1561.0  1498.0  1511.0   
14      18.0    15.0  2020.0  1257.0  1257.0  1160.0  1172.0  1124.0  1113.0 

I want to pivot this data so that the month columns are combined into a single Month field, like below:

RegionID      AreaID    Year    Month   Amount
20.0            1.0     2020    Jan     1174
20.0            1.0     2020    Feb     1056
20.0            1.0     2020    Mar     1051

Can this be done using pandas? I have been trying with pivot_table, but I can't get it to work.

I hope I've understood your question correctly. You can use .set_index() and then .stack():

print(
    df.set_index(["RegionID", "AreaID", "Year"])
    .stack()
    .reset_index()
    .rename(columns={"level_3": "Month", 0: "Amount"})
)

Prints:

    RegionID  AreaID    Year Month  Amount
0       20.0     1.0  2020.0   Jan  1174.0
1       20.0     1.0  2020.0   Feb  1056.0
2       20.0     1.0  2020.0   Mar  1051.0
3       20.0     1.0  2020.0   Apr  1107.0
4       20.0     1.0  2020.0   May  1097.0
5       20.0     1.0  2020.0   Jun  1118.0
6       19.0     2.0  2020.0   Jan   460.0
7       19.0     2.0  2020.0   Feb   451.0
8       19.0     2.0  2020.0   Mar   421.0
9       19.0     2.0  2020.0   Apr   421.0
10      19.0     2.0  2020.0   May   420.0
11      19.0     2.0  2020.0   Jun   457.0

...

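Or use .melt(), which returns the same rows but ordered month by month (all Jan rows first, then Feb, and so on) rather than area by area: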

print(
    df.melt(
        ["RegionID", "AreaID", "Year"], var_name="Month", value_name="Amount"
    )
)
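
If you want the .melt() result in the same row order as the .stack() output, and with whole numbers as in the desired output from the question, something along these lines might work. This is only a minimal sketch: the two-row sample frame, the shortened month list, and the names id_cols, month_order and long_df are made up for illustration, and the integer cast assumes there are no missing values.

```python
import pandas as pd

# Hypothetical sample: the first two rows and first three months of the question's data.
df = pd.DataFrame(
    {
        "RegionID": [20.0, 19.0],
        "AreaID": [1.0, 2.0],
        "Year": [2020.0, 2020.0],
        "Jan": [1174.0, 460.0],
        "Feb": [1056.0, 451.0],
        "Mar": [1051.0, 421.0],
    }
)

id_cols = ["RegionID", "AreaID", "Year"]

# Wide to long: every non-id column becomes a (Month, Amount) pair.
long_df = df.melt(id_vars=id_cols, var_name="Month", value_name="Amount")

# An ordered categorical makes the months sort in calendar order, not alphabetically.
month_order = ["Jan", "Feb", "Mar"]
long_df["Month"] = pd.Categorical(long_df["Month"], categories=month_order, ordered=True)
long_df = long_df.sort_values(["AreaID", "Month"]).reset_index(drop=True)

# Assumption: no NaN values, so the float columns can be cast to int safely.
long_df = long_df.astype({"RegionID": int, "AreaID": int, "Year": int, "Amount": int})

print(long_df)
```

With the full data you would extend month_order to cover every month column in the frame.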
