Transforming Database with Pandas Dataframe

I'm trying to reshape some data in pandas. This is the code I use to combine the two tables, followed by the data:

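import numpy as np
import pandas as pd
# cba and cbt: one DataFrame per basket (CBA and CBT)
canastas = pd.concat([cba, cbt])
canastas.index = np.arange(12)  # renumber rows 0..11 after concatenation
canastas = canastas.stack()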
Aiio   Mes     GSA  Pampeana  Noroeste  Noreste     Cuyo  Patagonia Canasta
2016    7  1666.48   1660.19   1458.24  1496.21  1494.04    1713.67     CBA
2016    8  1675.05   1662.99   1459.38  1501.91  1496.09    1723.86     CBA
2016    9  1711.22   1705.31   1498.62  1540.68  1536.29    1767.89     CBA
2016   10  1739.34   1731.84   1516.82  1559.51  1559.00    1797.44     CBA
2016   11  1762.65   1753.27   1532.67  1573.18  1577.67    1819.64     CBA
2016   12  1766.62   1754.08   1526.86  1571.59  1574.22    1822.96     CBA
2016    7  4032.88   4017.66   3281.04  3396.40  3854.62    4712.59     CBT
2016    8  4036.87   4007.81   3269.01  3394.32  3844.95    4723.38     CBT
2016    9  4089.82   4075.69   3326.94  3451.12  3917.54    4790.98     CBT
2016   10  4191.81   4173.73   3397.68  3524.49  4006.63    4924.99     CBT
2016   11  4247.99   4225.38   3417.85  3539.66  4038.84    4967.62     CBT
2016   12  4257.55   4227.33   3420.17  3551.79  4045.75    4994.91     CBT
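A minimal sketch of the concatenation step, assuming `cba` and `cbt` are DataFrames with the columns shown above. The two-row frames here are hypothetical stand-ins (only two regions, values copied from the table). Passing `ignore_index=True` to `pd.concat` renumbers the rows, so the separate `canastas.index = np.arange(12)` line is not needed.

```python
import pandas as pd

# Hypothetical stand-ins for the question's cba and cbt frames
# (only two regions and two months, values taken from the table above).
cba = pd.DataFrame({
    "Aiio": [2016, 2016], "Mes": [7, 8],
    "GSA": [1666.48, 1675.05], "Cuyo": [1494.04, 1496.09],
    "Canasta": ["CBA", "CBA"],
})
cbt = pd.DataFrame({
    "Aiio": [2016, 2016], "Mes": [7, 8],
    "GSA": [4032.88, 4036.87], "Cuyo": [3854.62, 3844.95],
    "Canasta": ["CBT", "CBT"],
})

# ignore_index=True renumbers rows 0..n-1, which avoids hard-coding
# np.arange(12) when the number of rows changes.
canastas = pd.concat([cba, cbt], ignore_index=True)
print(canastas)
```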

And I need to get a database like this: [image]

I was using .stack and .pivot_table, but it doesn't work. Which pandas function would you recommend?

Use DataFrame.melt, then compute Trimestre (quarter) from Mes with integer division by 3:

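# Unpivot the region columns into Region/Valor pairs
df = df.melt(['Aiio','Mes','Canasta'], var_name='Region', value_name='Valor')
# Map month 1-12 to quarter 1-4 via integer division
df['Trimestre'] = (df['Mes'] - 1) // 3 + 1
# Encode year and quarter as a single number, e.g. 2016.3
df['Periodo'] = df['Aiio'] + df['Trimestre'] / 10

print(df)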
    Aiio  Mes Canasta     Region    Valor  Trimestre  Periodo
0   2016    7     CBA        GSA  1666.48          3   2016.3
1   2016    8     CBA        GSA  1675.05          3   2016.3
2   2016    9     CBA        GSA  1711.22          3   2016.3
3   2016   10     CBA        GSA  1739.34          4   2016.4
4   2016   11     CBA        GSA  1762.65          4   2016.4
..   ...  ...     ...        ...      ...        ...      ...
67  2016    8     CBT  Patagonia  4723.38          3   2016.3
68  2016    9     CBT  Patagonia  4790.98          3   2016.3
69  2016   10     CBT  Patagonia  4924.99          4   2016.4
70  2016   11     CBT  Patagonia  4967.62          4   2016.4
71  2016   12     CBT  Patagonia  4994.91          4   2016.4

[72 rows x 7 columns]
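To see the melt step and the month-to-quarter mapping in isolation, here is a minimal sketch on a hypothetical two-month sample (values copied from the question's table; only two of the six regions are included).

```python
import pandas as pd

# Small hypothetical sample in the question's wide layout
# (two months, two regions, one basket; values from the table above).
df = pd.DataFrame({
    "Aiio": [2016, 2016],
    "Mes": [9, 10],
    "Canasta": ["CBA", "CBA"],
    "GSA": [1711.22, 1739.34],
    "Pampeana": [1705.31, 1731.84],
})

# Every column not listed in id_vars becomes a (Region, Valor) row.
tidy = df.melt(id_vars=["Aiio", "Mes", "Canasta"],
               var_name="Region", value_name="Valor")

# Months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4.
tidy["Trimestre"] = (tidy["Mes"] - 1) // 3 + 1
tidy["Periodo"] = tidy["Aiio"] + tidy["Trimestre"] / 10
print(tidy)
```

Subtracting 1 before dividing is what puts the boundaries in the right place: month 3 gives (3 - 1) // 3 + 1 = 1, while month 4 gives (4 - 1) // 3 + 1 = 2.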

Here is another solution using stack. Both approaches give the same rows and perform similarly, but pay attention to the order of the output: melt groups the rows by region, while stack groups them by month.

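# Move the region columns into the index, then stack them into one Series
s = df.set_index(['Aiio','Mes','Canasta']).stack()
s.name = 'Valor'
# The stacked level is unnamed, so reset_index labels it "level_3"
df = s.reset_index().rename(columns={"level_3": "Region"})
df['Trimestre'] = df['Mes'].sub(1) // 3 + 1
df['Periodo'] = df['Aiio'] + df['Trimestre'] / 10
df
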
    Aiio  Mes Canasta     Region    Valor  Trimestre  Periodo
0   2016    7     CBA        GSA  1666.48          3   2016.3
1   2016    7     CBA   Pampeana  1660.19          3   2016.3
2   2016    7     CBA   Noroeste  1458.24          3   2016.3
3   2016    7     CBA    Noreste  1496.21          3   2016.3
4   2016    7     CBA       Cuyo  1494.04          3   2016.3
..   ...  ...     ...        ...      ...        ...      ...
67  2016   12     CBT   Pampeana  4227.33          4   2016.4
68  2016   12     CBT   Noroeste  3420.17          4   2016.4
69  2016   12     CBT    Noreste  3551.79          4   2016.4
70  2016   12     CBT       Cuyo  4045.75          4   2016.4
71  2016   12     CBT  Patagonia  4994.91          4   2016.4
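The ordering difference mentioned above can be checked directly. This sketch, using a hypothetical two-row sample, runs both approaches and compares them after sorting on the key columns. It uses `Series.reset_index(name=...)` as a shorthand for setting `s.name` before calling `reset_index`.

```python
import pandas as pd

# Hypothetical two-row sample in the question's wide layout.
wide = pd.DataFrame({
    "Aiio": [2016, 2016],
    "Mes": [7, 8],
    "Canasta": ["CBA", "CBA"],
    "GSA": [1666.48, 1675.05],
    "Cuyo": [1494.04, 1496.09],
})
keys = ["Aiio", "Mes", "Canasta"]

# melt: rows come out grouped by region (all GSA rows, then all Cuyo rows).
by_melt = wide.melt(keys, var_name="Region", value_name="Valor")

# stack: rows come out grouped by original row (month 7 regions, then month 8).
by_stack = (wide.set_index(keys).stack()
            .reset_index(name="Valor")
            .rename(columns={"level_3": "Region"}))

print(by_melt["Region"].tolist())   # ['GSA', 'GSA', 'Cuyo', 'Cuyo']
print(by_stack["Region"].tolist())  # ['GSA', 'Cuyo', 'GSA', 'Cuyo']

# After sorting on the same keys, both results contain identical rows.
order = keys + ["Region"]
a = by_melt.sort_values(order).reset_index(drop=True)
b = by_stack.sort_values(order).reset_index(drop=True)
print(a.equals(b))  # True
```

Newer pandas versions may emit a FutureWarning about the `stack` implementation; it does not change the result for data without missing values like this.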

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM