
How to create a new DataFrame from an existing DataFrame

My data is in this format:

T1    YEAR   JAN   FEB    MAR   APRL   DEC   G1
ABC   2015   0     18.6   0.9   6.9    3.0   DATA
ABC   2016   8.9   0      0     3.9    0     TECH
DEF   2020   0     9.0    0     8.06   6     TECH
GHI   2017   0     1.1    9.8   6.8    0     OPT
JKL   2018   7.1   2.1    0     0      8     DATA
JKL   2020   5     2      6     6      5     OTHER
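To make the question reproducible, the sample table can be loaded into a DataFrame like this. It is a sketch: the values are copied from the table above, and the variable name `df` matches the one used in the question.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    'T1':   ['ABC', 'ABC', 'DEF', 'GHI', 'JKL', 'JKL'],
    'YEAR': [2015, 2016, 2020, 2017, 2018, 2020],
    'JAN':  [0, 8.9, 0, 0, 7.1, 5],
    'FEB':  [18.6, 0, 9.0, 1.1, 2.1, 2],
    'MAR':  [0.9, 0, 0, 9.8, 0, 6],
    'APRL': [6.9, 3.9, 8.06, 6.8, 0, 6],
    'DEC':  [3.0, 0, 6, 0, 8, 5],
    'G1':   ['DATA', 'TECH', 'TECH', 'OPT', 'DATA', 'OTHER'],
})
print(df)
```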

What I did was:

df = df.groupby(['T1', 'YEAR', 'G1'])[['JAN', 'FEB', 'MAR', 'APRL', 'DEC']].sum()
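When selecting several columns after `groupby`, the column names need to be passed as a list (double brackets): pandas 2.x rejects a bare tuple such as `['JAN','FEB',...]` written inside single brackets, and the method is lowercase `sum()`. A minimal sketch on a few rows of the sample data (the subset is chosen only for brevity):

```python
import pandas as pd

# A few rows from the question's sample data
df = pd.DataFrame({
    'T1':   ['ABC', 'ABC', 'JKL'],
    'YEAR': [2015, 2016, 2018],
    'JAN':  [0, 8.9, 7.1],
    'FEB':  [18.6, 0, 2.1],
    'MAR':  [0.9, 0, 0],
    'APRL': [6.9, 3.9, 0],
    'DEC':  [3.0, 0, 8],
    'G1':   ['DATA', 'TECH', 'DATA'],
})

months = ['JAN', 'FEB', 'MAR', 'APRL', 'DEC']
# The column selection is a list, so the result is a DataFrame with one column per month
summed = df.groupby(['T1', 'YEAR', 'G1'])[months].sum()
print(summed)
```

The group keys end up as the levels of a MultiIndex on `summed`, not as regular columns.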

I got this output:

                  JAN    FEB    MAR    APRL   DEC
T1   G1    YEAR
------------------------------------------------
ABC  DATA  2015   25.9   55.8   5.9    7.9    66
           2016   2      0.9    0      8.0    66
           2017   0      88     1.09   66     0
           2018   55     77     7.1    6.0    1.9
           2019   7.9    5.0    6.9    98     6.0
           2020   7      55.0   77     98     7.8
ABC  TECH  (2015-2020)....

Now I need my output in this format:

T1    G1     VALUES   TIME
-------------------------------------------
ABC   DATA   25.9     2015-01-01 00:00:00
ABC   DATA   55.8     2015-02-01 00:00:00
ABC   DATA   5.9      2015-03-01 00:00:00
ABC   DATA   7.9      2015-04-01 00:00:00

What I tried on my end:

for i, j in df.iterrows():
    for n in range(0, 276):
        # Here I want to know how I can put all of the iterated values
        # under one column named 'Value'
        value = df.iloc[n, :]
        print(value)
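Instead of looping with `iterrows`, the month columns can be folded into a single `Value` column in one step. The sketch below uses `DataFrame.stack`, which is a swapped-in alternative to the `melt` used in the answer, on a small excerpt of the data; the column names `MONTH` and `Value` are my own choice.

```python
import pandas as pd

# Two rows from the question's sample data
df = pd.DataFrame({
    'T1':   ['ABC', 'JKL'],
    'YEAR': [2015, 2020],
    'G1':   ['DATA', 'OTHER'],
    'JAN':  [0.0, 5.0],
    'FEB':  [18.6, 2.0],
    'MAR':  [0.9, 6.0],
    'APRL': [6.9, 6.0],
    'DEC':  [3.0, 5.0],
})

# Move the identifier columns into the index, then stack the month columns
# so that each (row, month) pair becomes one row holding a single value
long_df = (
    df.set_index(['T1', 'G1', 'YEAR'])
      .stack()
      .rename_axis(['T1', 'G1', 'YEAR', 'MONTH'])
      .reset_index(name='Value')
)
print(long_df)
```

Each input row produces one output row per month column, in the original column order.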

Also, how can I access the groupby keys T1, G1 and YEAR? I tried this:

GRP = pd.DataFrame(df.groupby(['T1','G1','YEAR']))

Here I was trying to make a new DataFrame with the columns T1, G1 and YEAR, so that I could then add the value column to it.
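After a `groupby` aggregation the keys live in the result's index rather than in columns. A sketch of two ways to reach them, on a small excerpt of the data: `reset_index()` turns T1, G1 and YEAR back into ordinary columns, and iterating over the `groupby` object yields each key tuple together with its sub-frame.

```python
import pandas as pd

# Three rows and two month columns from the question's sample data
df = pd.DataFrame({
    'T1':   ['ABC', 'ABC', 'JKL'],
    'G1':   ['DATA', 'TECH', 'DATA'],
    'YEAR': [2015, 2016, 2018],
    'JAN':  [0.0, 8.9, 7.1],
    'FEB':  [18.6, 0.0, 2.1],
})

grouped = df.groupby(['T1', 'G1', 'YEAR'])

# 1) Aggregate, then move the group keys from the index into regular columns
flat = grouped[['JAN', 'FEB']].sum().reset_index()
print(flat)

# 2) Iterate over the groups: each key is a (T1, G1, YEAR) tuple
keys = []
for (t1, g1, year), group in grouped:
    keys.append((t1, g1, year))
    print(t1, g1, year, len(group))
```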

Can anyone tell me how to solve this?

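You can melt the DataFrame and then build a new datetime column from the month name and the year. Then keep and reorder the necessary columns and sort the values. `APRL` is renamed to `APR` first so that it parses as a month abbreviation: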

import pandas as pd

df = df.groupby(['T1', 'YEAR', 'G1'])[['JAN', 'FEB', 'MAR', 'APRL', 'DEC']].sum().reset_index().rename({'APRL': 'APR'}, axis=1)
# Wide -> long: one row per (T1, YEAR, G1, month)
df = df.melt(id_vars=['T1', 'YEAR', 'G1'], var_name='TIME', value_name='VALUES')
# Build strings like 'JAN-2015' and parse them as the first day of that month
df['TIME'] = pd.to_datetime(df['TIME'] + '-' + df['YEAR'].astype(str), format='%b-%Y')
df = df[['T1', 'G1', 'VALUES', 'TIME']].sort_values(['T1', 'G1', 'TIME'])
df
Out[1]: 
     T1     G1  VALUES       TIME
0   ABC   DATA    0.00 2015-01-01
6   ABC   DATA   18.60 2015-02-01
12  ABC   DATA    0.90 2015-03-01
18  ABC   DATA    6.90 2015-04-01
24  ABC   DATA    3.00 2015-12-01
1   ABC   TECH    8.90 2016-01-01
7   ABC   TECH    0.00 2016-02-01
13  ABC   TECH    0.00 2016-03-01
19  ABC   TECH    3.90 2016-04-01
25  ABC   TECH    0.00 2016-12-01
2   DEF   TECH    0.00 2020-01-01
8   DEF   TECH    9.00 2020-02-01
14  DEF   TECH    0.00 2020-03-01
20  DEF   TECH    8.06 2020-04-01
26  DEF   TECH    6.00 2020-12-01
3   GHI    OPT    0.00 2017-01-01
9   GHI    OPT    1.10 2017-02-01
15  GHI    OPT    9.80 2017-03-01
21  GHI    OPT    6.80 2017-04-01
27  GHI    OPT    0.00 2017-12-01
4   JKL   DATA    7.10 2018-01-01
10  JKL   DATA    2.10 2018-02-01
16  JKL   DATA    0.00 2018-03-01
22  JKL   DATA    0.00 2018-04-01
28  JKL   DATA    8.00 2018-12-01
5   JKL  OTHER    5.00 2020-01-01
11  JKL  OTHER    2.00 2020-02-01
17  JKL  OTHER    6.00 2020-03-01
23  JKL  OTHER    6.00 2020-04-01
29  JKL  OTHER    5.00 2020-12-01
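If you would rather not depend on string parsing of values like `JAN-2015` (or on renaming `APRL` to `APR`), one alternative is to map each month column name to a month number and let `pd.to_datetime` assemble the date from `year`, `month` and `day` columns. This is a sketch on two rows of the sample data; the lookup `MONTH_NUM` is a hypothetical, hand-written mapping.

```python
import pandas as pd

# Hypothetical lookup from the question's column names to month numbers
MONTH_NUM = {'JAN': 1, 'FEB': 2, 'MAR': 3, 'APRL': 4, 'DEC': 12}

df = pd.DataFrame({
    'T1':   ['ABC', 'JKL'],
    'YEAR': [2015, 2020],
    'G1':   ['DATA', 'OTHER'],
    'JAN':  [0.0, 5.0],
    'FEB':  [18.6, 2.0],
    'MAR':  [0.9, 6.0],
    'APRL': [6.9, 6.0],
    'DEC':  [3.0, 5.0],
})

long_df = df.melt(id_vars=['T1', 'YEAR', 'G1'], var_name='MONTH', value_name='VALUES')

# pd.to_datetime accepts a DataFrame with 'year', 'month' and 'day' columns
parts = pd.DataFrame({
    'year': long_df['YEAR'],
    'month': long_df['MONTH'].map(MONTH_NUM),
    'day': 1,
})
long_df['TIME'] = pd.to_datetime(parts)

result = long_df[['T1', 'G1', 'VALUES', 'TIME']].sort_values(['T1', 'G1', 'TIME'])
print(result)
```

Any month column missing from `MONTH_NUM` would map to NaN and make the date assembly fail, so the lookup has to cover every column being melted.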
