
How to create a new dataframe from an existing dataframe

My data is in this format:

T1    YEAR   JAN    FEB    MAR    APRL   DEC   G1
ABC   2015   0      18.6   0.9    6.9    3.0   DATA
ABC   2016   8.9    0      0      3.9    0     TECH
DEF   2020   0      9.0    0      8.06   6     TECH
GHI   2017   0      1.1    9.8    6.8    0     OPT
JKL   2018   7.1    2.1    0      0      8     DATA
JKL   2020   5      2      6      6      5     OTHER

What I did is:

df = df.groupby(['T1','YEAR','G1'])[['JAN','FEB','MAR','APRL','DEC']].sum()

The output I got is:

                     JAN    FEB    MAR    APRL   DEC
T1    G1     YEAR
------------------------------------------------------
ABC   DATA   2015    25.9   55.8   5.9    7.9    66
             2016    2      0.9    0      8.0    66
             2017    0      88     1.09   66     0
             2018    55     77     7.1    6.0    1.9
             2019    7.9    5.0    6.9    98     6.0
             2020    7      55.0   77     98     7.8
ABC   TECH   (2015-2020)....

Now I need the output in this format:

T1    G1     VALUES   TIME
------------------------------------------
ABC   DATA   25.9     2015-01-01 00:00:00
ABC   DATA   55.8     2015-02-01 00:00:00
ABC   DATA   5.9      2015-03-01 00:00:00
ABC   DATA   7.9      2015-04-01 00:00:00

What I finally tried is:

for i, j in df.iterrows():
    for n in range(0, 276):
        # Here I want to know how I can put every value I iterate over
        # under a single column named 'Value'.
        value = df.iloc[n, :]
        print(value)

I also want to know how to access the groupby values of T1, G1 and YEAR. I tried doing this:

GRP = pd.DataFrame(df.groupby(['T1','G1','YEAR']))

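Here I was trying to create a new dataframe containing the columns T1, G1 and YEAR, and then add the value column to that dataframe.

Can anyone tell me how to solve a problem like this?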

You can melt the dataframe and then create a new datetime column. Then keep and reorder the necessary columns and sort the values:

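# Sum per group, turn the group keys back into columns, and rename APRL so it parses as a month
df = df.groupby(['T1','YEAR','G1'])[['JAN','FEB','MAR','APRL','DEC']].sum().reset_index().rename({'APRL' : 'APR'}, axis=1)
# Reshape from wide to long: one row per group and month
df = df.melt(id_vars=['T1', 'YEAR','G1'], var_name='TIME', value_name='VALUES')
# Build a datetime from strings such as 'JAN-2015'
df['TIME'] = pd.to_datetime(df['TIME'] + '-' + df['YEAR'].astype(str))
# Keep the required columns in order and sort
df = df[['T1', 'G1', 'VALUES', 'TIME']].sort_values(['T1', 'G1','TIME'])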
df
Out[1]: 
     T1     G1  VALUES       TIME
0   ABC   DATA    0.00 2015-01-01
6   ABC   DATA   18.60 2015-02-01
12  ABC   DATA    0.90 2015-03-01
18  ABC   DATA    6.90 2015-04-01
24  ABC   DATA    3.00 2015-12-01
1   ABC   TECH    8.90 2016-01-01
7   ABC   TECH    0.00 2016-02-01
13  ABC   TECH    0.00 2016-03-01
19  ABC   TECH    3.90 2016-04-01
25  ABC   TECH    0.00 2016-12-01
2   DEF   TECH    0.00 2020-01-01
8   DEF   TECH    9.00 2020-02-01
14  DEF   TECH    0.00 2020-03-01
20  DEF   TECH    8.06 2020-04-01
26  DEF   TECH    6.00 2020-12-01
3   GHI    OPT    0.00 2017-01-01
9   GHI    OPT    1.10 2017-02-01
15  GHI    OPT    9.80 2017-03-01
21  GHI    OPT    6.80 2017-04-01
27  GHI    OPT    0.00 2017-12-01
4   JKL   DATA    7.10 2018-01-01
10  JKL   DATA    2.10 2018-02-01
16  JKL   DATA    0.00 2018-03-01
22  JKL   DATA    0.00 2018-04-01
28  JKL   DATA    8.00 2018-12-01
5   JKL  OTHER    5.00 2020-01-01
11  JKL  OTHER    2.00 2020-02-01
17  JKL  OTHER    6.00 2020-03-01
23  JKL  OTHER    6.00 2020-04-01
29  JKL  OTHER    5.00 2020-12-01
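
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same melt approach, built from the sample rows in the question. A few things in it are my own assumptions rather than part of the answer above: the names `raw`, `totals`, `tidy`, `result` and the intermediate `MONTH` column are made up for illustration, `as_index=False` is used in place of `reset_index()`, and an explicit `format="%b-%Y"` is passed to `pd.to_datetime` so the parse does not depend on format inference (this assumes an English locale for month abbreviations). It also touches on the second part of the question: once the group keys are ordinary columns, `T1`, `G1` and `YEAR` can be read like any other column.

```python
import pandas as pd

# Sample rows copied from the question.
raw = pd.DataFrame(
    {
        "T1": ["ABC", "ABC", "DEF", "GHI", "JKL", "JKL"],
        "YEAR": [2015, 2016, 2020, 2017, 2018, 2020],
        "JAN": [0, 8.9, 0, 0, 7.1, 5],
        "FEB": [18.6, 0, 9.0, 1.1, 2.1, 2],
        "MAR": [0.9, 0, 0, 9.8, 0, 6],
        "APRL": [6.9, 3.9, 8.06, 6.8, 0, 6],
        "DEC": [3.0, 0, 6, 0, 8, 5],
        "G1": ["DATA", "TECH", "TECH", "OPT", "DATA", "OTHER"],
    }
)

months = ["JAN", "FEB", "MAR", "APR", "DEC"]

# as_index=False keeps T1, YEAR and G1 as ordinary columns after the sum,
# so they stay accessible without a separate reset_index() call.
totals = (
    raw.rename(columns={"APRL": "APR"})
    .groupby(["T1", "YEAR", "G1"], as_index=False)[months]
    .sum()
)

# Wide to long: one row per group and month.
# MONTH is a hypothetical helper column, dropped again below.
tidy = totals.melt(
    id_vars=["T1", "YEAR", "G1"], var_name="MONTH", value_name="VALUES"
)

# Title-case the month ('JAN' -> 'Jan') and parse with an explicit format.
tidy["TIME"] = pd.to_datetime(
    tidy["MONTH"].str.title() + "-" + tidy["YEAR"].astype(str),
    format="%b-%Y",
)

result = (
    tidy[["T1", "G1", "VALUES", "TIME"]]
    .sort_values(["T1", "G1", "TIME"])
    .reset_index(drop=True)
)
print(result.head())
```

Unlike the output shown above, this version resets the row index after sorting, so the rows are numbered from 0 upward instead of keeping the index positions produced by `melt`.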

