Pandas multilevel indexing for time series data with different reference and publish dates

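My accounting data has both a reference date (i.e., the fiscal quarter end date) and a publish date (i.e., when the actual earnings were released). Here is a sample: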

              item  Reference    Value VALUED  FQTR  FYEARQ
Published                                                  
1986-12-14   CAPXY 1983-12-31   13.820      3     1    1984
1986-12-14   CAPXY 1984-03-31   20.895      3     2    1984
1986-12-14   CAPXY 1984-06-30   26.764      3     3    1984
1986-12-14   CAPXY 1984-09-30   39.614      3     4    1984
1986-12-14   CAPXY 1984-12-31   15.056      3     1    1985
1986-12-14   CAPXY 1985-03-31   33.604      3     2    1985
1986-12-14   CAPXY 1985-06-30   42.719      3     3    1985
1986-12-14   CAPXY 1985-09-30   54.064      3     4    1985
1986-12-14   CAPXY 1985-12-31    6.510      3     1    1986
1986-12-14   CAPXY 1986-03-31   18.503      3     2    1986
1986-12-14   CAPXY 1986-06-30   48.071      3     3    1986
1987-01-31   CAPXY 1986-09-30   66.629      2     4    1986
1987-01-31   CAPXY 1986-09-30   66.629      3     4    1986
1987-03-31   CAPXY 1986-12-31   15.740      2     1    1987
1987-03-31   CAPXY 1986-12-31   15.740      3     1    1987
1987-05-31   CAPXY 1987-03-31   38.699      2     2    1987
1987-05-31   CAPXY 1987-03-31   38.699      3     2    1987
1987-08-31   CAPXY 1987-06-30   61.006      2     3    1987
1987-08-31   CAPXY 1987-06-30   61.006      3     3    1987
1987-12-31   CAPXY 1987-09-30   86.127      2     4    1987
1987-12-31   CAPXY 1987-09-30   86.127      3     4    1987
1988-03-31   CAPXY 1987-12-31   34.140      2     1    1988
1988-03-31   CAPXY 1987-12-31   34.140      3     1    1988
1988-06-09   CAPXY 1988-03-31   68.059      2     2    1988
1988-06-09   CAPXY 1988-03-31   68.059      3     2    1988
1988-09-08   CAPXY 1988-06-30  101.198      2     3    1988
1988-09-08   CAPXY 1988-06-30  101.198      3     3    1988
1988-12-30   CAPXY 1988-09-30  144.001      2     4    1988
1988-12-30   CAPXY 1988-09-30  144.001      3     4    1988
1989-03-09   CAPXY 1988-12-31   73.967      2     1    1989
...            ...        ...      ...    ...   ...     ...
2001-08-16  OANCFY 2001-06-30  -90.000      2     3    2001
2001-08-16  OANCFY 2001-06-30  -90.000      3     3    2001
2002-01-10  OANCFY 2001-09-30  185.000      2     4    2001
2002-01-10  OANCFY 2001-09-30  185.000      3     4    2001
2002-02-14  OANCFY 2001-12-31   42.000      2     1    2002
2002-02-14  OANCFY 2001-12-31   42.000      3     1    2002
2002-05-23  OANCFY 2002-03-31   44.000      2     2    2002
2002-05-23  OANCFY 2002-03-31   44.000      3     2    2002
2002-08-15  OANCFY 2002-06-30    7.000      2     3    2002
2002-08-15  OANCFY 2002-06-30    7.000      3     3    2002
2002-12-31  OANCFY 2002-09-30   89.000      2     4    2002
2002-12-31  OANCFY 2002-09-30   89.000      3     4    2002
2003-02-13  OANCFY 2002-12-31  110.000      2     1    2003
2003-02-13  OANCFY 2002-12-31  110.000      3     1    2003
2003-05-22  OANCFY 2003-03-31  208.000      2     2    2003
2003-05-22  OANCFY 2003-03-31  208.000      3     2    2003
2003-08-21  OANCFY 2003-06-30  216.000      3     3    2003
2003-08-21  OANCFY 2003-06-30  216.000      2     3    2003
2003-12-31  OANCFY 2003-09-30  289.000      2     4    2003
2003-12-31  OANCFY 2003-09-30  289.000      3     4    2003
2004-02-19  OANCFY 2003-12-31  219.000      2     1    2004
2004-02-19  OANCFY 2003-12-31  219.000      3     1    2004
2004-05-20  OANCFY 2004-03-31  280.000      2     2    2004
2004-05-20  OANCFY 2004-03-31  280.000      3     2    2004
2004-08-19  OANCFY 2004-06-30  491.000      2     3    2004
2004-08-19  OANCFY 2004-06-30  491.000      3     3    2004
2004-12-16  OANCFY 2004-09-30  934.000      2     4    2004
2004-12-16  OANCFY 2004-09-30  934.000      3     4    2004
2005-02-10  OANCFY 2004-12-31  775.000      2     1    2005
2005-02-10  OANCFY 2004-12-31  775.000      3     1    2005

[396 rows x 6 columns]

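The data was imported into a DataFrame via pandas.io.sql.read_sql. How the data should be indexed depends on whether the user requests it by reference date or by publish date. I then need to pivot the data and show each item as a column, with a multilevel index of reference/publish date. A single publish date can have many duplicate reference dates.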
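To make the problem reproducible, here is a minimal sketch that rebuilds a small subset of the sample (six rows copied from the printout above, with the column names as printed) using Published as the index, and checks that a single publish date can carry several reference dates. The `rows` list is only an illustrative stand-in for the read_sql result.

```python
import pandas as pd

# A few rows copied from the printed sample; in practice df comes from read_sql
rows = [
    ("1986-12-14", "CAPXY", "1983-12-31", 13.820, 3, 1, 1984),
    ("1986-12-14", "CAPXY", "1984-03-31", 20.895, 3, 2, 1984),
    ("1987-01-31", "CAPXY", "1986-09-30", 66.629, 2, 4, 1986),
    ("1987-01-31", "CAPXY", "1986-09-30", 66.629, 3, 4, 1986),
    ("2001-08-16", "OANCFY", "2001-06-30", -90.000, 2, 3, 2001),
    ("2001-08-16", "OANCFY", "2001-06-30", -90.000, 3, 3, 2001),
]
cols = ["Published", "item", "Reference", "Value", "VALUED", "FQTR", "FYEARQ"]
df = pd.DataFrame(rows, columns=cols)

# Parse the date strings so they compare and sort as dates
df["Published"] = pd.to_datetime(df["Published"])
df["Reference"] = pd.to_datetime(df["Reference"])

# Mirror the printed layout: Published is the (non-unique) index
df = df.set_index("Published")

# Number of distinct reference dates behind each publish date
refs_per_pub = df.groupby(level="Published")["Reference"].nunique()
print(df.index.is_unique)  # False: publish dates repeat
print(refs_per_pub)
```

Here 1986-12-14 maps to two different reference dates, which is why Published alone cannot serve as a unique key.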

I thought of something like:

index = accdata
pd.MultiIndex.from_tuples([index.columns])
df = pd.DataFrame(accdata, index=index)
df.stack()

But creating the multi-index DataFrame fails with the following error:

TypeError: 'NoneType' object is not iterable

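I would think this is a fairly common problem, where reference dates and publish dates do not line up, but I cannot seem to find a suitable solution.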

Any ideas?
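For reference, MultiIndex levels are built from column values, not from the DataFrame object or its column labels. A minimal sketch, assuming Reference and Published are ordinary columns (call reset_index() first if Published is the index): both set_index with a list of column names and pd.MultiIndex.from_frame produce the same two-level (Reference, Published) index. The tiny frame is illustrative only.

```python
import pandas as pd

# Tiny illustrative frame; Published is a regular column here
df = pd.DataFrame({
    "Published": pd.to_datetime(["1986-12-14", "1986-12-14", "1987-01-31"]),
    "Reference": pd.to_datetime(["1983-12-31", "1984-03-31", "1986-09-30"]),
    "item": ["CAPXY", "CAPXY", "CAPXY"],
    "Value": [13.820, 20.895, 66.629],
})

# Option 1: promote the columns to index levels (outer level first)
by_ref = df.set_index(["Reference", "Published"])

# Option 2: build the MultiIndex explicitly from the column values
mi = pd.MultiIndex.from_frame(df[["Reference", "Published"]])
explicit = df.drop(columns=["Reference", "Published"])
explicit.index = mi
```

Swapping the list order to ["Published", "Reference"] would make the publish date the outer level instead, which may suit lookups by publish date.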

Based on Alexander's comment, and with a slightly modified index, I was looking for something like:

df.reset_index().set_index(['Reference','Published'])

Then, perhaps (for illustration purposes):

pd.concat(df[df['item'] == 'CAPXY']), df[df['item'] == 'OANCFY'])

But I get the following error:

TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"

The reset_index().set_index(...) step on its own returns:

                         item    Value VALUED  FQTR  FYEARQ
Reference  Published                                       
1983-12-31 1986-12-14   CAPXY   13.820      3     1    1984
1984-03-31 1986-12-14   CAPXY   20.895      3     2    1984
1984-06-30 1986-12-14   CAPXY   26.764      3     3    1984
1984-09-30 1986-12-14   CAPXY   39.614      3     4    1984
1984-12-31 1986-12-14   CAPXY   15.056      3     1    1985

But I would like to achieve the following:

First:                  CAPXY                        OANCFY
Second:                 Value VALUED  FQTR  FYEARQ   Value   VALUED   FQTR  FYEARQ
Reference  Published                                       
1983-12-31 1986-12-14   13.820      3     1    1984
1984-03-31 1986-12-14   20.895      3     2    1984
1984-06-30 1986-12-14   26.764      3     3    1984
1984-09-30 1986-12-14   39.614      3     4    1984
1984-12-31 1986-12-14   15.056      3     1    1985

so that the items are laid out as columns and all items are aligned (left join) on the reference and publish dates.
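Regarding the pd.concat TypeError in the question: pd.concat takes a single iterable (for example a list) of DataFrames as its first argument, so the per-item slices need to be wrapped in a list. A minimal sketch with an illustrative two-item frame; keys= is optional and labels each piece with an outer index level.

```python
import pandas as pd

# Illustrative frame with two items
df = pd.DataFrame({
    "item": ["CAPXY", "CAPXY", "OANCFY"],
    "Value": [13.820, 20.895, -90.000],
})

# pd.concat takes ONE iterable of frames: wrap the pieces in a list
parts = [df[df["item"] == "CAPXY"], df[df["item"] == "OANCFY"]]
stacked = pd.concat(parts)

# keys= tags each piece with an outer index level
tagged = pd.concat(parts, keys=["CAPXY", "OANCFY"])
```

Concatenating along axis=1 with keys= would put the item names on an outer column level, closer to the target layout, but aligning the pieces can fail when their indexes contain duplicates, as the duplicated (Reference, Published) rows here do.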

Judging by how your DataFrame prints, it appears to be indexed on Published at the moment. You need to reset the index and then set a new index on: a) item, b) Reference, c) Published.

>>> df.reset_index().set_index(['item', 'Reference', 'Published'])
                       index    Value  VALUED  FQTR  FYEARQ
item  Reference Published                                      
CAPXY 12/31/83  12/14/86       0   13.820       3     1    1984
      3/31/84   12/14/86       1   20.895       3     2    1984
      6/30/84   12/14/86       2   26.764       3     3    1984
      9/30/84   12/14/86       3   39.614       3     4    1984
      12/31/84  12/14/86       4   15.056       3     1    1985
      3/31/85   12/14/86       5   33.604       3     2    1985
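Once the frame has the three-level (item, Reference, Published) index, individual slices can be pulled out by level name. A minimal sketch with illustrative data, using DataFrame.xs; sorting the index first keeps label-based lookups efficient.

```python
import pandas as pd

# Illustrative subset of the data
df = pd.DataFrame({
    "item": ["CAPXY", "CAPXY", "OANCFY"],
    "Reference": pd.to_datetime(["1983-12-31", "1984-03-31", "2001-06-30"]),
    "Published": pd.to_datetime(["1986-12-14", "1986-12-14", "2001-08-16"]),
    "Value": [13.820, 20.895, -90.000],
})
indexed = df.set_index(["item", "Reference", "Published"]).sort_index()

# All rows for one item; the item level is dropped from the result
capxy = indexed.xs("CAPXY", level="item")

# All rows published on one date, across items and reference dates
pub = indexed.xs(pd.Timestamp("1986-12-14"), level="Published")
```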

Edit:

Based on the revised post, I believe a pivot table will do the trick. I also swap the column levels to get the desired format.

Note that if the dates are strings, you need to convert them to datetime objects (or Timestamps).

import datetime as dt

df['Reference'] = df.Reference.apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d').date())    
df['Published'] = df.Published.apply(lambda x: dt.datetime.strptime(x, '%Y-%m-%d').date())
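As a vectorized alternative to the strptime lines above (swapped in here, not the answerer's code), pd.to_datetime parses a whole column at once. One difference to be aware of: it yields pandas Timestamps in a datetime64 column, whereas strptime(...).date() yields plain Python date objects in an object column. The two-row frame is illustrative.

```python
import pandas as pd

# Illustrative frame with dates stored as strings
df = pd.DataFrame({
    "Reference": ["1983-12-31", "1986-09-30"],
    "Published": ["1986-12-14", "1987-01-31"],
})

# Vectorized parse; format= matches the '%Y-%m-%d' pattern used with strptime
for col in ["Reference", "Published"]:
    df[col] = pd.to_datetime(df[col], format="%Y-%m-%d")

print(df.dtypes)
```

If plain date objects are really needed, df[col].dt.date converts back afterwards.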


pt = df.reset_index().pivot_table(index=['Reference', 'FYEARQ', 'FQTR', 'Published'], 
                                  columns=['item'], 
                                  values=['Value', 'VALUED'])

pt.columns = pt.columns.swaplevel(0, 1)

>>> pt
item                                 CAPXY       
                                     Value VALUED
Reference  FYEARQ FQTR Published                 
1983-12-31 1984   1    1986-12-14   13.820    3.0
1984-03-31 1984   2    1986-12-14   20.895    3.0
1984-06-30 1984   3    1986-12-14   26.764    3.0
1984-09-30 1984   4    1986-12-14   39.614    3.0
1984-12-31 1985   1    1986-12-14   15.056    3.0
1985-03-31 1985   2    1986-12-14   33.604    3.0
1985-06-30 1985   3    1986-12-14   42.719    3.0
1985-09-30 1985   4    1986-12-14   54.064    3.0
1985-12-31 1986   1    1986-12-14    6.510    3.0
1986-03-31 1986   2    1986-12-14   18.503    3.0
1986-06-30 1986   3    1986-12-14   48.071    3.0
1986-09-30 1986   4    1987-01-31   66.629    2.5
1986-12-31 1987   1    1987-03-31   15.740    2.5
1987-03-31 1987   2    1987-05-31   38.699    2.5
1987-06-30 1987   3    1987-08-31   61.006    2.5
1987-09-30 1987   4    1987-12-31   86.127    2.5
1987-12-31 1988   1    1988-03-31   34.140    2.5
1988-03-31 1988   2    1988-06-09   68.059    2.5
1988-06-30 1988   3    1988-09-08  101.198    2.5
1988-09-30 1988   4    1988-12-30  144.001    2.5
1988-12-31 1989   1    1989-03-09   73.967    2.0
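The VALUED column shows 2.5 for rows that appear twice in the source (once with VALUED 2, once with 3) because pivot_table aggregates duplicates with its default aggfunc, the mean. A minimal sketch reproducing this on two illustrative rows, plus one way to keep both versions by moving VALUED into the index; passing aggfunc='first' would instead keep whichever row comes first.

```python
import pandas as pd

# Two rows that differ only in VALUED, like the duplicated rows in the question
df = pd.DataFrame({
    "Reference": pd.to_datetime(["1986-09-30", "1986-09-30"]),
    "FYEARQ": [1986, 1986],
    "FQTR": [4, 4],
    "Published": pd.to_datetime(["1987-01-31", "1987-01-31"]),
    "item": ["CAPXY", "CAPXY"],
    "Value": [66.629, 66.629],
    "VALUED": [2, 3],
})
keys = ["Reference", "FYEARQ", "FQTR", "Published"]

# Default aggfunc is the mean, so VALUED 2 and 3 collapse to 2.5
averaged = df.pivot_table(index=keys, columns="item", values=["Value", "VALUED"])

# Keeping VALUED as an index level preserves both rows instead of averaging
kept = df.pivot_table(index=keys + ["VALUED"], columns="item", values="Value")
```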

Alternatively, you could try groupby, since the data for each index combination is unique.

pt = (df.reset_index()
        .groupby(['Reference', 'FYEARQ', 'FQTR', 'item'])
        [['Published', 'Value', 'VALUED']]   # a list of columns, not a tuple
        .first()
        .unstack('item'))

>>> pt
                         Published                Value        VALUED       
item                         CAPXY      OANCFY    CAPXY OANCFY  CAPXY OANCFY
Reference  FYEARQ FQTR                                                      
1983-12-31 1984   1     1986-12-14         NaN   13.820    NaN      3    NaN
1984-03-31 1984   2     1986-12-14         NaN   20.895    NaN      3    NaN
1984-06-30 1984   3     1986-12-14         NaN   26.764    NaN      3    NaN
1984-09-30 1984   4     1986-12-14         NaN   39.614    NaN      3    NaN
1984-12-31 1985   1     1986-12-14         NaN   15.056    NaN      3    NaN
1985-03-31 1985   2     1986-12-14         NaN   33.604    NaN      3    NaN
1985-06-30 1985   3     1986-12-14         NaN   42.719    NaN      3    NaN
1985-09-30 1985   4     1986-12-14         NaN   54.064    NaN      3    NaN
1985-12-31 1986   1     1986-12-14         NaN    6.510    NaN      3    NaN
1986-03-31 1986   2     1986-12-14         NaN   18.503    NaN      3    NaN
1986-06-30 1986   3     1986-12-14         NaN   48.071    NaN      3    NaN
1986-09-30 1986   4     1987-01-31         NaN   66.629    NaN      2    NaN
1986-12-31 1987   1     1987-03-31         NaN   15.740    NaN      2    NaN
1987-03-31 1987   2     1987-05-31         NaN   38.699    NaN      2    NaN
1987-06-30 1987   3     1987-08-31         NaN   61.006    NaN      2    NaN
1987-09-30 1987   4     1987-12-31         NaN   86.127    NaN      2    NaN
1987-12-31 1988   1     1988-03-31         NaN   34.140    NaN      2    NaN
1988-03-31 1988   2     1988-06-09         NaN   68.059    NaN      2    NaN
1988-06-30 1988   3     1988-09-08         NaN  101.198    NaN      2    NaN
1988-09-30 1988   4     1988-12-30         NaN  144.001    NaN      2    NaN
1988-12-31 1989   1     1989-03-09         NaN   73.967    NaN      2    NaN
2001-06-30 2001   3            NaN  2001-08-16      NaN    -90    NaN      3
2001-09-30 2001   4            NaN  2002-01-10      NaN    185    NaN      2
2001-12-31 2002   1            NaN  2002-02-14      NaN     42    NaN      2
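The groupby result puts the field names (Published, Value, VALUED) on the outer column level and the items underneath, whereas the layout asked for in the question has the items on top. A minimal sketch, on illustrative data, that swaps the two column levels and sorts them so each item's columns sit together.

```python
import pandas as pd

# Illustrative data: one row per item
df = pd.DataFrame({
    "Reference": pd.to_datetime(["1983-12-31", "2001-06-30"]),
    "FYEARQ": [1984, 2001],
    "FQTR": [1, 3],
    "item": ["CAPXY", "OANCFY"],
    "Published": pd.to_datetime(["1986-12-14", "2001-08-16"]),
    "Value": [13.820, -90.000],
    "VALUED": [3, 3],
})

pt = (df.groupby(["Reference", "FYEARQ", "FQTR", "item"])
        [["Published", "Value", "VALUED"]]
        .first()
        .unstack("item"))

# Put item on the outer column level, then group each item's columns together
pt.columns = pt.columns.swaplevel(0, 1)
pt = pt.sort_index(axis=1)
```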

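Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM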