Pandas multilevel indexing for time series data with different reference and publish dates
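My accounting data has both a reference date (i.e. the end date of the fiscal quarter) and a publish date (i.e. when the earnings were actually released). Here is a sample: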
item Reference Value VALUED FQTR FYEARQ
Published
1986-12-14 CAPXY 1983-12-31 13.820 3 1 1984
1986-12-14 CAPXY 1984-03-31 20.895 3 2 1984
1986-12-14 CAPXY 1984-06-30 26.764 3 3 1984
1986-12-14 CAPXY 1984-09-30 39.614 3 4 1984
1986-12-14 CAPXY 1984-12-31 15.056 3 1 1985
1986-12-14 CAPXY 1985-03-31 33.604 3 2 1985
1986-12-14 CAPXY 1985-06-30 42.719 3 3 1985
1986-12-14 CAPXY 1985-09-30 54.064 3 4 1985
1986-12-14 CAPXY 1985-12-31 6.510 3 1 1986
1986-12-14 CAPXY 1986-03-31 18.503 3 2 1986
1986-12-14 CAPXY 1986-06-30 48.071 3 3 1986
1987-01-31 CAPXY 1986-09-30 66.629 2 4 1986
1987-01-31 CAPXY 1986-09-30 66.629 3 4 1986
1987-03-31 CAPXY 1986-12-31 15.740 2 1 1987
1987-03-31 CAPXY 1986-12-31 15.740 3 1 1987
1987-05-31 CAPXY 1987-03-31 38.699 2 2 1987
1987-05-31 CAPXY 1987-03-31 38.699 3 2 1987
1987-08-31 CAPXY 1987-06-30 61.006 2 3 1987
1987-08-31 CAPXY 1987-06-30 61.006 3 3 1987
1987-12-31 CAPXY 1987-09-30 86.127 2 4 1987
1987-12-31 CAPXY 1987-09-30 86.127 3 4 1987
1988-03-31 CAPXY 1987-12-31 34.140 2 1 1988
1988-03-31 CAPXY 1987-12-31 34.140 3 1 1988
1988-06-09 CAPXY 1988-03-31 68.059 2 2 1988
1988-06-09 CAPXY 1988-03-31 68.059 3 2 1988
1988-09-08 CAPXY 1988-06-30 101.198 2 3 1988
1988-09-08 CAPXY 1988-06-30 101.198 3 3 1988
1988-12-30 CAPXY 1988-09-30 144.001 2 4 1988
1988-12-30 CAPXY 1988-09-30 144.001 3 4 1988
1989-03-09 CAPXY 1988-12-31 73.967 2 1 1989
... ... ... ... ... ... ...
2001-08-16 OANCFY 2001-06-30 -90.000 2 3 2001
2001-08-16 OANCFY 2001-06-30 -90.000 3 3 2001
2002-01-10 OANCFY 2001-09-30 185.000 2 4 2001
2002-01-10 OANCFY 2001-09-30 185.000 3 4 2001
2002-02-14 OANCFY 2001-12-31 42.000 2 1 2002
2002-02-14 OANCFY 2001-12-31 42.000 3 1 2002
2002-05-23 OANCFY 2002-03-31 44.000 2 2 2002
2002-05-23 OANCFY 2002-03-31 44.000 3 2 2002
2002-08-15 OANCFY 2002-06-30 7.000 2 3 2002
2002-08-15 OANCFY 2002-06-30 7.000 3 3 2002
2002-12-31 OANCFY 2002-09-30 89.000 2 4 2002
2002-12-31 OANCFY 2002-09-30 89.000 3 4 2002
2003-02-13 OANCFY 2002-12-31 110.000 2 1 2003
2003-02-13 OANCFY 2002-12-31 110.000 3 1 2003
2003-05-22 OANCFY 2003-03-31 208.000 2 2 2003
2003-05-22 OANCFY 2003-03-31 208.000 3 2 2003
2003-08-21 OANCFY 2003-06-30 216.000 3 3 2003
2003-08-21 OANCFY 2003-06-30 216.000 2 3 2003
2003-12-31 OANCFY 2003-09-30 289.000 2 4 2003
2003-12-31 OANCFY 2003-09-30 289.000 3 4 2003
2004-02-19 OANCFY 2003-12-31 219.000 2 1 2004
2004-02-19 OANCFY 2003-12-31 219.000 3 1 2004
2004-05-20 OANCFY 2004-03-31 280.000 2 2 2004
2004-05-20 OANCFY 2004-03-31 280.000 3 2 2004
2004-08-19 OANCFY 2004-06-30 491.000 2 3 2004
2004-08-19 OANCFY 2004-06-30 491.000 3 3 2004
2004-12-16 OANCFY 2004-09-30 934.000 2 4 2004
2004-12-16 OANCFY 2004-09-30 934.000 3 4 2004
2005-02-10 OANCFY 2004-12-31 775.000 2 1 2005
2005-02-10 OANCFY 2004-12-31 775.000 3 1 2005
[396 rows x 6 columns]
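The data is imported into a DataFrame via pandas.io.sql.read_sql. How it should be indexed depends on whether the user requests data by reference date or by publish date. I then need to pivot the data and show each item as a column, with the reference/publish dates as a multi-level index. A single publish date can have many repeated reference dates.
I was thinking of something like: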
index = accdata
pd.MultiIndex.from_tuples([index.columns])
df = pd.DataFrame(accdata, index=index)
df.stack()
But I get the following error when creating the multi-index DataFrame:
TypeError: 'NoneType' object is not iterable
I think this is a fairly common problem, where reference and publish dates do not coincide, but I cannot seem to find a suitable solution.
Any ideas?
Following Alexander's comment, and with a slightly modified index, I was looking for something like this:
df.reset_index().set_index(['Reference','Published'])
And then, perhaps (for illustration purposes):
pd.concat(df[df['item'] == 'CAPXY'], df[df['item'] == 'OANCFY'])
But I get the following error:
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"
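For reference, the message indicates that a DataFrame was passed where pd.concat expects a single iterable of pandas objects. A minimal sketch on a few made-up rows shaped like the sample (the names capxy, oancfy and stacked are only for illustration):

```python
import pandas as pd

# Made-up rows in the same shape as the sample data.
df = pd.DataFrame({
    "item": ["CAPXY", "CAPXY", "OANCFY"],
    "Reference": ["1983-12-31", "1984-03-31", "2001-06-30"],
    "Published": ["1986-12-14", "1986-12-14", "2001-08-16"],
    "Value": [13.820, 20.895, -90.0],
})

capxy = df[df["item"] == "CAPXY"]
oancfy = df[df["item"] == "OANCFY"]

# pd.concat takes ONE iterable of frames, so wrap them in a list.
stacked = pd.concat([capxy, oancfy])
print(stacked)
```

This only stacks the rows back on top of each other; it does not move the items into columns.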
The set_index call above returns:
item Value VALUED FQTR FYEARQ
Reference Published
1983-12-31 1986-12-14 CAPXY 13.820 3 1 1984
1984-03-31 1986-12-14 CAPXY 20.895 3 2 1984
1984-06-30 1986-12-14 CAPXY 26.764 3 3 1984
1984-09-30 1986-12-14 CAPXY 39.614 3 4 1984
1984-12-31 1986-12-14 CAPXY 15.056 3 1 1985
But I would like to be able to achieve the following:
First: CAPXY OANCFY
Second: Value VALUED FQTR FYEARQ Value VALUED FQTR FYEARQ
Reference Published
1983-12-31 1986-12-14 13.820 3 1 1984
1984-03-31 1986-12-14 20.895 3 2 1984
1984-06-30 1986-12-14 26.764 3 3 1984
1984-09-30 1986-12-14 39.614 3 4 1984
1984-12-31 1986-12-14 15.056 3 1 1985
so that the items are represented in the columns and all items are aligned by reference and publish date (a left join).
Judging by how your DataFrame prints, it appears to be indexed on 'Published' at the moment. You need to reset the index and then index the DataFrame on:
a) item
b) Reference
c) Published
>>> df.reset_index().set_index(['item', 'Reference', 'Published'])
index Value VALUED FQTR FYEARQ
item Reference Published
CAPXY 12/31/83 12/14/86 0 13.820 3 1 1984
3/31/84 12/14/86 1 20.895 3 2 1984
6/30/84 12/14/86 2 26.764 3 3 1984
9/30/84 12/14/86 3 39.614 3 4 1984
12/31/84 12/14/86 4 15.056 3 1 1985
3/31/85 12/14/86 5 33.604 3 2 1985
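As a rough sketch of what the three-level index buys you, the snippet below rebuilds it on four rows taken from the sample and then selects by item and by publish date. The cutoff date and the variable names are assumptions made for the example.

```python
import pandas as pd

# Four rows copied from the sample, with dates already parsed.
df = pd.DataFrame({
    "item": ["CAPXY", "CAPXY", "OANCFY", "OANCFY"],
    "Reference": pd.to_datetime(
        ["1986-06-30", "1986-09-30", "2001-06-30", "2001-09-30"]),
    "Published": pd.to_datetime(
        ["1986-12-14", "1987-01-31", "2001-08-16", "2002-01-10"]),
    "Value": [48.071, 66.629, -90.0, 185.0],
})

indexed = df.set_index(["item", "Reference", "Published"]).sort_index()

# All rows for one item; the remaining index is (Reference, Published).
capxy = indexed.xs("CAPXY", level="item")

# Everything that had been published on or before a (hypothetical) cutoff.
cutoff = pd.Timestamp("1986-12-31")
published = indexed.index.get_level_values("Published")
known_by_cutoff = indexed[published <= cutoff]

print(capxy)
print(known_by_cutoff)
```

Filtering on the 'Published' level is one way to answer "what was known at this date", while the 'Reference' level still identifies the fiscal quarter each value belongs to.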
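EDIT:
Based on the revised post, I believe a pivot table will do the trick. I also swap the column levels to get the desired format.
Note that if the dates are strings, you need to convert them to datetime objects (or timestamps).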
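import pandas as pd

# Parse the ISO-formatted date strings into pandas Timestamps.
df['Reference'] = pd.to_datetime(df['Reference'], format='%Y-%m-%d')
df['Published'] = pd.to_datetime(df['Published'], format='%Y-%m-%d')

# One row per (Reference, FYEARQ, FQTR, Published), one column group per item.
# pivot_table aggregates duplicate keys with the mean by default.
pt = df.reset_index().pivot_table(index=['Reference', 'FYEARQ', 'FQTR', 'Published'],
                                  columns=['item'],
                                  values=['Value', 'VALUED'])
# Move 'item' to the top column level.
pt.columns = pt.columns.swaplevel(0, 1)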
>>> pt
item CAPXY
Value VALUED
Reference FYEARQ FQTR Published
1983-12-31 1984 1 1986-12-14 13.820 3.0
1984-03-31 1984 2 1986-12-14 20.895 3.0
1984-06-30 1984 3 1986-12-14 26.764 3.0
1984-09-30 1984 4 1986-12-14 39.614 3.0
1984-12-31 1985 1 1986-12-14 15.056 3.0
1985-03-31 1985 2 1986-12-14 33.604 3.0
1985-06-30 1985 3 1986-12-14 42.719 3.0
1985-09-30 1985 4 1986-12-14 54.064 3.0
1985-12-31 1986 1 1986-12-14 6.510 3.0
1986-03-31 1986 2 1986-12-14 18.503 3.0
1986-06-30 1986 3 1986-12-14 48.071 3.0
1986-09-30 1986 4 1987-01-31 66.629 2.5
1986-12-31 1987 1 1987-03-31 15.740 2.5
1987-03-31 1987 2 1987-05-31 38.699 2.5
1987-06-30 1987 3 1987-08-31 61.006 2.5
1987-09-30 1987 4 1987-12-31 86.127 2.5
1987-12-31 1988 1 1988-03-31 34.140 2.5
1988-03-31 1988 2 1988-06-09 68.059 2.5
1988-06-30 1988 3 1988-09-08 101.198 2.5
1988-09-30 1988 4 1988-12-30 144.001 2.5
1988-12-31 1989 1 1989-03-09 73.967 2.0
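The 2.5 entries under VALUED are consistent with pivot_table's default aggregation, which is the mean: the sample contains pairs of rows that share the same keys but carry VALUED 2 and 3. A small sketch on one such pair from the sample, assuming you would rather keep one of the original codes (the choice of 'max' here is only an illustration):

```python
import pandas as pd

# One pair of rows from the sample that differ only in VALUED (2 and 3).
df = pd.DataFrame({
    "item": ["CAPXY", "CAPXY"],
    "Reference": pd.to_datetime(["1986-09-30", "1986-09-30"]),
    "Published": pd.to_datetime(["1987-01-31", "1987-01-31"]),
    "Value": [66.629, 66.629],
    "VALUED": [2, 3],
})

keys = ["Reference", "Published"]

# Default aggfunc is the mean, so VALUED becomes (2 + 3) / 2.
mean_pt = df.pivot_table(index=keys, columns="item",
                         values=["Value", "VALUED"])

# An explicit aggfunc keeps one of the original codes instead.
max_pt = df.pivot_table(index=keys, columns="item",
                        values=["Value", "VALUED"], aggfunc="max")

print(mean_pt)
print(max_pt)
```

Which aggregation is appropriate depends on what VALUED means in your data, so treat 'max' as a placeholder.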
Alternatively, you can try a groupby, since all the data is unique for each index.
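# Select columns with a list (double brackets); newer pandas rejects a bare tuple of keys here.
pt = (df.reset_index()
        .groupby(['Reference', 'FYEARQ', 'FQTR', 'item'])[['Published', 'Value', 'VALUED']]
        .first().unstack('item'))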
>>> pt
Published Value VALUED
item CAPXY OANCFY CAPXY OANCFY CAPXY OANCFY
Reference FYEARQ FQTR
1983-12-31 1984 1 1986-12-14 NaN 13.820 NaN 3 NaN
1984-03-31 1984 2 1986-12-14 NaN 20.895 NaN 3 NaN
1984-06-30 1984 3 1986-12-14 NaN 26.764 NaN 3 NaN
1984-09-30 1984 4 1986-12-14 NaN 39.614 NaN 3 NaN
1984-12-31 1985 1 1986-12-14 NaN 15.056 NaN 3 NaN
1985-03-31 1985 2 1986-12-14 NaN 33.604 NaN 3 NaN
1985-06-30 1985 3 1986-12-14 NaN 42.719 NaN 3 NaN
1985-09-30 1985 4 1986-12-14 NaN 54.064 NaN 3 NaN
1985-12-31 1986 1 1986-12-14 NaN 6.510 NaN 3 NaN
1986-03-31 1986 2 1986-12-14 NaN 18.503 NaN 3 NaN
1986-06-30 1986 3 1986-12-14 NaN 48.071 NaN 3 NaN
1986-09-30 1986 4 1987-01-31 NaN 66.629 NaN 2 NaN
1986-12-31 1987 1 1987-03-31 NaN 15.740 NaN 2 NaN
1987-03-31 1987 2 1987-05-31 NaN 38.699 NaN 2 NaN
1987-06-30 1987 3 1987-08-31 NaN 61.006 NaN 2 NaN
1987-09-30 1987 4 1987-12-31 NaN 86.127 NaN 2 NaN
1987-12-31 1988 1 1988-03-31 NaN 34.140 NaN 2 NaN
1988-03-31 1988 2 1988-06-09 NaN 68.059 NaN 2 NaN
1988-06-30 1988 3 1988-09-08 NaN 101.198 NaN 2 NaN
1988-09-30 1988 4 1988-12-30 NaN 144.001 NaN 2 NaN
1988-12-31 1989 1 1989-03-09 NaN 73.967 NaN 2 NaN
2001-06-30 2001 3 NaN 2001-08-16 NaN -90 NaN 3
2001-09-30 2001 4 NaN 2002-01-10 NaN 185 NaN 2
2001-12-31 2002 1 NaN 2002-02-14 NaN 42 NaN 2
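If you want the layout from the question, with Reference/Published as the row index and the item on the top column level, one possible variation on the groupby approach is sketched below. It uses a few rows from the sample; grouping on 'Published' and sorting the columns are my assumptions, not something the question prescribes.

```python
import pandas as pd

# Rows copied from the sample (each key appears with VALUED 2 and 3).
df = pd.DataFrame({
    "item": ["CAPXY", "CAPXY", "OANCFY", "OANCFY"],
    "Reference": pd.to_datetime(["1986-09-30"] * 2 + ["2001-06-30"] * 2),
    "Published": pd.to_datetime(["1987-01-31"] * 2 + ["2001-08-16"] * 2),
    "Value": [66.629, 66.629, -90.0, -90.0],
    "VALUED": [2, 3, 2, 3],
    "FQTR": [4, 4, 3, 3],
    "FYEARQ": [1986, 1986, 2001, 2001],
})

fields = ["Value", "VALUED", "FQTR", "FYEARQ"]

# Keep the first row per (Reference, Published, item), then spread items
# across the columns.
wide = (
    df.groupby(["Reference", "Published", "item"])[fields]
    .first()
    .unstack("item")
)

# Put 'item' on the top column level and group each item's fields together.
wide.columns = wide.columns.swaplevel(0, 1)
wide = wide.sort_index(axis=1)

print(wide)
```

Cells stay NaN wherever an item has no value for a given Reference/Published pair, which matches the left-join style alignment asked for in the question.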