
How to group pandas DataFrame by varying dates?

I am trying to aggregate daily data into fiscal-quarter data. For example, I have a table of fiscal quarter end dates:

Company Period Quarter_End
M       2016Q1 05/02/2015
M       2016Q2 08/01/2015
M       2016Q3 10/31/2015
M       2016Q4 01/30/2016
WFM     2015Q2 04/12/2015
WFM     2015Q3 07/05/2015 
WFM     2015Q4 09/27/2015
WFM     2016Q1 01/17/2016

and a table of daily data:

Company Date       Price
M       06/20/2015 1.05
M       06/22/2015 4.05
M       07/10/2015 3.45
M       07/29/2015 1.86
M       08/24/2015 1.58
M       09/02/2015 8.64
M       09/22/2015 2.56
M       10/20/2015 5.42
M       11/02/2015 1.58
M       11/24/2015 4.58
M       12/03/2015 6.48
M       12/05/2015 4.56
M       01/03/2016 7.14
M       01/30/2016 6.34
WFM     06/20/2015 1.05
WFM     06/22/2015 4.05
WFM     07/10/2015 3.45
WFM     07/29/2015 1.86
WFM     08/24/2015 1.58
WFM     09/02/2015 8.64
WFM     09/22/2015 2.56
WFM     10/20/2015 5.42
WFM     11/02/2015 1.58
WFM     11/24/2015 4.58
WFM     12/03/2015 6.48
WFM     12/05/2015 4.56
WFM     01/03/2016 7.14
WFM     01/17/2016 6.34

I would like to create the following table:

Company Period  Quarter_end Sum(Price)
M       2016Q2  8/1/2015    10.41
M       2016Q3  10/31/2015  18.2
M       2016Q4  1/30/2016   30.68
WFM     2015Q3  7/5/2015    5.1
WFM     2015Q4  9/27/2015   18.09
WFM     2016Q1  1/17/2016   36.1

However, I don't know how to group by varying dates without looping through each record. Any help is greatly appreciated.

Thanks!

I think you can use merge_ordered:

import pandas as pd

# df1 is the quarter-end table, df2 is the daily price table
# first convert the date columns to datetime
df1.Quarter_End = pd.to_datetime(df1.Quarter_End)
df2.Date = pd.to_datetime(df2.Date)


df = pd.merge_ordered(df1, 
                      df2, 
                      left_on=['Company','Quarter_End'], 
                      right_on=['Company','Date'], 
                      how='outer')
print (df)
   Company  Period Quarter_End       Date  Price
0        M  2016Q1  2015-05-02        NaT    NaN
1        M     NaN         NaT 2015-06-20   1.05
2        M     NaN         NaT 2015-06-22   4.05
3        M     NaN         NaT 2015-07-10   3.45
4        M     NaN         NaT 2015-07-29   1.86
5        M  2016Q2  2015-08-01        NaT    NaN
6        M     NaN         NaT 2015-08-24   1.58
7        M     NaN         NaT 2015-09-02   8.64
8        M     NaN         NaT 2015-09-22   2.56
9        M     NaN         NaT 2015-10-20   5.42
10       M  2016Q3  2015-10-31        NaT    NaN
11       M     NaN         NaT 2015-11-02   1.58
12       M     NaN         NaT 2015-11-24   4.58
13       M     NaN         NaT 2015-12-03   6.48
14       M     NaN         NaT 2015-12-05   4.56
15       M     NaN         NaT 2016-01-03   7.14
16       M  2016Q4  2016-01-30 2016-01-30   6.34
17     WFM  2015Q2  2015-04-12        NaT    NaN
18     WFM     NaN         NaT 2015-06-20   1.05
19     WFM     NaN         NaT 2015-06-22   4.05
20     WFM  2015Q3  2015-07-05        NaT    NaN
21     WFM     NaN         NaT 2015-07-10   3.45
22     WFM     NaN         NaT 2015-07-29   1.86
23     WFM     NaN         NaT 2015-08-24   1.58
24     WFM     NaN         NaT 2015-09-02   8.64
25     WFM     NaN         NaT 2015-09-22   2.56
26     WFM  2015Q4  2015-09-27        NaT    NaN
27     WFM     NaN         NaT 2015-10-20   5.42
28     WFM     NaN         NaT 2015-11-02   1.58
29     WFM     NaN         NaT 2015-11-24   4.58
30     WFM     NaN         NaT 2015-12-03   6.48
31     WFM     NaN         NaT 2015-12-05   4.56
32     WFM     NaN         NaT 2016-01-03   7.14
33     WFM  2016Q1  2016-01-17 2016-01-17   6.34

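Then back-fill the NaN values in the Period and Quarter_End columns with bfill and aggregate with sum. If you need to remove all NaN values, add Series.dropna, and finally call reset_index: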

df.Period = df.Period.bfill()
df.Quarter_End = df.Quarter_End.bfill()

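# min_count=1 keeps all-NaN groups as NaN (a plain sum() returns 0 for them in pandas >= 0.22), so dropna() can remove them
print(df.groupby(['Company','Period','Quarter_End'])['Price'].sum(min_count=1).dropna().reset_index())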

  Company  Period Quarter_End  Price
0       M  2016Q2  2015-08-01  10.41
1       M  2016Q3  2015-10-31  18.20
2       M  2016Q4  2016-01-30  30.68
3     WFM  2015Q3  2015-07-05   5.10
4     WFM  2015Q4  2015-09-27  18.09
5     WFM  2016Q1  2016-01-17  36.10
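
One caveat about the back-fill step above: a plain `bfill()` runs over the whole column, so a price row that falls after a company's last quarter end would pick up the next company's Period. The sample data avoids this because each company's last price date equals its last quarter end. The toy frame below is hypothetical (the names `merged`, `plain` and `per_company` are not from the answer) and only sketches how grouping by Company before back-filling would guard against that case.

```python
import pandas as pd

# Hypothetical merged frame: company A has one price row AFTER its last quarter end.
merged = pd.DataFrame({
    "Company": ["A", "A", "A", "B", "B"],
    "Period":  [None, "A-Q1", None, None, "B-Q1"],
    "Price":   [1.0, None, 2.0, 3.0, None],
})

# Plain back-fill: the trailing A row (index 2) is labelled with B's period.
plain = merged["Period"].bfill()

# Back-fill within each company: the trailing A row stays NaN instead.
per_company = merged.groupby("Company")["Period"].bfill()

print(pd.DataFrame({"plain": plain, "per_company": per_company}))
```

Rows left as NaN this way are dropped by a later `groupby` on Period, since `groupby` skips NaN keys by default.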
  • set_index
  • pd.concat to align the indexes
  • groupby with agg

prd_df = period_df.set_index(['Company', 'Quarter_End'])

prc_df = price_df.set_index(['Company', 'Date'], drop=False)

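# outer-align both frames on (Company, date); sort so the bfill below runs in date order within each company
df = pd.concat([prd_df, prc_df], axis=1).sort_index()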

df.groupby([df.index.get_level_values(0), df.Period.bfill()])  \
  .agg(dict(Date='last', Price='sum')).dropna()
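
As a further alternative that is not part of either answer above, `pd.merge_asof` with `direction='forward'` can attach to each daily row the first quarter end on or after its date, per company, without an outer merge or back-fill. This is only a sketch under the assumption that the two tables look like the ones in the question; the frame names `quarters`, `prices`, `tagged` and `result` are made up for illustration.

```python
import pandas as pd

# Quarter-end table from the question.
quarters = pd.DataFrame({
    "Company": ["M"] * 4 + ["WFM"] * 4,
    "Period": ["2016Q1", "2016Q2", "2016Q3", "2016Q4",
               "2015Q2", "2015Q3", "2015Q4", "2016Q1"],
    "Quarter_End": ["05/02/2015", "08/01/2015", "10/31/2015", "01/30/2016",
                    "04/12/2015", "07/05/2015", "09/27/2015", "01/17/2016"],
})

# Daily table from the question: both companies share every row except the last date.
shared_dates = ["06/20/2015", "06/22/2015", "07/10/2015", "07/29/2015",
                "08/24/2015", "09/02/2015", "09/22/2015", "10/20/2015",
                "11/02/2015", "11/24/2015", "12/03/2015", "12/05/2015",
                "01/03/2016"]
shared_prices = [1.05, 4.05, 3.45, 1.86, 1.58, 8.64, 2.56,
                 5.42, 1.58, 4.58, 6.48, 4.56, 7.14]
prices = pd.concat([
    pd.DataFrame({"Company": "M", "Date": shared_dates + ["01/30/2016"],
                  "Price": shared_prices + [6.34]}),
    pd.DataFrame({"Company": "WFM", "Date": shared_dates + ["01/17/2016"],
                  "Price": shared_prices + [6.34]}),
], ignore_index=True)

quarters["Quarter_End"] = pd.to_datetime(quarters["Quarter_End"], format="%m/%d/%Y")
prices["Date"] = pd.to_datetime(prices["Date"], format="%m/%d/%Y")

# merge_asof requires both frames to be sorted by their "on" keys.
# direction="forward" picks the first Quarter_End >= Date within the same Company.
tagged = pd.merge_asof(
    prices.sort_values("Date"),
    quarters.sort_values("Quarter_End"),
    left_on="Date",
    right_on="Quarter_End",
    by="Company",
    direction="forward",
)

# Rows with no later quarter end get NaN keys and are skipped by groupby.
result = (
    tagged.groupby(["Company", "Period", "Quarter_End"], as_index=False)["Price"]
    .sum()
)
result["Price"] = result["Price"].round(2)
print(result)
```

Because the match is done per company through `by="Company"`, a quarter end from one company can never be assigned to another company's rows.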

