Add observations to a pandas dataframe by expanding dates

I have a pandas DataFrame like this:

sid     qtr
84024   1998-09-30
89565   2007-06-30
73083   1991-06-30
77447   2003-12-31
71079   1992-12-31
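To make the snippets that follow reproducible, the sample data can be rebuilt as shown here. This is a minimal sketch that assumes `sid` and `qtr` are ordinary columns and that `qtr` holds date strings, which is how the answer below treats them.

```python
import pandas as pd

# Sample data copied from the question; qtr is kept as strings (an assumption).
df = pd.DataFrame({
    'sid': [84024, 89565, 73083, 77447, 71079],
    'qtr': ['1998-09-30', '2007-06-30', '1991-06-30',
            '2003-12-31', '1992-12-31'],
})
```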

For each row, I'd like to create three additional rows, corresponding to the next three quarters for that sid.

I've come up with this approach, but I'm wondering if there's a more idiomatic pandas way to do it:

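import pandas as pd

df = df.set_index('sid')
df['qtr'] = pd.to_datetime(df['qtr'])  # strings -> timestamps so sorting works
# One column per following quarter: q1, q2, q3 (three additional quarters)
for k in range(3):
    df['q' + str(k + 1)] = df['qtr'] + pd.offsets.QuarterEnd(k + 1)
# Stack all quarter columns into a single Series indexed by sid
df = df.unstack()
df.index = df.index.get_level_values(1)
df = df.sort_values()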

I convert the quarters (strings) to quarterly Period objects. I then use a list comprehension to generate the original quarter plus each of the next three quarters. All of this is wrapped in a dictionary comprehension keyed on the sid, and the result is used to build a new dataframe named df2.

df2 = pd.DataFrame({row.sid: [pd.Period(row.qtr, 'Q') + n for n in range(4)] 
                             for _, row in df.iterrows()}).T.reset_index()

>>> df2
   index       0       1       2       3
0  71079  1992Q4  1993Q1  1993Q2  1993Q3
1  73083  1991Q2  1991Q3  1991Q4  1992Q1
2  77447  2003Q4  2004Q1  2004Q2  2004Q3
3  84024  1998Q3  1998Q4  1999Q1  1999Q2
4  89565  2007Q2  2007Q3  2007Q4  2008Q1

Then I use melt to get all of the quarters in the same column.

df2 = pd.melt(df2, id_vars='index')
df2.rename(columns={'index': 'sid', 'value': 'qtr', 'variable': 'offset'}, inplace=True)

>>> df2
      sid offset     qtr
0   71079      0  1992Q4
1   73083      0  1991Q2
2   77447      0  2003Q4
3   84024      0  1998Q3
4   89565      0  2007Q2
5   71079      1  1993Q1
6   73083      1  1991Q3
7   77447      1  2004Q1
8   84024      1  1998Q4
9   89565      1  2007Q3
10  71079      2  1993Q2
11  73083      2  1991Q4
12  77447      2  2004Q2
13  84024      2  1999Q1
14  89565      2  2007Q4
15  71079      3  1993Q3
16  73083      3  1992Q1
17  77447      3  2004Q3
18  84024      3  1999Q2
19  89565      3  2008Q1
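If the quarter-end dates from the question are needed rather than Period labels, the periods can be converted back. The sketch below uses a hypothetical two-element Series (`qtrs`) in place of the `qtr` column of df2; it relies on `PeriodIndex.end_time`, which gives the last instant of each quarter, and `normalize()`, which resets the time to midnight.

```python
import pandas as pd

# Hypothetical stand-in for the qtr column of the melted df2.
qtrs = pd.Series([pd.Period('1992Q4', 'Q'), pd.Period('1993Q1', 'Q')])

# end_time is the last nanosecond of the quarter; normalize() drops the time part.
dates = pd.PeriodIndex(qtrs).end_time.normalize()
```

Wrapping the column in `pd.PeriodIndex` first should work whether the column has a period dtype or holds Period objects with object dtype.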

And here is a completely different approach.

Creating Period objects is expensive, so let's identify the unique quarters, compute the following quarter-end timestamps once for each, and store them in a mapping.

quarters = df.qtr.unique().tolist()
mapping = {qtr: [pd.to_datetime(qtr) + pd.offsets.QuarterEnd(q) 
                 for q in range(4)] 
           for qtr in quarters}

>>> mapping
{'1991-06-30': [Timestamp('1991-06-30 00:00:00'),
  Timestamp('1991-09-30 00:00:00'),
  Timestamp('1991-12-31 00:00:00'),
  Timestamp('1992-03-31 00:00:00')],
 '1992-12-31': [Timestamp('1992-12-31 00:00:00'),
  Timestamp('1993-03-31 00:00:00'),
  Timestamp('1993-06-30 00:00:00'),
  Timestamp('1993-09-30 00:00:00')],
 '1998-09-30': [Timestamp('1998-09-30 00:00:00'),
  Timestamp('1998-12-31 00:00:00'),
  Timestamp('1999-03-31 00:00:00'),
  Timestamp('1999-06-30 00:00:00')],
 '2003-12-31': [Timestamp('2003-12-31 00:00:00'),
  Timestamp('2004-03-31 00:00:00'),
  Timestamp('2004-06-30 00:00:00'),
  Timestamp('2004-09-30 00:00:00')],
 '2007-06-30': [Timestamp('2007-06-30 00:00:00'),
  Timestamp('2007-09-30 00:00:00'),
  Timestamp('2007-12-31 00:00:00'),
  Timestamp('2008-03-31 00:00:00')]}

Now we can map the qtr column in the initial dataframe to these lists and use a list comprehension to extract each of the four quarters. These values are then zipped with the sid values, repeated four times so that the two sequences line up.

df2 = pd.DataFrame(list(zip(df.sid.tolist() * 4,
                            [q[i] for i in range(4)
                             for q in df.qtr.map(mapping).values.tolist()])),
                   columns=['sid', 'qtr'])
df2 = df2.sort_values(['sid', 'qtr']).reset_index(drop=True)

>>> df2
      sid        qtr
0   71079 1992-12-31
1   71079 1993-03-31
2   71079 1993-06-30
3   71079 1993-09-30
4   73083 1991-06-30
5   73083 1991-09-30
6   73083 1991-12-31
7   73083 1992-03-31
8   77447 2003-12-31
9   77447 2004-03-31
10  77447 2004-06-30
11  77447 2004-09-30
12  84024 1998-09-30
13  84024 1998-12-31
14  84024 1999-03-31
15  84024 1999-06-30
16  89565 2007-06-30
17  89565 2007-09-30
18  89565 2007-12-31
19  89565 2008-03-31
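The same map-then-expand idea can also be sketched with `DataFrame.explode`, which is a different technique from the zip-based construction above and requires pandas 0.25 or later. The two-row frame and the name `expanded` are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Two rows of the sample data, with qtr as strings (illustrative subset).
df = pd.DataFrame({
    'sid': [84024, 89565],
    'qtr': ['1998-09-30', '2007-06-30'],
})

# Same mapping idea as above: compute the four quarter ends once per unique quarter.
mapping = {qtr: [pd.to_datetime(qtr) + pd.offsets.QuarterEnd(q) for q in range(4)]
           for qtr in df['qtr'].unique()}

# Map each row to its list of quarters, then explode the lists into separate rows.
expanded = (df.assign(qtr=df['qtr'].map(mapping))
              .explode('qtr')
              .reset_index(drop=True))

# explode leaves an object column; convert it back to datetime64.
expanded['qtr'] = pd.to_datetime(expanded['qtr'])
```

Because `explode` keeps each row's list elements together and in order, the quarters come out grouped by sid without a separate sort.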
