I have a pandas DataFrame like this:
sid qtr
84024 1998-09-30
89565 2007-06-30
73083 1991-06-30
77447 2003-12-31
71079 1992-12-31
For each row, I'd like to create 3 additional rows, corresponding to the next three quarters for each sid.
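To make the example reproducible, the sample data can be built roughly as follows. This is only a sketch: it assumes sid is an integer column and qtr holds date strings.

```python
import pandas as pd

# Sample frame from the table above; qtr is assumed to be stored as strings.
df = pd.DataFrame({
    "sid": [84024, 89565, 73083, 77447, 71079],
    "qtr": ["1998-09-30", "2007-06-30", "1991-06-30",
            "2003-12-31", "1992-12-31"],
})
```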
I've come up with this approach, but I'm wondering if there's a more pandamic way to do this:
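df = df.set_index('sid')
df['qtr'] = pd.to_datetime(df['qtr'])  # parse the strings so every column holds timestamps
for k in range(3):  # q1..q3: the next three quarter ends
    df['q' + str(k+1)] = pd.DatetimeIndex(df['qtr']) + pd.offsets.QuarterEnd(k+1)
df = df.unstack()  # Series indexed by (column name, sid)
df.index = df.index.get_level_values(1)  # keep only the sid level
df = df.sort_values()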
I convert the quarters (strings) to quarterly Period objects. I then use a list comprehension to generate the original quarter plus each of the next three quarters. All of this is wrapped in a dictionary comprehension keyed on the sid and is used to generate a new dataframe named df2.
df2 = pd.DataFrame({row.sid: [pd.Period(row.qtr, 'Q') + n for n in range(4)]
                    for _, row in df.iterrows()}).T.reset_index()
>>> df2
index 0 1 2 3
0 71079 1992Q4 1993Q1 1993Q2 1993Q3
1 73083 1991Q2 1991Q3 1991Q4 1992Q1
2 77447 2003Q4 2004Q1 2004Q2 2004Q3
3 84024 1998Q3 1998Q4 1999Q1 1999Q2
4 89565 2007Q2 2007Q3 2007Q4 2008Q1
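The key step here is that adding an integer to a quarterly Period moves it forward by that many quarters. A minimal sketch of that arithmetic on one of the sample dates (the variable names are only for illustration):

```python
import pandas as pd

# The quarterly period that contains the given date.
p = pd.Period("1998-09-30", freq="Q")

# Adding n shifts the period forward by n quarters.
next_three = [p + n for n in range(1, 4)]
labels = [str(q) for q in next_three]
```

Here str(p) is '1998Q3' and labels holds '1998Q4', '1999Q1' and '1999Q2', which matches the row for sid 84024 above.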
Then I use melt to get all of the quarters in the same column.
df2 = pd.melt(df2, id_vars='index')
df2.rename(columns={'index': 'sid', 'value': 'qtr', 'variable': 'offset'}, inplace=True)
>>> df2
sid offset qtr
0 71079 0 1992Q4
1 73083 0 1991Q2
2 77447 0 2003Q4
3 84024 0 1998Q3
4 89565 0 2007Q2
5 71079 1 1993Q1
6 73083 1 1991Q3
7 77447 1 2004Q1
8 84024 1 1998Q4
9 89565 1 2007Q3
10 71079 2 1993Q2
11 73083 2 1991Q4
12 77447 2 2004Q2
13 84024 2 1999Q1
14 89565 2 2007Q4
15 71079 3 1993Q3
16 73083 3 1992Q1
17 77447 3 2004Q3
18 84024 3 1999Q2
19 89565 3 2008Q1
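If quarter-end dates are needed rather than Period labels, the periods can be converted back. The sketch below assumes the values are collected into a PeriodIndex first; it is shown on standalone data rather than on df2 itself.

```python
import pandas as pd

# Four consecutive quarters, as produced for sid 71079 above.
periods = pd.PeriodIndex(["1992Q4", "1993Q1", "1993Q2", "1993Q3"], freq="Q")

# how="end" gives the last instant of each quarter; normalize() drops the time part.
quarter_ends = periods.to_timestamp(how="end").normalize()
as_text = list(quarter_ends.strftime("%Y-%m-%d"))
```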
And here is a completely different approach.
Creating Period objects is expensive, so let's identify the unique quarters first, compute the quarter-end timestamps once per unique value, and then apply that mapping.
quarters = df.qtr.unique().tolist()
mapping = {qtr: [pd.to_datetime(qtr) + pd.offsets.QuarterEnd(q)
                 for q in range(4)]
           for qtr in quarters}
>>> mapping
{'1991-06-30': [Timestamp('1991-06-30 00:00:00'),
Timestamp('1991-09-30 00:00:00'),
Timestamp('1991-12-31 00:00:00'),
Timestamp('1992-03-31 00:00:00')],
'1992-12-31': [Timestamp('1992-12-31 00:00:00'),
Timestamp('1993-03-31 00:00:00'),
Timestamp('1993-06-30 00:00:00'),
Timestamp('1993-09-30 00:00:00')],
'1998-09-30': [Timestamp('1998-09-30 00:00:00'),
Timestamp('1998-12-31 00:00:00'),
Timestamp('1999-03-31 00:00:00'),
Timestamp('1999-06-30 00:00:00')],
'2003-12-31': [Timestamp('2003-12-31 00:00:00'),
Timestamp('2004-03-31 00:00:00'),
Timestamp('2004-06-30 00:00:00'),
Timestamp('2004-09-30 00:00:00')],
'2007-06-30': [Timestamp('2007-06-30 00:00:00'),
Timestamp('2007-09-30 00:00:00'),
Timestamp('2007-12-31 00:00:00'),
Timestamp('2008-03-31 00:00:00')]}
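The first entry of each list equals the original date because QuarterEnd(0) leaves a date that already falls on a quarter end unchanged. A small sketch of that behaviour, assuming the default quarter ends in March, June, September and December (the mid-quarter date is a made-up example, not from the data):

```python
import pandas as pd

on_end = pd.Timestamp("1991-06-30")       # already a quarter end
mid_quarter = pd.Timestamp("1991-05-15")  # hypothetical mid-quarter date

same = on_end + pd.offsets.QuarterEnd(0)          # unchanged
following = on_end + pd.offsets.QuarterEnd(1)     # next quarter end
rolled = mid_quarter + pd.offsets.QuarterEnd(0)   # rolled forward to the quarter end
```

So the approach relies on the input dates already being quarter ends; a mid-quarter date would be rolled forward before any further quarters are added.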
Now we can map the qtr column in the initial dataframe and use a list comprehension to extract each of the four quarters. These values are then zipped with the sid.
df2 = pd.DataFrame(list(zip(df.sid.tolist() * 4,
                            [q[i] for i in range(4)
                             for q in df.qtr.map(mapping).values.tolist()])),
                   columns=['sid', 'qtr'])
df2 = df2.sort_values('sid').reset_index(drop=True)
>>> df2
sid qtr
0 71079 1993-03-31
1 71079 1993-06-30
2 71079 1992-12-31
3 71079 1993-09-30
4 73083 1991-06-30
5 73083 1991-09-30
6 73083 1991-12-31
7 73083 1992-03-31
8 77447 2004-03-31
9 77447 2004-09-30
10 77447 2004-06-30
11 77447 2003-12-31
12 84024 1998-12-31
13 84024 1999-03-31
14 84024 1999-06-30
15 84024 1998-09-30
16 89565 2007-09-30
17 89565 2007-12-31
18 89565 2007-06-30
19 89565 2008-03-31
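Note that the result above is sorted on sid only, so the dates within each sid are not in chronological order. As a further variation (not part of the approaches above), the same rows can probably be produced by shifting the whole column once per offset and concatenating the pieces. This is a sketch that assumes every qtr value is already a quarter-end date:

```python
import pandas as pd

df = pd.DataFrame({
    "sid": [84024, 89565, 73083, 77447, 71079],
    "qtr": ["1998-09-30", "2007-06-30", "1991-06-30",
            "2003-12-31", "1992-12-31"],
})

dates = pd.to_datetime(df["qtr"])

# One copy with the original dates, then one copy per following quarter.
frames = [df.assign(qtr=dates)]
frames += [df.assign(qtr=dates + pd.offsets.QuarterEnd(k)) for k in range(1, 4)]

# Sorting on both columns keeps the quarters in order within each sid.
result = (pd.concat(frames)
            .sort_values(["sid", "qtr"])
            .reset_index(drop=True))
```

This avoids the per-row Python loop, at the cost of building four intermediate frames.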