UPDATE!
Just be aware of changing the year to int in both the df and the pivot table (after unstacking). That caused me some trouble :)
Data for the values:
d = {'ID':[1,1,1,2,2,2],'Date':['01-01-2013','01-02-2013','01-03-2013','01-
01-2008','01-02-2008','01-03-2008'],'CUSIP':
['X1','X1','X1','X2','X2','X2'],'X':['bla','bla','bla','bla','bla','bla']}
df = pd.DataFrame(data=d)
I have a dataframe:
Identifier CUSIP X Date
0 1 X1 bla 2013-01-01
1 1 X1 bla 2013-01-02
2 1 X1 bla 2013-01-03
3 2 X2 bla 2008-01-01
4 2 X2 bla 2008-01-02
5 2 X2 bla 2008-01-03
And a pivot table:
2008 2009 2010 2011 2012 2013
CUSIP
X1 1 1 1 1 1 1
X2 2 2 2 2 2 2
And I would like to achieve a layout like:
Identifier CUSIP X Date Values
0 1 X1 bla 2013-01-01 1
1 1 X1 bla 2013-01-02 1
2 1 X1 bla 2013-01-03 1
3 2 X2 bla 2008-01-01 2
4 2 X2 bla 2008-01-02 2
5 2 X2 bla 2008-01-03 2
You can use stack
for reshape df2
with join
with left join:
#if necessary
df['Date'] = pd.to_datetime(df['Date'])
df['year'] = df.Date.dt.year
df1 = df.join(df1.stack().rename('val'), on=['CUSIP', 'year'])
print (df1)
Identifier Date CUSIP X year val
0 1 2013-01-01 X1 bla 2013 1
1 1 2013-01-02 X1 bla 2013 1
2 1 2013-04-03 X1 bla 2013 1
3 2 2008-01-01 X2 bla 2008 2
4 2 2008-01-02 X2 bla 2008 2
5 2 2008-03-03 X2 bla 2008 2
Alternative solution:
df1 = df.join(df1.stack().rename('val'), on=[df['CUSIP'], df['Date'].dt.year])
print (df1)
Identifier Date CUSIP X val
0 1 2013-01-01 X1 bla 1
1 1 2013-01-02 X1 bla 1
2 1 2013-04-03 X1 bla 1
3 2 2008-01-01 X2 bla 2
4 2 2008-01-02 X2 bla 2
5 2 2008-03-03 X2 bla 2
I believe you can use transform
by year
with function like size
, mean
, sum
:
df['Date'] = pd.to_datetime(df['Date'])
df['Vals'] = df.groupby(['CUSIP', df['Date'].dt.year])['X'].transform('size')
print (df)
Identifier Date CUSIP X Vals
0 1 2013-01-01 X1 bla 5
1 1 2013-01-02 X1 bla 5
2 1 2013-04-03 X1 bla 5
3 1 2013-04-04 X1 bla 5
4 1 2013-05-05 X1 bla 5
5 2 2008-01-01 X2 bla 4
6 2 2008-01-02 X2 bla 4
7 2 2008-03-03 X2 bla 4
8 2 2008-03-04 X2 bla 4
This is how I'd do it, it looks complicated but actually it's not much, I'm just explaining the steps.
Starting with a dataframe like this:
Identifier CUSIP X Date
0 1 X1 bla 2013-01-01
1 1 X1 bla 2013-01-02
2 1 X1 bla 2013-01-03
3 2 X2 bla 2008-01-01
4 2 X2 bla 2008-01-02
5 2 X2 bla 2008-01-03
Add a year column with df['year'] = df.Date.dt.year
Identifier CUSIP X Date year
0 1 X1 bla 2013-01-01 2013
1 1 X1 bla 2013-01-02 2013
2 1 X1 bla 2013-01-03 2013
3 2 X2 bla 2008-01-01 2008
4 2 X2 bla 2008-01-02 2008
5 2 X2 bla 2008-01-03 2008
Then take your pivot table and stack it. (Understanding stack/unstack will greatly help you if you work with pivot tables)
2008 2009 2010 2011 2012 2013
CUSIP
X1 1 1 1 1 1 1
X2 2 2 2 2 2 2
>>> piv.stack()
CUSIP
X1 2008 1
2009 1
2010 1
2011 1
2012 1
2013 1
X2 2008 2
2009 2
2010 2
2011 2
2012 2
2013 2
Then you need to reindex by CUSIP and year so that the values are in the same order as your dataframe.
>>> piv.stack().reindex(df[['CUSIP', 'year']])
CUSIP
X1 2013 1
2013 1
2013 1
X2 2008 2
2008 2
2008 2
dtype: int64
All together:
>>> df['pivot_values'] = piv.stack().reindex(df[['CUSIP', 'year']]).values
>>> df
Identifier CUSIP X Date year pivot_values
0 1 X1 bla 2013-01-01 2013 1
1 1 X1 bla 2013-01-02 2013 1
2 1 X1 bla 2013-01-03 2013 1
3 2 X2 bla 2008-01-01 2008 2
4 2 X2 bla 2008-01-02 2008 2
5 2 X2 bla 2008-01-03 2008 2
Assume my dataframe is df
df
CUSIP Date ID X
0 X1 01-01-2013 1 bla
1 X1 01-02-2013 1 bla
2 X1 01-03-2013 1 bla
3 X2 01-01-2008 2 bla
4 X2 01-02-2008 2 bla
5 X2 01-03-2008 2 bla
And pivot table is pv
pv
2008 2009 2010 2011 2012 2013
CUSIP
X1 1 1 1 1 1 1
X2 2 2 2 2 2 2
Solution
Since your dates are just strings, I'll pass them through pd.to_datetime
. I'll also ensure pv
s columns are integers
df.assign(
PV_Values=
pv.rename(columns=int).lookup(
df.CUSIP, pd.to_datetime(df.Date).dt.year
)
)
CUSIP Date ID X PV_Values
0 X1 01-01-2013 1 bla 1
1 X1 01-02-2013 1 bla 1
2 X1 01-03-2013 1 bla 1
3 X2 01-01-2008 2 bla 2
4 X2 01-02-2008 2 bla 2
5 X2 01-03-2008 2 bla 2
Note
If pv
columns were already int
and df.Date
were already datetime
, this would simply be:
df.assign(PV_Values=pv.lookup(df.CUSIP, df.Date.dt.year))
CUSIP Date ID X PV_Values
0 X1 01-01-2013 1 bla 1
1 X1 01-02-2013 1 bla 1
2 X1 01-03-2013 1 bla 1
3 X2 01-01-2008 2 bla 2
4 X2 01-02-2008 2 bla 2
5 X2 01-03-2008 2 bla 2
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.