
Merging a pivot table with a DataFrame in “long” data format

UPDATE!

Make sure the year is an int in both the df and the pivot table (after unstacking). A type mismatch between the two caused me some trouble :)
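A minimal sketch of the dtype issue mentioned in the update, assuming the pivot table's year labels arrived as strings. The small `pivot` frame below is hypothetical and only mirrors the layout shown in the question:

```python
import pandas as pd

# Hypothetical pivot table whose year labels are strings, not ints
pivot = pd.DataFrame(
    {'2008': [1, 2], '2013': [1, 2]},
    index=pd.Index(['X1', 'X2'], name='CUSIP'),
)

# An integer year does not match a string label
has_2013_before = 2013 in pivot.columns

# Cast the labels to int so they compare equal to df['Date'].dt.year
pivot.columns = pivot.columns.astype(int)
has_2013_after = 2013 in pivot.columns
```

With string labels the integer year is not found; after the cast it is, which is what the join and lookup approaches rely on.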

Code to build the sample data:

import pandas as pd

d = {'ID': [1, 1, 1, 2, 2, 2],
     'Date': ['01-01-2013', '01-02-2013', '01-03-2013',
              '01-01-2008', '01-02-2008', '01-03-2008'],
     'CUSIP': ['X1', 'X1', 'X1', 'X2', 'X2', 'X2'],
     'X': ['bla', 'bla', 'bla', 'bla', 'bla', 'bla']}
df = pd.DataFrame(data=d)

I have a dataframe:

   Identifier CUSIP    X       Date
0           1    X1  bla 2013-01-01
1           1    X1  bla 2013-01-02
2           1    X1  bla 2013-01-03
3           2    X2  bla 2008-01-01
4           2    X2  bla 2008-01-02
5           2    X2  bla 2008-01-03

And a pivot table:

       2008  2009  2010  2011  2012  2013
CUSIP                                    
X1        1     1     1     1     1     1
X2        2     2     2     2     2     2

And I would like to achieve a layout like:

   Identifier CUSIP    X       Date Values
0           1    X1  bla 2013-01-01 1
1           1    X1  bla 2013-01-02 1
2           1    X1  bla 2013-01-03 1
3           2    X2  bla 2008-01-01 2
4           2    X2  bla 2008-01-02 2
5           2    X2  bla 2008-01-03 2

You can use stack to reshape the pivot table (called df1 here) into a Series indexed by CUSIP and year, then left-join it to df with join:

#if necessary
df['Date'] = pd.to_datetime(df['Date'])
df['year'] = df.Date.dt.year

df1 = df.join(df1.stack().rename('val'), on=['CUSIP', 'year'])
print (df1)
   Identifier       Date CUSIP    X  year  val
0           1 2013-01-01    X1  bla  2013    1
1           1 2013-01-02    X1  bla  2013    1
2           1 2013-04-03    X1  bla  2013    1
3           2 2008-01-01    X2  bla  2008    2
4           2 2008-01-02    X2  bla  2008    2
5           2 2008-03-03    X2  bla  2008    2

Alternative solution:

df1 = df.join(df1.stack().rename('val'), on=[df['CUSIP'], df['Date'].dt.year])
print (df1)
   Identifier       Date CUSIP    X  val
0           1 2013-01-01    X1  bla    1
1           1 2013-01-02    X1  bla    1
2           1 2013-04-03    X1  bla    1
3           2 2008-01-01    X2  bla    2
4           2 2008-01-02    X2  bla    2
5           2 2008-03-03    X2  bla    2
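For reference, the stack-and-join approach can be sketched end to end on the question's sample data. The name `pivot`, the `%m-%d-%Y` date format, and the explicit `int64` cast are assumptions made for this illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2],
    'Date': ['01-01-2013', '01-02-2013', '01-03-2013',
             '01-01-2008', '01-02-2008', '01-03-2008'],
    'CUSIP': ['X1', 'X1', 'X1', 'X2', 'X2', 'X2'],
    'X': ['bla'] * 6,
})

# Pivot table as shown in the question: one row per CUSIP, one column per year
pivot = pd.DataFrame(
    {year: [1, 2] for year in range(2008, 2014)},
    index=pd.Index(['X1', 'X2'], name='CUSIP'),
)

# Assumed date format: month-day-year
df['Date'] = pd.to_datetime(df['Date'], format='%m-%d-%Y')
df['year'] = df['Date'].dt.year.astype('int64')

# stack() turns the wide table into a Series indexed by (CUSIP, year)
stacked = pivot.stack().rename('val')
stacked.index.names = ['CUSIP', 'year']

# join() is a left join by default, so every row of df is kept
result = df.join(stacked, on=['CUSIP', 'year'])
```

Rows whose (CUSIP, year) pair is missing from the pivot table would get NaN in `val` rather than being dropped.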

I believe you can also use transform, grouping by CUSIP and year, with a function like size, mean, or sum:

df['Date'] = pd.to_datetime(df['Date'])

df['Vals'] = df.groupby(['CUSIP', df['Date'].dt.year])['X'].transform('size')
print (df)
   Identifier       Date CUSIP    X  Vals
0           1 2013-01-01    X1  bla     5
1           1 2013-01-02    X1  bla     5
2           1 2013-04-03    X1  bla     5
3           1 2013-04-04    X1  bla     5
4           1 2013-05-05    X1  bla     5
5           2 2008-01-01    X2  bla     4
6           2 2008-01-02    X2  bla     4
7           2 2008-03-03    X2  bla     4
8           2 2008-03-04    X2  bla     4
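A self-contained sketch of the transform idea on the question's sample data, assuming the value you want is the per-CUSIP, per-year row count. Note that this derives the value from df itself rather than reading it from an existing pivot table, so it only fits when the pivot table was built from the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2],
    'Date': ['01-01-2013', '01-02-2013', '01-03-2013',
             '01-01-2008', '01-02-2008', '01-03-2008'],
    'CUSIP': ['X1', 'X1', 'X1', 'X2', 'X2', 'X2'],
    'X': ['bla'] * 6,
})
# Assumed date format: month-day-year
df['Date'] = pd.to_datetime(df['Date'], format='%m-%d-%Y')

# transform('size') broadcasts each group's row count back to its rows
df['Vals'] = df.groupby(['CUSIP', df['Date'].dt.year])['X'].transform('size')
```

Each CUSIP has three rows in a single year here, so every row receives the same count.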

This is how I'd do it. It looks complicated, but there isn't much to it; I'm just explaining each step.
Starting with a dataframe like this:

   Identifier CUSIP    X       Date
0           1    X1  bla 2013-01-01
1           1    X1  bla 2013-01-02
2           1    X1  bla 2013-01-03
3           2    X2  bla 2008-01-01
4           2    X2  bla 2008-01-02
5           2    X2  bla 2008-01-03

Add a year column with df['year'] = df.Date.dt.year

   Identifier CUSIP    X       Date  year
0           1    X1  bla 2013-01-01  2013
1           1    X1  bla 2013-01-02  2013
2           1    X1  bla 2013-01-03  2013
3           2    X2  bla 2008-01-01  2008
4           2    X2  bla 2008-01-02  2008
5           2    X2  bla 2008-01-03  2008

Then take your pivot table and stack it. (Understanding stack/unstack will greatly help you if you work with pivot tables)

       2008  2009  2010  2011  2012  2013
CUSIP                                    
X1        1     1     1     1     1     1
X2        2     2     2     2     2     2

>>> piv.stack()
CUSIP      
X1     2008    1
       2009    1
       2010    1
       2011    1
       2012    1
       2013    1
X2     2008    2
       2009    2
       2010    2
       2011    2
       2012    2
       2013    2

Then you need to reindex by CUSIP and year so that the values are in the same order as your dataframe.

>>> piv.stack().reindex(df[['CUSIP', 'year']])
CUSIP      
X1     2013    1
       2013    1
       2013    1
X2     2008    2
       2008    2
       2008    2
dtype: int64

All together:

>>> df['pivot_values'] = piv.stack().reindex(df[['CUSIP', 'year']]).values
>>> df
   Identifier CUSIP    X       Date  year  pivot_values
0           1    X1  bla 2013-01-01  2013             1
1           1    X1  bla 2013-01-02  2013             1
2           1    X1  bla 2013-01-03  2013             1
3           2    X2  bla 2008-01-01  2008             2
4           2    X2  bla 2008-01-02  2008             2
5           2    X2  bla 2008-01-03  2008             2

Assume my dataframe is df

df

  CUSIP        Date  ID    X
0    X1  01-01-2013   1  bla
1    X1  01-02-2013   1  bla
2    X1  01-03-2013   1  bla
3    X2  01-01-2008   2  bla
4    X2  01-02-2008   2  bla
5    X2  01-03-2008   2  bla

And pivot table is pv

pv

       2008  2009  2010  2011  2012  2013
CUSIP                                    
X1        1     1     1     1     1     1
X2        2     2     2     2     2     2

Solution

Use pd.DataFrame.lookup

Since your dates are just strings, I'll pass them through pd.to_datetime. I'll also make sure pv's columns are integers.

df.assign(
    PV_Values=
    pv.rename(columns=int).lookup(
        df.CUSIP, pd.to_datetime(df.Date).dt.year
    )
)

  CUSIP        Date  ID    X  PV_Values
0    X1  01-01-2013   1  bla          1
1    X1  01-02-2013   1  bla          1
2    X1  01-03-2013   1  bla          1
3    X2  01-01-2008   2  bla          2
4    X2  01-02-2008   2  bla          2
5    X2  01-03-2008   2  bla          2

Note
If pv columns were already int and df.Date were already datetime , this would simply be:

df.assign(PV_Values=pv.lookup(df.CUSIP, df.Date.dt.year))

  CUSIP        Date  ID    X  PV_Values
0    X1  01-01-2013   1  bla          1
1    X1  01-02-2013   1  bla          1
2    X1  01-03-2013   1  bla          1
3    X2  01-01-2008   2  bla          2
4    X2  01-02-2008   2  bla          2
5    X2  01-03-2008   2  bla          2
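Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so the calls above only run on older versions. A possible replacement, sketched here with positional NumPy indexing, is shown below; the missing-label check and the `%m-%d-%Y` date format are assumptions added for this illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'CUSIP': ['X1', 'X1', 'X1', 'X2', 'X2', 'X2'],
    'Date': ['01-01-2013', '01-02-2013', '01-03-2013',
             '01-01-2008', '01-02-2008', '01-03-2008'],
    'ID': [1, 1, 1, 2, 2, 2],
    'X': ['bla'] * 6,
})
pv = pd.DataFrame(
    {year: [1, 2] for year in range(2008, 2014)},
    index=pd.Index(['X1', 'X2'], name='CUSIP'),
)

# Assumed date format: month-day-year
years = pd.to_datetime(df['Date'], format='%m-%d-%Y').dt.year

# Translate labels into integer positions; -1 marks a label that was not found
rows = pv.index.get_indexer(df['CUSIP'])
cols = pv.columns.get_indexer(years)
if (rows < 0).any() or (cols < 0).any():
    raise KeyError('some CUSIP/year pairs are missing from pv')

# Pick one cell per row of df, as lookup used to do
result = df.assign(PV_Values=pv.to_numpy()[rows, cols])
```

The explicit check matters because a position of -1 would otherwise silently select the last row or column.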
