
pandas: add new column based on datetime index lookup of same dataframe

I have the following data, and I would like to add a new column containing the current month-over-month percent change. The date is the index of my DataFrame:

date        close
1/26/1990   421.2999878
1/29/1990   418.1000061
1/30/1990   410.7000122
1/31/1990   415.7999878
2/23/1990   419.5
2/26/1990   421
2/27/1990   422.6000061
2/28/1990   425.7999878
3/26/1990   438.7999878
3/27/1990   439.5
3/28/1990   436.7000122
3/29/1990   435.3999939
3/30/1990   435.5
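To follow along, the sample data can be rebuilt as a DataFrame. This is a minimal sketch that assumes the dates are month/day/year strings that should become a DatetimeIndex named `date` (the variable names `dates` and `closes` are only for illustration):

```python
import pandas as pd

# Sample data from the question; parsing the strings gives a DatetimeIndex,
# which the date offsets and lookups in the answers rely on
dates = ['1/26/1990', '1/29/1990', '1/30/1990', '1/31/1990',
         '2/23/1990', '2/26/1990', '2/27/1990', '2/28/1990',
         '3/26/1990', '3/27/1990', '3/28/1990', '3/29/1990', '3/30/1990']
closes = [421.2999878, 418.1000061, 410.7000122, 415.7999878,
          419.5, 421, 422.6000061, 425.7999878,
          438.7999878, 439.5, 436.7000122, 435.3999939, 435.5]

df = pd.DataFrame(
    {'close': closes},
    index=pd.to_datetime(dates, format='%m/%d/%Y').rename('date'))
print(df.head())
```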

The simplest way I could think of to do this is to add a column containing the previous month-end date and, for convenience, a column with the previous month-end 'close' value; from those I can calculate the current month-over-month change. In the end, I would have a table that looks like this:

[Image of the desired table; not available]
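Written out, the new column is the percent change of each row's close relative to the close on its previous month-end date. Writing m(t) for the previous month-end date of row t (an illustrative symbol, not from the question), and using 2/23/1990 as an example, where m(t) = 1/31/1990:

$$
\text{mom\_pct\_change}_t = \frac{\text{close}_t - \text{close}_{m(t)}}{\text{close}_{m(t)}} \times 100,
\qquad \text{e.g. } \frac{419.5 - 415.8}{415.8} \times 100 \approx 0.89
$$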

I was able to add the previous month-end date just fine, but I am having problems looking up the previous month-end close based on that date. In the code below, the first line works and adds the previous month-end date as a new column. The second line does not: the idea is to use the prev_month_end date to look up the month-end close value and add it as a column.

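import pandas as pd

# Works: BMonthEnd(-1) rolls each date back to the previous business month end
df['prev_month_end'] = df.index + pd.offsets.BMonthEnd(-1)
# Does not work: this selects rows of df (an empty DataFrame here, since no date
# equals its own previous month end) instead of looking up one close per row
df['prev_month_close'] = df[df.index == df['prev_month_end']]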
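One way to do the lookup described above, offered as a sketch rather than the asker's code: `Series.map` accepts another Series and uses that Series' index as the lookup table, so mapping `prev_month_end` through the `close` column fetches the close on each previous month-end date (NaN where that date is not in the data). This assumes the date index is a DatetimeIndex with unique values; a subset of the sample rows is rebuilt inline so the block runs on its own.

```python
import pandas as pd

df = pd.DataFrame(
    {'close': [421.2999878, 415.7999878, 419.5, 425.7999878, 438.7999878]},
    index=pd.to_datetime(['1990-01-26', '1990-01-31', '1990-02-23',
                          '1990-02-28', '1990-03-26']).rename('date'))

# Previous business month end for every row (same as the question's first line)
df['prev_month_end'] = df.index + pd.offsets.BMonthEnd(-1)
# Look each previous month-end date up in the close column (index -> value)
df['prev_month_close'] = df['prev_month_end'].map(df['close'])
# Month-over-month percent change; NaN where no previous month-end close exists
df['mom_pct_change'] = (df['close'] / df['prev_month_close'] - 1) * 100
print(df)
```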

Any help or suggestions on how to accomplish this would be appreciated.

You can get prev_month_close by merging the frame with itself:

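# Assumes prev_month_end was already added as in the question.
# Move the date index into a regular column named 'date'
df.reset_index(inplace=True)
# Left-merge a copy of (date, close), renamed to (prev_month_end, prev_month_close),
# so each row picks up the close recorded on its previous month-end date
df = df[['date', 'close', 'prev_month_end']].merge(
    df[['date', 'close']].rename(columns={'close': 'prev_month_close',
                                          'date': 'prev_month_end'}),
    how='left', on='prev_month_end')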

OUTPUT

             date       close prev_month_end  prev_month_close
    0  1990-01-26  421.299988     1989-12-29               NaN
    1  1990-01-29  418.100006     1989-12-29               NaN
    2  1990-01-30  410.700012     1989-12-29               NaN
    3  1990-01-31  415.799988     1989-12-29               NaN
    4  1990-02-23  419.500000     1990-01-31        415.799988
    5  1990-02-26  421.000000     1990-01-31        415.799988
    6  1990-02-27  422.600006     1990-01-31        415.799988
    7  1990-02-28  425.799988     1990-01-31        415.799988
    8  1990-03-26  438.799988     1990-02-28        425.799988
    9  1990-03-27  439.500000     1990-02-28        425.799988
    10 1990-03-28  436.700012     1990-02-28        425.799988
    11 1990-03-29  435.399994     1990-02-28        425.799988
    12 1990-03-30  435.500000     1990-02-28        425.799988
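The merge above stops at prev_month_close; the percent change the question asks for is then one vectorised expression. A self-contained sketch that rebuilds a few of the sample rows and repeats the merge with the same column names:

```python
import pandas as pd

df = pd.DataFrame(
    {'close': [415.7999878, 419.5, 425.7999878, 438.7999878]},
    index=pd.to_datetime(['1990-01-31', '1990-02-23',
                          '1990-02-28', '1990-03-26']).rename('date'))
df['prev_month_end'] = df.index + pd.offsets.BMonthEnd(-1)

# Same self-merge as the answer: date becomes a column, then each row is
# matched to the row whose date equals its prev_month_end
df.reset_index(inplace=True)
df = df[['date', 'close', 'prev_month_end']].merge(
    df[['date', 'close']].rename(columns={'close': 'prev_month_close',
                                          'date': 'prev_month_end'}),
    how='left', on='prev_month_end')

# Month-over-month percent change relative to the previous month-end close
df['mom_pct_change'] = df['close'].div(df['prev_month_close']).sub(1).mul(100)
print(df)
```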

Or, without using reset_index:

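# Starting from the original frame (date still the index): match the
# prev_month_end column on the left against the date index on the right
df = df[['close', 'prev_month_end']].merge(
    df[['close']].rename(columns={'close': 'prev_month_close'}),
    how='left', left_on='prev_month_end', right_index=True)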

OUTPUT

                 close prev_month_end  prev_month_close
date                                                   
1990-01-26  421.299988     1989-12-29               NaN
1990-01-29  418.100006     1989-12-29               NaN
1990-01-30  410.700012     1989-12-29               NaN
1990-01-31  415.799988     1989-12-29               NaN
1990-02-23  419.500000     1990-01-31        415.799988
1990-02-26  421.000000     1990-01-31        415.799988
1990-02-27  422.600006     1990-01-31        415.799988
1990-02-28  425.799988     1990-01-31        415.799988
1990-03-26  438.799988     1990-02-28        425.799988
1990-03-27  439.500000     1990-02-28        425.799988
1990-03-28  436.700012     1990-02-28        425.799988
1990-03-29  435.399994     1990-02-28        425.799988
1990-03-30  435.500000     1990-02-28        425.799988
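The index-based merge assumes each date appears only once in the index; a duplicated date would silently duplicate rows in the result. As a hedged sketch, the `validate` argument of `DataFrame.merge` can make that assumption explicit: with `validate='m:1'` the merge raises `pandas.errors.MergeError` when the right-hand keys (here, the index) are not unique. The helper name `add_prev_close` and the tiny frames are hypothetical, for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    {'close': [415.7999878, 419.5, 425.7999878]},
    index=pd.to_datetime(['1990-01-31', '1990-02-23', '1990-02-28']).rename('date'))
df['prev_month_end'] = df.index + pd.offsets.BMonthEnd(-1)

def add_prev_close(frame):
    # Hypothetical helper: the answer's index-based self-merge plus a
    # many-to-one check on the right-hand (index) keys
    return frame[['close', 'prev_month_end']].merge(
        frame[['close']].rename(columns={'close': 'prev_month_close'}),
        how='left', left_on='prev_month_end', right_index=True,
        validate='m:1')

out = add_prev_close(df)
print(out)

# Duplicating a date in the index makes the right-hand keys non-unique
dup = pd.concat([df, df.iloc[[0]]])
rejected = False
try:
    add_prev_close(dup)
except pd.errors.MergeError as exc:
    rejected = True
    print('merge rejected:', exc)
```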
             

We can convert the index to a monthly PeriodIndex, group the DataFrame by period and aggregate close using last, then shift the period index back one month and map it to those last closing values. Finally, calculate the percent change:

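# Monthly period of each row's date, e.g. 1990-02 for 2/23/1990
i = pd.to_datetime(df.index).to_period('M')
# Last close of each month, looked up for the previous month of each row
s = i.shift(-1).map(df.groupby(i)['close'].last())
# Percent change relative to the previous month's last close
df['mom_pct_change'] = df['close'].sub(s).div(s).mul(100)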

                close  mom_pct_change
date                                 
1/26/1990  421.299988             NaN
1/29/1990  418.100006             NaN
1/30/1990  410.700012             NaN
1/31/1990  415.799988             NaN
2/23/1990  419.500000        0.889854
2/26/1990  421.000000        1.250604
2/27/1990  422.600006        1.635406
2/28/1990  425.799988        2.405002
3/26/1990  438.799988        3.053077
3/27/1990  439.500000        3.217476
3/28/1990  436.700012        2.559893
3/29/1990  435.399994        2.254581
3/30/1990  435.500000        2.278068
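To see what the period-based answer is doing, the intermediate objects can be inspected one at a time. A sketch on a few of the sample rows, keeping the dates as strings as in the answer's output; `monthly_last` and `prev_last` are illustrative names for the answer's inline groupby expression and its `s`:

```python
import pandas as pd

df = pd.DataFrame(
    {'close': [410.7000122, 415.7999878, 419.5, 425.7999878, 438.7999878]},
    index=pd.Index(['1/30/1990', '1/31/1990', '2/23/1990',
                    '2/28/1990', '3/26/1990'], name='date'))

# Monthly period for each row: 1990-01, 1990-01, 1990-02, 1990-02, 1990-03
i = pd.to_datetime(df.index, format='%m/%d/%Y').to_period('M')

# Last close observed in each month (the month-end close in this data)
monthly_last = df.groupby(i)['close'].last()

# Shift each row's period back one month, then look up that month's last close
prev_last = i.shift(-1).map(monthly_last)

df['mom_pct_change'] = df['close'].sub(prev_last).div(prev_last).mul(100)
print(monthly_last)
print(df)
```

Note that `last` picks the last row present in each month, which equals the business month-end close only when that day is in the data, as it is here.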
