简体   繁体   中英

column with previous year value python

I have an example data frame

import pandas as pd

df = pd.DataFrame({'COTA':['A','A','A','A','A','B','B','B','B'],
       'Date':['14/10/2021','19/10/2020','29/10/2019','30/09/2021','20/09/2020','20/10/2021','29/10/2020','15/10/2019','10/09/2020'],         
       'Mark':[1,2,3,4,5,1,2,3,3]
      })
print(df)

For each "COTA" I have a date and a "mark" I need to create a new column with the highest value of the "COTA" of the previous year regardless of the month.

Example

COTA        Date     Mark         Max_MarkLastYear
0   A    14/10/2021     1                5
1   A    19/10/2020     2                3
2   A    29/10/2019     3              NaN
3   A    30/09/2021     4                5
4   A    20/09/2020     5                3
5   B    20/10/2021     1                2
6   B    29/10/2020     2                3
7   B    15/10/2019     3              NaN
8   B    10/09/2020     3                3

in line 0 "COTA" = A, the "Mark" = 1 and the highest "Mark" of the previous year (2020)is 5

How can I create this new column

If need match by previous years use:

df['Date'] = pd.to_datetime(df['Date'])
df['LastYear'] = df['Date'] - pd.offsets.YearEnd(0)

s1 = df.groupby(['Found', 'LastYear'])['Mark'].max()
s2 = s1.rename(index=lambda x: x + pd.offsets.DateOffset(years=1), level=1)

df = df.join(s2.rename('Max_MarkLastYear'), on=['Found', 'LastYear'])
print (df)
  Found       Date  Mark   LastYear  Max_MarkLastYear
0     A 2021-10-14     1 2021-12-31               5.0
1     A 2020-10-19     2 2020-12-31               3.0
2     A 2019-10-29     3 2019-12-31               NaN
3     A 2021-09-30     4 2021-12-31               5.0
4     A 2020-09-20     5 2020-12-31               3.0
5     B 2021-10-20     1 2021-12-31               3.0
6     B 2020-10-29     2 2020-12-31               3.0
7     B 2019-10-15     3 2019-12-31               NaN
8     B 2020-10-09     3 2020-12-31               3.0

If need match by previous month and year use:

df['Date'] = pd.to_datetime(df['Date'])
df['LastYear'] = pd.to_datetime(df['Date']).dt.to_period('m')

s1 = df.groupby(['Found', 'LastYear'])['Mark'].max()
s2 = s1.rename(index=lambda x: x + 12, level=1)

df = df.join(s2.rename('Max_MarkLastYear'), on=['Found', 'LastYear'])
print (df)
  Found       Date  Mark LastYear  Max_MarkLastYear
0     A 2021-10-14     1  2021-10               2.0
1     A 2020-10-19     2  2020-10               3.0
2     A 2019-10-29     3  2019-10               NaN
3     A 2021-09-30     4  2021-09               5.0
4     A 2020-09-20     5  2020-09               NaN
5     B 2021-10-20     1  2021-10               3.0
6     B 2020-10-29     2  2020-10               3.0
7     B 2019-10-15     3  2019-10               NaN
8     B 2020-10-09     3  2020-10               3.0

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM