Let's say I have a DataFrame like this:
date id val
0 2017-01-01 1 10
1 2019-01-01 1 20
2 2017-01-01 2 50
I want to group this dataset by id
.
For each group, I want to add a new row to it, with the date being be 1 year from now. This row should only be added IF it is later than the last date in the group. The row's val should be the same as the last row in the group.
The final table should look like this:
date id val
0 2017-01-01 1 10
1 2019-01-01 1 20
2 2017-01-01 2 50
3 2018-09-25 2 50 <-- new row
The current code is below. I can get a mask showing which groups need a row appended, but not sure what to do next.
>>> df = pd.DataFrame(data={'d': [datetime.date(2017, 1, 1), datetime.date(2019,1,1), datetime.date(2017,1,1)], 'id': [1,1,2], 'val': [10,20,50]})
>>> df = df.sort_values(by='d')
>>> future_date = (pd.datetime.now().date() + pd.DateOffset(years=1)).date()
>>> maxd = df.groupby('id')['d'].max()
>>> maxd < future_date
id
1 False
2 True
Name: d, dtype: bool
Here's one way
In [3481]: def add_row(x):
...: next_year = pd.to_datetime('today') + pd.DateOffset(years=1)
...: if x['date'].max() < next_year:
...: last_row = x.iloc[-1]
...: last_row['date'] = next_year
...: return x.append(last_row)
...: return x
...:
In [3482]: df.groupby('id').apply(add_row).reset_index(drop=True)
Out[3482]:
date id val
0 2017-01-01 1 10
1 2019-01-01 1 20
2 2017-01-01 2 50
3 2018-09-25 2 50
You can use idxmax
with loc
for rows with max
date
:
future_date = pd.to_datetime('today') + pd.DateOffset(years=1)
maxd = df.loc[df.groupby('id')['d'].idxmax()]
maxd = maxd[maxd['d'] < future_date]
maxd['d'] = future_date
print (maxd)
d id val
2 2018-09-25 2 50
df = pd.concat([df, maxd]).sort_values(['id','d']).reset_index(drop=True)
print (df)
d id val
0 2017-01-01 1 10
1 2019-01-01 1 20
2 2017-01-01 2 50
3 2018-09-25 2 50
A different way to look at it, use duplicated
to find last row per 'id'
t = df[~df.duplicated('id', 'last')]
df.append(
t.assign(
date=pd.to_datetime('today') + pd.DateOffset(years=1)
).pipe(lambda d: d[d.date > t.date]),
ignore_index=True).sort_values(['id', 'date'])
date id val
0 2017-01-01 1 10
1 2019-01-01 1 20
2 2017-01-01 2 50
3 2018-09-24 2 50
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.