
Pandas: add row to each group depending on condition

Let's say I have a DataFrame like this:

         date  id  val
0  2017-01-01   1   10
1  2019-01-01   1   20
2  2017-01-01   2   50

I want to group this dataset by id. For each group, I want to append a new row whose date is one year from today, but only if that date is later than the last date already in the group. The new row's val should match the group's last row.

The final table should look like this:

         date  id  val
0  2017-01-01   1   10
1  2019-01-01   1   20
2  2017-01-01   2   50
3  2018-09-25   2   50   <-- new row

My current code is below. I can build a mask showing which groups need a row appended, but I'm not sure what to do next.

>>> import datetime
>>> import pandas as pd
>>> df = pd.DataFrame(data={'d': [datetime.date(2017, 1, 1), datetime.date(2019, 1, 1), datetime.date(2017, 1, 1)],
...                         'id': [1, 1, 2], 'val': [10, 20, 50]})
>>> df = df.sort_values(by='d')
>>> future_date = (pd.Timestamp.now() + pd.DateOffset(years=1)).date()
>>> maxd = df.groupby('id')['d'].max()
>>> maxd < future_date
id
1    False
2     True
Name: d, dtype: bool
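One way to finish from that mask (a sketch in current pandas, not from any answer below): take the ids the mask flags, grab each flagged group's last row, overwrite the date, and concatenate. Note the results are date-dependent; when the question was asked only id 2 qualified, but run today id 1 may qualify as well, since 2019-01-01 is now in the past.

```python
import datetime
import pandas as pd

df = pd.DataFrame({'d': [datetime.date(2017, 1, 1), datetime.date(2019, 1, 1),
                         datetime.date(2017, 1, 1)],
                   'id': [1, 1, 2], 'val': [10, 20, 50]}).sort_values('d')

future_date = (pd.Timestamp.now() + pd.DateOffset(years=1)).date()
mask = df.groupby('id')['d'].max() < future_date   # the boolean Series from above
needs_row = mask[mask].index                       # ids whose last date is too old

# last row of each flagged group (df is sorted by 'd'), with the date replaced
new_rows = (df[df['id'].isin(needs_row)]
            .groupby('id').tail(1)
            .assign(d=future_date))

out = pd.concat([df, new_rows], ignore_index=True).sort_values(['id', 'd'])
print(out)
```

Sorting by 'd' first matters: it guarantees groupby('id').tail(1) picks each group's latest row.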

Here's one way, using groupby.apply. (Note: DataFrame.append was removed in pandas 2.0, so pd.concat is used here instead.)

In [3481]: def add_row(x):
      ...:     next_year = (pd.Timestamp.now() + pd.DateOffset(years=1)).date()
      ...:     if x['d'].max() < next_year:
      ...:         last_row = x.iloc[-1].copy()  # copy so we don't mutate the group
      ...:         last_row['d'] = next_year
      ...:         return pd.concat([x, last_row.to_frame().T])
      ...:     return x
      ...:

In [3482]: df.groupby('id').apply(add_row).reset_index(drop=True)
Out[3482]:
           d  id  val
0 2017-01-01   1   10
1 2019-01-01   1   20
2 2017-01-01   2   50
3 2018-09-25   2   50

You can use idxmax with loc to select each group's row with the maximum date:

future_date = (pd.Timestamp.now() + pd.DateOffset(years=1)).date()
maxd = df.loc[df.groupby('id')['d'].idxmax()].copy()

maxd = maxd[maxd['d'] < future_date]
maxd['d'] = future_date
print (maxd)
           d  id  val
2 2018-09-25   2   50

df = pd.concat([df, maxd]).sort_values(['id','d']).reset_index(drop=True)
print (df)
           d  id  val
0 2017-01-01   1   10
1 2019-01-01   1   20
2 2017-01-01   2   50
3 2018-09-25   2   50

A different way to look at it: use duplicated with keep='last' to find the last row per 'id', then build the candidate rows and concatenate (again pd.concat, since DataFrame.append was removed in pandas 2.0):

t = df[~df.duplicated('id', keep='last')]
pd.concat([
    df,
    t.assign(
        d=(pd.Timestamp.now() + pd.DateOffset(years=1)).date()
    ).pipe(lambda x: x[x.d > t.d])
], ignore_index=True).sort_values(['id', 'd'])

           d  id  val
0 2017-01-01   1   10
1 2019-01-01   1   20
2 2017-01-01   2   50
3 2018-09-24   2   50
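For reference, duplicated('id', keep='last') marks every occurrence except the last one within each 'id' group, so negating it keeps exactly each group's last row. A minimal check on a toy frame:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2], 'val': [10, 20, 50]})

# first id=1 row is a duplicate; the last row of each id is not marked
print(df.duplicated('id', keep='last').tolist())   # [True, False, False]
print(df[~df.duplicated('id', keep='last')])       # one row per id: val 20 and 50
```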
