Pandas: Group by Name and Take Row With Most Recent Date

Question

I have a Pandas DataFrame with columns A, B, C, D, date . I want to filter out duplicates of A and B , keeping the row with the most recent value in date .

So if I have two rows that look like:

A B C D date

1 1 2 3 1/1/18

1 1 2 3 1/1/17

The correct output would be:

A B C D date

1 1 2 3 1/1/18

I can do this by looping through, but I'd like to use df.groupby(['A', 'B']) and then aggregate by taking the largest value for date in each group.

I tried:

df.groupby(['A', 'B']).agg(lambda x: x.iloc[x.date.argmax()])

But I get:

AttributeError: 'Series' object has no attribute 'date'

Any idea what I'm doing incorrectly?

Edit: Hmm if I do:

df.groupby(['A', 'B']).UPDATED_AT.max()

I get mostly what I want but I lose columns D and C ...

Answer 1

You can do with

df.date=pd.to_datetime(df.date)
df.sort_values('date').drop_duplicates(['A','B'],keep='last')
   A  B  C  D       date
0  1  1  2  3 2018-01-01

Answer 2

Try df.groupby(['A', 'B']).agg(lambda x: x.iloc[x['date'].argmax()]) pandas has its own date object. Maybe pandas got confused with the series name.

Answer 3

df = pd.DataFrame([[1, 1, 2, 3, '1/1/18'],
                    [1, 1, 2, 3, '1/1/17']], 
                    columns=['A', 'B', 'C', 'D', 'date'])

Output:

    A   B   C   D   date
0   1   1   2   3   1/1/18
1   1   1   2   3   1/1/17

Grouping an d removing duplicate:

df.groupby(['A', 'B']).agg(
    {
        'date': 'max'
    })

Output:

        date
A   B   
1   1   1/1/18

This should work. It may work better with having 'date' column to be datetime object.

Pandas: Group by Name and Take Row With Most Recent Date

Question

3 answers

solution1
1 2018-12-22 01:38:13

solution2
0 2018-12-22 01:21:05

solution3
0 2018-12-22 01:37:51

Pandas: Group by Name and Take Row With Most Recent Date

Question

3 answers

solution1 1 2018-12-22 01:38:13

solution2 0 2018-12-22 01:21:05

solution3 0 2018-12-22 01:37:51

solution1
1 2018-12-22 01:38:13

solution2
0 2018-12-22 01:21:05

solution3
0 2018-12-22 01:37:51