Pandas - min/max of a column for each value in another column

I have a CSV file as follows:

Date, Name
2015-01-01 16:30:00.0, John
2015-02-11 16:30:00.0, Doe
2015-03-01 16:30:00.0, Sam
2015-03-05 16:30:00.0, Sam
2015-04-21 16:30:00.0, Chris
2015-05-07 16:30:00.0, John
2015-06-08 16:30:00.0, Doe

As you can see, the same name is repeated on multiple dates. For each unique name, I want to know the MAX date in the Date column. How can I do this with Pandas, or with another solution in Python if you know one?

I want a result like:

Name, Max date (or latest)
John, 2015-01-01 16:30:00.0
Doe, 2015-01-01 16:30:00.0
Sam, 2015-01-01 16:30:00.0
Chris, 2015-01-01 16:30:00.0

Use DataFrame.groupby() and then call .max() or .min() on the result, depending on what you want. Example:

df.groupby('Name').max()
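A minimal self-contained sketch of the same idea, assuming the sample data from the question is inlined with io.StringIO instead of read from a file (CSV_TEXT is just an illustrative name). It uses .agg() to get both the earliest and the latest date per name in a single call, and reset_index() to turn Name back into an ordinary column, which matches the "Name, Max date" layout asked for.

```python
import io

import pandas as pd

# Sample data copied from the question (note the space after each comma).
CSV_TEXT = """Date, Name
2015-01-01 16:30:00.0, John
2015-02-11 16:30:00.0, Doe
2015-03-01 16:30:00.0, Sam
2015-03-05 16:30:00.0, Sam
2015-04-21 16:30:00.0, Chris
2015-05-07 16:30:00.0, John
2015-06-08 16:30:00.0, Doe
"""

# skipinitialspace strips the blank after each comma, so the column is "Name", not " Name";
# parse_dates converts the Date column to datetime64 values.
df = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True, parse_dates=["Date"])

# One row per unique name, with its earliest ("min") and latest ("max") date.
result = df.groupby("Name")["Date"].agg(["min", "max"]).reset_index()
print(result)
```

Drop "min" from the list if only the latest date is needed.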

You also need to make sure that the 'Date' column is parsed as datetime when you read the CSV. The dtype argument of .read_csv() does not do that (and pd.datetime no longer exists in current pandas); pass the column name to the parse_dates argument instead, as in the example below. Because the sample file has a space after each comma, skipinitialspace=True is also passed so that the second column is named 'Name' rather than ' Name'.

Example/demo (using the CSV from the question):

In [12]: df = pd.read_csv('a.csv', parse_dates=['Date'], skipinitialspace=True)

In [13]: df
Out[13]:
                 Date   Name
0 2015-01-01 16:30:00   John
1 2015-02-11 16:30:00    Doe
2 2015-03-01 16:30:00    Sam
3 2015-03-05 16:30:00    Sam
4 2015-04-21 16:30:00  Chris
5 2015-05-07 16:30:00   John
6 2015-06-08 16:30:00    Doe

In [15]: df.groupby(['Name']).max()
Out[15]:
                     Date
Name
Chris 2015-04-21 16:30:00
Doe   2015-06-08 16:30:00
John  2015-05-07 16:30:00
Sam   2015-03-05 16:30:00
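The question also asks about other Python solutions. Without pandas, a minimal sketch using only the standard library's csv and datetime modules could look like this. It assumes the header and timestamp format are exactly as shown in the question; latest_per_name and CSV_TEXT are hypothetical names chosen for illustration.

```python
import csv
import io
from datetime import datetime

# Sample data copied from the question; in practice, pass an open file instead.
CSV_TEXT = """Date, Name
2015-01-01 16:30:00.0, John
2015-02-11 16:30:00.0, Doe
2015-03-01 16:30:00.0, Sam
2015-03-05 16:30:00.0, Sam
2015-04-21 16:30:00.0, Chris
2015-05-07 16:30:00.0, John
2015-06-08 16:30:00.0, Doe
"""


def latest_per_name(lines):
    """Hypothetical helper: return {name: latest datetime} for a 'Date, Name' CSV."""
    # skipinitialspace drops the blank after each comma in the header and the rows.
    reader = csv.DictReader(lines, skipinitialspace=True)
    latest = {}
    for row in reader:
        # Assumes timestamps look like "2015-01-01 16:30:00.0".
        when = datetime.strptime(row["Date"], "%Y-%m-%d %H:%M:%S.%f")
        name = row["Name"]
        # Keep the later of the stored date and the current one.
        if name not in latest or when > latest[name]:
            latest[name] = when
    return latest


for name, when in latest_per_name(io.StringIO(CSV_TEXT)).items():
    print(f"{name}, {when}")
```

Names come out in the order they first appear in the file; switching `>` to `<` gives the earliest date instead.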
