I have a CSV file as follows:
Date, Name
2015-01-01 16:30:00.0, John
2015-02-11 16:30:00.0, Doe
2015-03-01 16:30:00.0, Sam
2015-03-05 16:30:00.0, Sam
2015-04-21 16:30:00.0, Chris
2015-05-07 16:30:00.0, John
2015-06-08 16:30:00.0, Doe
As you can see, the same name is repeated on multiple dates. For each unique name, I want to know the MAX date in the Date column. How can I do this with Pandas, or with any other Python solution you know of?
I want the result like:
Name, Max date(or latest)
John, 2015-01-01 16:30:00.0
Doe, 2015-01-01 16:30:00.0
Sam, 2015-01-01 16:30:00.0
Chris, 2015-01-01 16:30:00.0
You want to use DataFrame.groupby() and then call .max() or .min() on the result (depending on what you want). Example -
df.groupby('Name').max()
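As a minimal sketch of that one-liner, assuming a small in-memory frame built from the rows in the question (the names latest, earliest and table are only illustrative), selecting the 'Date' column after grouping gives one value per name:

```python
import pandas as pd

# Rows copied from the question; dates are converted to real datetimes up front.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-01-01 16:30:00", "2015-02-11 16:30:00", "2015-03-01 16:30:00",
        "2015-03-05 16:30:00", "2015-04-21 16:30:00", "2015-05-07 16:30:00",
        "2015-06-08 16:30:00",
    ]),
    "Name": ["John", "Doe", "Sam", "Sam", "Chris", "John", "Doe"],
})

# One value per unique name: the latest and the earliest date.
latest = df.groupby("Name")["Date"].max()
earliest = df.groupby("Name")["Date"].min()

# Turn the result back into a two-column table (Name, Max date).
table = latest.reset_index(name="Max date")
print(table)
```

The result is sorted by name, because groupby() sorts the group keys by default.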
You also need to make sure that the 'Date' column is parsed as datetime when you read in the CSV, by passing the parse_dates argument to .read_csv() (as in the example below). The dtype argument cannot do this, and pd.datetime no longer exists in current pandas versions. If the file has a space after each comma, as in the question, also pass skipinitialspace=True so that the column is named 'Name' rather than ' Name'.
Example/Demo (For your csv example in Question) -
In [12]: df = pd.read_csv('a.csv', parse_dates=['Date'], skipinitialspace=True)
In [13]: df
Out[13]:
                 Date   Name
0 2015-01-01 16:30:00   John
1 2015-02-11 16:30:00    Doe
2 2015-03-01 16:30:00    Sam
3 2015-03-05 16:30:00    Sam
4 2015-04-21 16:30:00  Chris
5 2015-05-07 16:30:00   John
6 2015-06-08 16:30:00    Doe
In [15]: df.groupby(['Name']).max()
Out[15]:
                     Date
Name
Chris 2015-04-21 16:30:00
Doe   2015-06-08 16:30:00
John  2015-05-07 16:30:00
Sam   2015-03-05 16:30:00
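Since the question also asks about non-Pandas options, here is a minimal standard-library sketch of the same idea. It assumes the file content matches the sample in the question (a space after each comma, timestamps ending in ".0"); the names CSV_TEXT and latest_by_name are hypothetical and only used for illustration.

```python
import csv
import io
from datetime import datetime

# Sample data copied from the question; in practice, open the real file instead.
CSV_TEXT = """Date, Name
2015-01-01 16:30:00.0, John
2015-02-11 16:30:00.0, Doe
2015-03-01 16:30:00.0, Sam
2015-03-05 16:30:00.0, Sam
2015-04-21 16:30:00.0, Chris
2015-05-07 16:30:00.0, John
2015-06-08 16:30:00.0, Doe
"""


def latest_by_name(handle):
    """Return a dict mapping each name to its latest datetime."""
    # skipinitialspace drops the space that follows each comma.
    reader = csv.DictReader(handle, skipinitialspace=True)
    latest = {}
    for row in reader:
        when = datetime.strptime(row["Date"], "%Y-%m-%d %H:%M:%S.%f")
        name = row["Name"]
        # Keep the date only if it is the first or a later one for this name.
        if name not in latest or when > latest[name]:
            latest[name] = when
    return latest


result = latest_by_name(io.StringIO(CSV_TEXT))
for name, when in result.items():
    print(name, when)
```

To run it against a file, pass an open file object instead of the StringIO, for example open('a.csv', newline='').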