Pandas - From list of dates, get the last date in each month

I have a fairly simple question but can't find a clean pandas solution to it.

Given a list of dates in a series like below:

LoadedDate
0   2016-02-18
1   2016-02-19
2   2016-02-20
3   2016-02-23
4   2016-02-24
5   2016-02-25
6   2016-02-26
7   2016-02-27
8   2016-03-01
9   2016-03-02
10  2016-03-03
11  2016-03-04
12  2016-03-05
13  2016-03-08
14  2016-03-09
15  2016-03-10
16  2016-03-11
17  2016-03-12
18  2016-03-15
19  2016-03-16
20  2016-03-17
21  2016-03-18
22  2016-03-19
23  2016-03-22
24  2016-03-23
25  2016-03-24
26  2016-03-25
27  2016-03-30
28  2016-03-31
29  2016-04-01
30  2016-04-02
31  2016-04-05
32  2016-04-06
33  2016-04-07
34  2016-04-08
35  2016-04-09
36  2016-04-12
37  2016-04-13
38  2016-04-14
39  2016-04-15
40  2016-04-16
41  2016-04-19
42  2016-04-20
43  2016-04-21
44  2016-04-22
45  2016-04-23
46  2016-04-27
47  2016-04-28
48  2016-04-29
49  2016-04-30
50  2016-05-02
51  2016-05-03
52  2016-05-04

I'd like to pull the last/max date of each month. So the output would be:

LastDate
0   2016-02-27
1   2016-03-31
2   2016-04-30
3   2016-05-04

I tried df.set_index('LoadedDate').groupby(pd.Grouper(freq='M')).max(), but it returned the last calendar day of each month rather than the latest loaded date that actually appears in my series.

Thanks.

You could group by month and take the last row of each group (this relies on the dates already being sorted):

In [300]: df.groupby(df.LoadedDate.astype('datetime64[M]')).last().reset_index(drop=True)
Out[300]:
  LoadedDate
0 2016-02-27
1 2016-03-31
2 2016-04-30
3 2016-05-04

Or,

In [295]: df.groupby(df.LoadedDate - pd.offsets.MonthEnd()).last().reset_index(drop=True)
Out[295]:
  LoadedDate
0 2016-02-27
1 2016-03-31
2 2016-04-30
3 2016-05-04

Or,

In [301]: df.groupby(df.LoadedDate.dt.to_period('M')).last().reset_index(drop=True)
Out[301]:
  LoadedDate
0 2016-02-27
1 2016-03-31
2 2016-04-30
3 2016-05-04
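
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the to_period approach. The sample frame is a hypothetical subset of the dates in the question, and the LastDate name is taken from the desired output. It uses max() instead of last(), so it does not depend on row order. As far as I know, the astype('datetime64[M]') variant can raise a TypeError on recent pandas releases, so the period-based key is probably the safer choice.

```python
import pandas as pd

# Hypothetical sample: a subset of the dates from the question.
df = pd.DataFrame({
    "LoadedDate": pd.to_datetime([
        "2016-02-26", "2016-02-27",
        "2016-03-30", "2016-03-31",
        "2016-04-29", "2016-04-30",
        "2016-05-03", "2016-05-04",
    ])
})

# Monthly period key, e.g. 2016-02 for every date in February 2016.
month = df["LoadedDate"].dt.to_period("M")

# Latest loaded date per month; max() works whether or not the rows are sorted.
last_dates = (
    df.groupby(month)["LoadedDate"]
      .max()
      .reset_index(drop=True)
      .rename("LastDate")
)
print(last_dates)
```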

Or,

In [303]: df.groupby(df.LoadedDate.astype(str).str[:7]).last().reset_index(drop=True)
Out[303]:
  LoadedDate
0 2016-02-27
1 2016-03-31
2 2016-04-30
3 2016-05-04

If the dates are not sorted, combine any of the grouping keys above with idxmax and loc:

In [307]: df.loc[df.groupby(df.LoadedDate.astype(str).str[:7]).LoadedDate.idxmax().values]
Out[307]:
   LoadedDate
7  2016-02-27
28 2016-03-31
49 2016-04-30
52 2016-05-04
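
A small sketch of the same idxmax and loc idea on deliberately unsorted data. The sample rows and the extra Rows column are made up for illustration. The point is that loc returns the whole row for each month's latest date, not just the date itself.

```python
import pandas as pd

# Hypothetical unsorted sample; "Rows" is a made-up payload column.
df = pd.DataFrame({
    "LoadedDate": pd.to_datetime([
        "2016-03-31", "2016-02-18", "2016-04-30",
        "2016-02-27", "2016-03-01", "2016-04-01",
    ]),
    "Rows": [10, 20, 30, 40, 50, 60],
})

# Group by calendar month and find the index label of the latest date in each group.
month = df["LoadedDate"].dt.to_period("M")
idx = df.groupby(month)["LoadedDate"].idxmax()

# Select the full rows at those labels, then renumber them.
result = df.loc[idx].reset_index(drop=True)
print(result)
```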

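You can try the following code. It assumes LoadedDate holds strings in YYYY-MM-DD format.

Create a new column with the day part stripped off, leaving the year and month:

df['new_loadeddate'] = df['LoadedDate'].apply(lambda date: date[:-3])

Now group by that month column and take the max of each group:

grouped_df = df.groupby('new_loadeddate').max()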
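
A runnable sketch of this approach, under the assumption that the dates are ISO-formatted strings (so that string comparison matches date order). The sample values are a hypothetical subset of the question's data. The second half shows one possible equivalent for a real datetime column, where slicing with date[:-3] would not work and dt.strftime is swapped in to build the key.

```python
import pandas as pd

# Case 1: LoadedDate stored as ISO-formatted strings (hypothetical sample).
df = pd.DataFrame({
    "LoadedDate": ["2016-02-18", "2016-02-27", "2016-03-01", "2016-03-31"]
})
df["new_loadeddate"] = df["LoadedDate"].str[:7]   # "2016-02-18" -> "2016-02"
grouped_df = df.groupby("new_loadeddate").max()   # lexicographic max per month
print(grouped_df)

# Case 2: LoadedDate stored as datetime64; build the same key with strftime.
df_dt = pd.DataFrame({"LoadedDate": pd.to_datetime(df["LoadedDate"])})
df_dt["new_loadeddate"] = df_dt["LoadedDate"].dt.strftime("%Y-%m")
grouped_dt = df_dt.groupby("new_loadeddate").max()
print(grouped_dt)
```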
