Delete records from a dataframe based on a date condition

Question

I have a dataframe with a column named date, which contains the following dates:

In [67]: df.date.drop_duplicates()
Out[67]: 
0      2020-02-04
570    2020-02-19
1157   2020-03-03
1791   2020-04-02
2452   2020-04-08
3113   2020-05-05
3777   2020-06-03
4445   2020-07-02
5131   2020-08-04
Name: date, dtype: datetime64[ns]

I only want monthly data, and for each month I want to keep the earliest date. So here I want to delete all records where the date is 2020-02-19 or 2020-04-08. The problem is that I never know which dates I will receive. I could also have received 2020-07-22, in which case I would want to delete all records with date 2020-07-22 too, as I already have 2020-07-02.

Do you know a smooth way to do that? I thought of sorting the values so that they look like this:

Then I could delete all records whose date is one of the dates after the 7th row (counting from 1), as I always have a variable that determines how many data points I need. But I couldn't figure out how to sort it like that. Do you know any other way, or could you help me sort the date values? Thank you so much!

Answer 1

IIUC, you can group by month and then take the min:

df.groupby(df.date.dt.month).min()

If 'date' spans more than one year, group by year and month:

df.groupby([df.date.dt.month,df.date.dt.year]).min()

Output:

           date
           
2    2020-02-04
3    2020-03-03
4    2020-04-02
5    2020-05-05
6    2020-06-03
7    2020-07-02
8    2020-08-04
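
The groupby above returns one summary row per month. If the goal is to actually drop the unwanted records from df while keeping every record that carries the earliest date of its month, something like the following might work. This is only a sketch: the sample data and the value column are made up for illustration, and it uses dt.to_period("M") as the grouping key so that year and month are handled together.

```python
import pandas as pd

# Hypothetical sample data: several records per date, as in the question.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2020-02-04", "2020-02-04", "2020-02-19", "2020-03-03",
         "2020-04-02", "2020-04-08", "2020-04-08"]
    ),
    "value": [1, 2, 3, 4, 5, 6, 7],  # hypothetical payload column
})

# Earliest date within each calendar month, broadcast back to every row.
month = df["date"].dt.to_period("M")
earliest = df.groupby(month)["date"].transform("min")

# Keep only the records whose date is the earliest one of its month.
monthly = df[df["date"] == earliest]
print(monthly)
```

With this sample, the records dated 2020-02-19 and 2020-04-08 are removed, and both records dated 2020-02-04 are kept.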

Answer 2

This works even if your data spans more than a year:

df.sort_values(by='date').groupby(df.date.dt.year.astype(str)
                                  + df.date.dt.month.astype(str)).first()

Output:

0       idx       date
date                  
20202     0 2020-02-04
20203  1157 2020-03-03
20204  1791 2020-04-02
20205  3113 2020-05-05
20206  3777 2020-06-03
20207  4445 2020-07-02
20208  5131 2020-08-04
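
One thing to watch with the concatenated string key: it sorts as text, so a key such as "202010" would come before "20202" once months 10 to 12 appear. A possible variation, assuming the same date column, is to group on a monthly Period instead, which keeps the result in chronological order. The data below is hypothetical and only meant to illustrate the idea.

```python
import pandas as pd

# Hypothetical data spanning two years, with duplicate months.
df = pd.DataFrame({
    "idx": [0, 1, 2, 3, 4],
    "date": pd.to_datetime(
        ["2020-02-04", "2020-02-19", "2020-10-06", "2021-02-02", "2021-02-17"]
    ),
})

# Group on a monthly Period so the keys sort chronologically,
# then take the first (earliest, since the frame is sorted) row per month.
first_per_month = (
    df.sort_values("date")
      .groupby(df["date"].dt.to_period("M"))
      .first()
)
print(first_per_month)
```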

Answer 1
2, accepted, 2020-08-10 16:30:45

Answer 2
1, 2020-08-10 16:33:41
