
How to filter a dataframe of dates by a particular month/day?

So my code is as follows:

 df['Dates'][df['Dates'].index.month == 11]

I was doing a test to see whether I could filter by month so that only November dates are shown, but this did not work. It gives me the following error: AttributeError: 'Int64Index' object has no attribute 'month'.

If I do

 print(type(df['Dates'][0]))

then I get class 'pandas.tslib.Timestamp', which leads me to believe that the objects stored in the dataframe are Timestamp objects. (I'm not sure where the 'Int64Index' in the earlier error is coming from.)

What I want to do is this: the dataframe column contains dates from the early 2000s to the present, in the format dd/mm/yyyy. I want to keep only dates between November 15 and March 15, independent of the YEAR. What is the easiest way to do this?

Thanks.

Here is df['Dates'] (with indices):

 0     2006-01-01
 1     2006-01-02
 2     2006-01-03
 3     2006-01-04
 4     2006-01-05
 5     2006-01-06
 6     2006-01-07
 7     2006-01-08
 8     2006-01-09
 9     2006-01-10
 10    2006-01-11
 11    2006-01-12
 12    2006-01-13
 13    2006-01-14
 14    2006-01-15
 ...

Using pd.to_datetime & dt accessor

The accepted answer is not the "pandas" way to approach this problem. To select only the rows with month 11, use the dt accessor:

 import pandas as pd
 # df['Dates'] = pd.to_datetime(df['Dates'])  # only needed if the column is not datetime yet
 df = df[df['Dates'].dt.month == 11]

The same works for days or years: substitute dt.month with dt.day or dt.year.
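As a minimal sketch (the sample values below are hypothetical), here is the dt accessor applied end to end. Because the question says the raw dates are stored as dd/mm/yyyy, the sketch passes an explicit `format` to `pd.to_datetime` so that day and month are not swapped before filtering:

```python
import pandas as pd

# Hypothetical sample data: date strings in the question's dd/mm/yyyy format.
df = pd.DataFrame({"Dates": ["15/11/2006", "01/12/2006", "20/03/2007", "05/11/2007"]})

# Parse with an explicit format so "01/12/2006" is read as 1 December, not 12 January.
df["Dates"] = pd.to_datetime(df["Dates"], format="%d/%m/%Y")

# Keep only November rows via the dt accessor.
november = df[df["Dates"].dt.month == 11]

# The same pattern works with other components, e.g. the 15th day of any month.
fifteenth = df[df["Dates"].dt.day == 15]

print(november)
print(fifteenth)
```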

Besides those, there are many more; here are a few:

  • dt.quarter
  • dt.week (deprecated and removed in pandas 2.0; use dt.isocalendar().week instead)
  • dt.weekday
  • dt.day_name()
  • dt.is_month_end
  • dt.is_month_start
  • dt.is_year_end
  • dt.is_year_start
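A short sketch of a few of the attributes listed above, using a hypothetical three-element series. It reads the week number through `dt.isocalendar().week`, since `dt.week` is no longer available in pandas 2.0, and calls `day_name()` as a method:

```python
import pandas as pd

# Hypothetical sample: two dates around a month boundary plus the last day of the year.
s = pd.Series(pd.to_datetime(["2006-01-31", "2006-02-01", "2006-12-31"]))

print(s.dt.quarter.tolist())             # quarter of the year (1-4)
print(s.dt.weekday.tolist())             # Monday=0 ... Sunday=6
print(s.dt.day_name().tolist())          # day_name is a method, so it needs ()
print(s.dt.is_month_start.tolist())      # True on the first day of a month
print(s.dt.is_month_end.tolist())        # True on the last day of a month
print(s.dt.is_year_end.tolist())         # True only on 31 December
print(s.dt.isocalendar().week.tolist())  # ISO 8601 week number
```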

For a complete list, see the documentation.

Map an anonymous function that extracts the month onto the series and compare the result to 11 for November. That gives you a boolean mask, which you can then use to filter your dataframe.

 nov_mask = df['Dates'].map(lambda x: x.month) == 11
 df[nov_mask]
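A self-contained sketch of this approach on a hypothetical sample frame: the mask is built with `map` and a lambda as described, then compared with the equivalent `dt.month` mask, which produces the same result without calling a Python function once per element:

```python
import pandas as pd

# Hypothetical sample: the first day of each month in 2006.
df = pd.DataFrame({"Dates": pd.date_range("2006-01-01", periods=12, freq="MS")})

# Boolean mask built by mapping a lambda over each Timestamp.
nov_mask = df["Dates"].map(lambda x: x.month) == 11

# Equivalent vectorized mask using the dt accessor.
nov_mask_dt = df["Dates"].dt.month == 11

print(df[nov_mask])
```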

I don't think there is a straightforward way to filter the way you want while ignoring the year, so try this:

 import pandas as pd
 # every calendar day from Nov 15 to Mar 15; the range spans Feb 2016 so that Feb 29 is included
 nov_mar_series = pd.Series(pd.date_range("2015-11-15", "2016-03-15"))
 # create "mm-dd" strings without the year
 nov_mar_no_year = nov_mar_series.map(lambda x: x.strftime("%m-%d"))
 # add a yearless "mm-dd" column to the dataframe
 df["no_year"] = df['Dates'].map(lambda x: x.strftime("%m-%d"))
 no_year_mask = df['no_year'].isin(nov_mar_no_year)
 df[no_year_mask]
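A different technique from the one in this answer, shown as a minimal sketch on hypothetical sample data: encode each date as the integer month * 100 + day (so 15 November becomes 1115) and compare against two bounds. Because the November 15 to March 15 window wraps around New Year, the condition is an OR of two ranges, and Feb 29 falls inside the window without needing a leap-year reference range:

```python
import pandas as pd

# Hypothetical sample spanning several years, including a leap day.
df = pd.DataFrame({"Dates": pd.to_datetime([
    "2006-11-14", "2006-11-15", "2006-12-31", "2007-01-10",
    "2008-02-29", "2008-03-15", "2008-03-16", "2008-07-04",
])})

# Encode each date as an integer MMDD, ignoring the year.
mmdd = df["Dates"].dt.month * 100 + df["Dates"].dt.day

# The window wraps around New Year, so it is the union of two inclusive ranges:
# 15 Nov .. 31 Dec (mmdd >= 1115) and 1 Jan .. 15 Mar (mmdd <= 315).
in_window = (mmdd >= 1115) | (mmdd <= 315)

print(df[in_window])
```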

There are two issues in your code. First, the column reference needs to come after the filtering condition. Second, you can use .month on either the column or the index, but not both. One of the following should work:

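 # works only if the index is a DatetimeIndex
 df[df.index.month == 11]['Dates']
 # works on a datetime64 column; a Series needs the .dt accessor to reach .month
 df[df['Dates'].dt.month == 11]['Dates']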
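To see both fixes on data shaped like the question's (a hypothetical sample with a default integer index plus a `Dates` column), the sketch below filters on the column through the `dt` accessor and, alternatively, moves `Dates` into the index with `set_index` so that `index.month` exists:

```python
import pandas as pd

# Hypothetical sample mirroring the question: default integer index plus a "Dates" column.
df = pd.DataFrame({"Dates": pd.date_range("2006-10-30", periods=5, freq="D")})

# With the default integer index, df.index has no .month attribute, so df.index.month
# raises an AttributeError similar to the one in the question (the exact index class
# name depends on how the frame was built and on the pandas version).

# Option 1: filter on the column itself via the dt accessor.
by_column = df.loc[df["Dates"].dt.month == 11, "Dates"]

# Option 2: make "Dates" the index, so df.index becomes a DatetimeIndex with .month.
indexed = df.set_index("Dates")
by_index = indexed[indexed.index.month == 11]

print(by_column)
print(by_index.index)
```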
