Filtering Rows In Pandas DataFrame By Certain Date Criteria

I have some code which I run in Jupyter Notebook.

This is the resulting DataFrame output I get:

       LOCATION     DATE         DAKOTA  HURRI  SPITFIRE
MyIdx
176    Duxford      10-Jul-2004  D       H      S
177    Cirencester  10-Jul-2004  D       H      S
178    Brize Nortn  10-Jul-2004  D       H      S
74     Shrivenham   10-Jun-2004  D       H      S
257    Campbletown  15-Aug-2004  D       --     S
258    Sunderland   15-Aug-2004  D       --     S
261    Scampton     15-Aug-2004  D       --     S
200    Fairford     15-Jul-2004  D       --     SS
22     Tilford      15-May-2004  D       --     S
23     Abingdon     15-May-2004  D       --     S
24     Hyde Heath   15-May-2004  D       --     S

Could a moderator tidy the output layout for me, if that is okay?

These are the two key parts of the code I use to filter rows by date:

(df3['DATE'].str.contains('-10$|15$'))  

and

display.sort_values(by=['DATE'])

The first line of code filters the DataFrame rows down to two days, the 10th and the 15th of the month.

It correctly outputs the earliest days first, i.e. 10 before 15, but not in the month order I want.

I want 10 June 2004 first, then the 10 July rows, then the 15 May rows, then the 15 July rows, and so on. How do I modify that line of code so that the rows come out in that order, without changing the index position of the rows via code (which I already know how to do)?

I mean adding something to either line of code so that, for the same day, the earlier month is shown before the later month, i.e. 10-Jun-2004 before 10-Jul-2004 and 15-May-2004 before 15-Jul-2004, while dates with day 10 still appear before the day 15 rows.

So the rows shown are in date order like this:

10-Jun-2004
10-Jul-2004
15-May-2004
15-Jul-2004
15-Aug-2004

The date output comes from this line of code:

display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%d-%b-%Y')

Any help would be much appreciated.

Best regards,

Eddie Winch

Consider breaking out separate columns for day, month, and year, then sorting on those as needed. The numeric month is easier to sort on, and you can keep the displayed date as it is if that is how you want to show it.

Like this:

import pandas as pd
data = [
{"Name": "Alice", "date": "15-May-2004", "Rating": 55},
{"Name": "Bob", "date": "10-Jun-2004", "Rating": 11},
{"Name": "Chanel", "date": "15-Aug-2004", "Rating": 33},
{"Name": "Del", "date": "10-Jul-2004", "Rating": 44},
{"Name": "Erin", "date": "15-Jul-2004", "Rating": 22},
]
df = pd.DataFrame(data)
# Parse the strings into real datetimes so the parts can be extracted.
df['date'] = pd.to_datetime(df['date'])
# Break out numeric day, month, and year via the vectorized .dt accessor.
df['date_day'] = df['date'].dt.day
df['date_month'] = df['date'].dt.month
df['date_year'] = df['date'].dt.year
# Sort by day of the month first, then by month within each day.
df = df.sort_values(by=["date_day", "date_month"])

Result:

    Name    date        Rating  date_day    date_month  date_year
1   Bob     2004-06-10  11      10          6           2004
3   Del     2004-07-10  44      10          7           2004
0   Alice   2004-05-15  55      15          5           2004
4   Erin    2004-07-15  22      15          7           2004
2   Chanel  2004-08-15  33      15          8           2004
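If the helper columns should not appear in the final output, one option is to add them only for the sort and drop them afterwards. The following is a minimal sketch under that assumption; the names `_day`, `_month`, `parsed`, and `ordered` are hypothetical, and the original date strings and index labels are left untouched.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Chanel", "Del", "Erin"],
        "date": ["15-May-2004", "10-Jun-2004", "15-Aug-2004",
                 "10-Jul-2004", "15-Jul-2004"],
    }
)

# Parse into a separate Series so the displayed strings stay as they are.
parsed = pd.to_datetime(df["date"], format="%d-%b-%Y")

# Temporary helper columns (hypothetical names), used only for sorting.
ordered = (
    df.assign(_day=parsed.dt.day, _month=parsed.dt.month)
    .sort_values(by=["_day", "_month"])
    .drop(columns=["_day", "_month"])
)
print(ordered)
```

Each row keeps its original index label; only the row order changes.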

Another approach, without adding columns, is to use the key argument of sort_values (available from pandas 1.1) to get the ordering you want:

df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(by="date", key=lambda col: 100 * col.dt.day + col.dt.month)
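The order shown in the question is what a sort on the formatted strings produces, because text compares alphabetically ("10-Jul" sorts before "10-Jun"). Here is a rough sketch of how the pieces might fit together on a frame shaped like the one in the question, assuming the dates are kept as datetimes until the very end. The sample rows (including the extra "Lincoln" row), the `result` name, and the `dt.day.isin` filter used in place of the original regex are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample standing in for the asker's df3.
df3 = pd.DataFrame(
    {
        "LOCATION": ["Duxford", "Shrivenham", "Fairford",
                     "Tilford", "Scampton", "Lincoln"],
        "DATE": ["10-Jul-2004", "10-Jun-2004", "15-Jul-2004",
                 "15-May-2004", "15-Aug-2004", "12-Jun-2004"],
    },
    index=[176, 74, 200, 22, 261, 300],
)
df3.index.name = "MyIdx"

# Keep real datetimes while filtering and sorting.
df3["DATE"] = pd.to_datetime(df3["DATE"], format="%d-%b-%Y")

# Keep only rows dated the 10th or the 15th of a month.
result = df3[df3["DATE"].dt.day.isin([10, 15])]

# Sort by day of month first, then by month within each day.
result = result.sort_values(
    by="DATE", key=lambda col: 100 * col.dt.day + col.dt.month
)

# Convert to text only at the end, purely for display.
result = result.assign(DATE=result["DATE"].dt.strftime("%d-%b-%Y"))
print(result)
```

Sorting before calling strftime is the important part: once the column is text, sort_values can only order it alphabetically. The index labels are carried along unchanged.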
