Drop Rows From Pandas Dataframe According To Date

I'm trying to create a dataframe with pandas and drop the rows whose dates are later than, say, February 2017. The dataframe is structured like so:

    Date         Account Number
1   2019-02-21   123841234
2   2017-01-01   193741927
3   2015-03-04   981237432
4   2018-05-29   134913473
5   2012-05-12   138749173
6   2009-01-04   174917239

I'm reading in the CSV (data.csv) and attempting to remove any date after 2017-02-28 like so:

import pandas as pd

data_csv = pd.read_csv('data.csv')
data_csv[data_csv.Date < '2017-02-28']

Is this supposed to work correctly with a date format of YYYY-MM-DD, or is there something I'd have to do to the column format to ensure that these rows are dropped from the dataframe?

Thank you for your time.

I suggest you convert the cutoff string to a Timestamp before comparing, assuming data_csv.Date already holds Timestamp values:

result = data_csv[data_csv.Date < pd.to_datetime('2017-02-28')]
print(result)

Output

        Date  Account Number
1 2017-01-01       193741927
2 2015-03-04       981237432
4 2012-05-12       138749173
5 2009-01-04       174917239
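If the Date column is still stored as strings after loading, one option is to let pandas parse it while reading the file. The following is a minimal sketch, assuming the file has a `Date` column as in the question; the inline `csv_text` is only a stand-in for data.csv, and `cutoff` is a name introduced here for illustration.

```python
import io

import pandas as pd

# Inline stand-in for data.csv (same rows as in the question).
csv_text = """Date,Account Number
2019-02-21,123841234
2017-01-01,193741927
2015-03-04,981237432
2018-05-29,134913473
2012-05-12,138749173
2009-01-04,174917239
"""

# parse_dates converts the named column to a datetime dtype while loading.
data_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Keep only rows dated on or before the cutoff and assign the result;
# indexing alone returns a new frame and leaves data_csv unchanged.
cutoff = pd.Timestamp("2017-02-28")
result = data_csv[data_csv["Date"] <= cutoff]
print(result)
```

Note that `<=` keeps a row dated exactly 2017-02-28, which matches "remove any date after 2017-02-28"; the strict `<` used above would drop it.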

If your date strings are in YYYY-MM-DD format, then lexicographical comparisons work out of the box (in Python generally, not just pandas).

'2009-01-04' < '2017-02-28'  
# True

'2019-01-04' < '2017-02-28'
# False

So your comparison should work without any changes. That said, it would be safer to convert the column to datetime first, so that your code makes no assumptions about the string format and still works.
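To illustrate why converting first is safer, here is a small sketch of cases where plain string comparison stops matching chronological order. The sample strings are made up for illustration and are not from the question's data.

```python
from datetime import datetime

cutoff = "2017-02-28"

# Zero-padded ISO strings sort in chronological order.
iso_ok = "2017-01-15" < cutoff

# Without zero padding the comparison goes wrong: January 15 is before
# February 28, yet the character "1" sorts after "0".
unpadded = "2017-1-15" < cutoff

# Day-first strings are compared by day, so 1 March 2018 looks "earlier"
# than 28 February 2017.
day_first = "01/03/2018" < "28/02/2017"

# Parsing to datetime first gives the chronologically correct answer.
parsed = datetime.strptime("2017-1-15", "%Y-%m-%d") < datetime.strptime(cutoff, "%Y-%m-%d")

print(iso_ok, unpadded, day_first, parsed)
```

Only the first and last comparisons agree with the calendar, which is the assumption that converting to datetime removes.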


df.dtypes     

Date              object
Account Number     int64
dtype: object

df[df['Date'] < '2017-02-28']

         Date  Account Number
2  2017-01-01       193741927
3  2015-03-04       981237432
5  2012-05-12       138749173
6  2009-01-04       174917239

df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
df.dtypes

Date              datetime64[ns]
Account Number             int64
dtype: object

df[df['Date'] < '2017-02-28']

        Date  Account Number
2 2017-01-01       193741927
3 2015-03-04       981237432
5 2012-05-12       138749173
6 2009-01-04       174917239
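Two details may be worth spelling out: the filtered frame has to be assigned to actually drop rows, and values that `errors='coerce'` turns into NaT never satisfy a comparison. The sketch below assumes a small frame with one deliberately invalid entry ("not a date", made up for illustration); `cutoff`, `kept` and `dropped` are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2019-02-21", "2017-01-01", "not a date", "2009-01-04"],
    "Account Number": [123841234, 193741927, 981237432, 174917239],
})

# Unparseable entries become NaT instead of raising an error.
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
cutoff = pd.Timestamp("2017-02-28")

# Option 1: keep rows on or before the cutoff. NaT <= cutoff is False,
# so the invalid row is removed as well.
kept = df[df["Date"] <= cutoff]

# Option 2: drop rows later than the cutoff by index label. NaT > cutoff
# is also False, so here the invalid row stays in the result.
dropped = df.drop(df[df["Date"] > cutoff].index)

print(kept)
print(dropped)
```

Which option fits depends on whether rows with unparseable dates should survive; checking `df["Date"].isna()` after the conversion shows how many there are.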
