
Comparing 2 datetime64[ns] dataframe columns

I have two date columns, date1 and date2. I am trying to select the rows where date1 is later than date2. I tried:

print df[df.loc[df['date1']>df['date2']]]

but I received an error:

ValueError: Boolean array expected for the condition, not float64

The error arises because df.loc[...] already returns the filtered rows; wrapping it in another df[...] passes that dataframe, rather than a boolean mask, as the indexer. The idea is to build a boolean mask and then use it once to index into the dataframe and retrieve the matching rows. First, generate the mask:

mask = df['date1'] > df['date2']

Now, use this mask to index df:

df = df.loc[mask]

This can be written in a single line:

df = df.loc[df['date1'] > df['date2']]

You do not need to perform another level of indexing after this; df now holds your final result. Note that boolean-mask selection returns a copy whether you write df[mask] or df.loc[mask]. I still recommend loc if you plan to assign into the selection, because loc lets you pick rows and columns in a single step (df.loc[mask, 'col'] = value), which avoids the chained indexing that triggers SettingWithCopyWarning.
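As a minimal sketch of the mask approach described above, the snippet below builds a small dataframe and filters it. The sample values and the helper column name later are made up for illustration; only date1 and date2 come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names date1/date2 come from the question.
df = pd.DataFrame({
    "date1": pd.to_datetime(["2020-01-05", "2020-02-01", "2020-03-10"]),
    "date2": pd.to_datetime(["2020-01-01", "2020-02-15", "2020-03-10"]),
})

# Element-wise comparison yields a boolean Series aligned with df's index.
mask = df["date1"] > df["date2"]

# Index once with the mask; no second level of indexing is needed.
filtered = df.loc[mask]
print(filtered)

# Assigning through loc selects rows and a column in one step (no chained indexing).
# "later" is an illustrative column name, not part of the original data.
df["later"] = False
df.loc[mask, "later"] = True
print(df)
```

Rows where the two dates are equal are excluded, since the comparison is strict.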


Below are some more ways of doing the same thing:

Option 1
df.query

df.query('date1 > date2')

Option 2
df.eval

df[df.eval('date1 > date2')]
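To check that the three spellings select the same rows, here is a small sketch on made-up data. The sample values are assumptions; the expressions are the ones shown above.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "date1": pd.to_datetime(["2020-01-05", "2020-02-01", "2020-03-10"]),
    "date2": pd.to_datetime(["2020-01-01", "2020-02-15", "2020-03-10"]),
})

by_mask = df.loc[df["date1"] > df["date2"]]   # explicit boolean mask
by_query = df.query("date1 > date2")          # query filters rows directly
by_eval = df[df.eval("date1 > date2")]        # eval returns the mask, which then indexes df

# DataFrame.equals compares values, dtypes, and index labels.
print(by_query.equals(by_mask))
print(by_eval.equals(by_mask))
```

query and eval take the expression as a string, so they are most convenient when column names are valid Python identifiers, as date1 and date2 are here.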

If your columns are not already dates, cast them first. Use pd.to_datetime:

df.date1 = pd.to_datetime(df.date1)
df.date2 = pd.to_datetime(df.date2)
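A short sketch of the cast, assuming the columns were loaded as plain strings (the sample values are hypothetical):

```python
import pandas as pd

# Hypothetical data: the dates arrive as strings, not datetimes.
df = pd.DataFrame({
    "date1": ["2020-01-05", "2020-02-01"],
    "date2": ["2020-01-01", "2020-02-15"],
})

# Convert each column to a datetime dtype.
df["date1"] = pd.to_datetime(df["date1"])
df["date2"] = pd.to_datetime(df["date2"])

print(df.dtypes)
print(df.loc[df["date1"] > df["date2"]])
```

After the cast, the comparison is done on actual timestamps rather than on the text of the strings.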

Or, when loading your CSV, pass the date columns to the parse_dates argument:

df = pd.read_csv(..., parse_dates=['date1', 'date2'])
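For a self-contained illustration of parse_dates, the sketch below reads CSV text from an in-memory buffer instead of a file. The CSV contents and the id column are invented for the example; each date column name is passed as its own list element.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = (
    "id,date1,date2\n"
    "1,2020-01-05,2020-01-01\n"
    "2,2020-02-01,2020-02-15\n"
)

# parse_dates takes a list of column names, one string per column.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date1", "date2"])

print(df.dtypes)

# The columns are already datetimes, so the comparison works directly.
result = df.loc[df["date1"] > df["date2"]]
print(result)
```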
