Comparing two datetime64[ns] dataframe columns

Question

I have two date columns, date1 and date2. I am trying to select the rows where date1 is later than date2. I tried:

print df[df.loc[df['date1']>df['date2']]]

but I received an error:

ValueError: Boolean array expected for the condition, not float64

Answer 1

The idea is to build a boolean mask, which is then used to index into the dataframe and retrieve the matching rows. First, generate the mask:

mask = df['date1'] > df['date2']

Now, use this mask to index df:

df = df.loc[mask]

This can be written in a single line:

df = df.loc[df['date1'] > df['date2']]
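
To make the mask approach concrete, here is a minimal sketch on a small made-up frame. Only the column names date1 and date2 come from the question; the sample values and the name `later` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "date1": pd.to_datetime(["2017-01-05", "2017-02-01", "2017-03-10"]),
    "date2": pd.to_datetime(["2017-01-01", "2017-02-15", "2017-03-10"]),
})

# Element-wise comparison yields a boolean Series aligned on the index.
mask = df["date1"] > df["date2"]

# Keep only the rows where the mask is True.
later = df.loc[mask]
print(later)
```

Rows where the two dates are equal are dropped, since the comparison is strict; use >= if those should be kept.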

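You do not need another level of indexing after this; df now holds your final result. In your attempt, the inner df.loc[...] already returned the filtered rows, and wrapping it in a second df[...] passed a DataFrame where a boolean mask was expected, which is what raised the error. I recommend loc if you plan to operate on and reassign into the filtered data, because df.loc[mask, column] = value does the selection and assignment in one step instead of chaining two indexing operations. Note that boolean-mask selection returns a new object with either df[mask] or df.loc[mask]; add .copy() if you want to make that explicit and avoid a SettingWithCopyWarning on later assignments.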
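
If the filtered frame is going to be modified afterwards, one common way to keep it independent of the original is an explicit .copy(). The sketch below is only an illustration: the sample data, the name `filtered`, and the derived `gap` column are all hypothetical.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "date1": pd.to_datetime(["2017-01-05", "2017-02-01"]),
    "date2": pd.to_datetime(["2017-01-01", "2017-02-15"]),
})

# Explicit copy: later assignments affect only `filtered`, not `df`.
filtered = df.loc[df["date1"] > df["date2"]].copy()

# Hypothetical derived column: the difference between the two dates.
filtered["gap"] = filtered["date1"] - filtered["date2"]
print(filtered)
```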

Below are some more ways of doing the same thing:

Option 1
df.query

df.query('date1 > date2')

Option 2
df.eval

df[df.eval('date1 > date2')]
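
As a quick check that these options select the same rows as the plain mask, here is a small sketch on made-up data. The sample values and the names `by_mask`, `by_query`, and `by_eval` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "date1": pd.to_datetime(["2017-01-05", "2017-02-01", "2017-03-10"]),
    "date2": pd.to_datetime(["2017-01-01", "2017-02-15", "2017-03-10"]),
})

by_mask = df.loc[df["date1"] > df["date2"]]   # boolean mask
by_query = df.query("date1 > date2")          # returns the filtered rows
by_eval = df[df.eval("date1 > date2")]        # eval returns the mask itself

# Compare the three results against each other.
print(by_mask.equals(by_query) and by_mask.equals(by_eval))
```

Note the difference in what each call returns: query gives back the filtered frame directly, while eval gives back the boolean Series, which still has to be used as an index.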

If your columns are not dates yet, you might as well cast them now. Use pd.to_datetime:

df.date1 = pd.to_datetime(df.date1)
df.date2 = pd.to_datetime(df.date2)
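
A minimal sketch of the conversion followed by the comparison, assuming the dates arrived as ISO-formatted strings. The sample values and the name `result` are made up for illustration.

```python
import pandas as pd

# Hypothetical frame where the dates were loaded as plain strings.
df = pd.DataFrame({
    "date1": ["2017-01-05", "2017-02-01"],
    "date2": ["2017-01-01", "2017-02-15"],
})

# Convert both columns to a datetime64 dtype.
for col in ["date1", "date2"]:
    df[col] = pd.to_datetime(df[col])

print(df.dtypes)

# The comparison now operates on real timestamps rather than strings.
result = df.loc[df["date1"] > df["date2"]]
print(result)
```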

Or, when loading your CSV, pass the date columns to parse_dates:

df = pd.read_csv(..., parse_dates=['date1', 'date2'])
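
Here is a self-contained sketch of the parse_dates route. It reads from an in-memory string instead of a real file; the CSV content, the `id` column, and the name `csv_text` are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file.
csv_text = (
    "id,date1,date2\n"
    "1,2017-01-05,2017-01-01\n"
    "2,2017-02-01,2017-02-15\n"
)

# parse_dates takes a list with one entry per column name.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date1", "date2"])

print(df.dtypes)
print(df[df["date1"] > df["date2"]])
```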


Answer 1: accepted, 2017-10-26 09:34:29
