Comparing 2 datetime64[ns] dataframe columns
I have two date columns, date1 and date2. I am trying to select the rows where date1 is later than date2. I tried:
print df[df.loc[df['date1']>df['date2']]]
but I received an error:
ValueError: Boolean array expected for the condition, not float64
The error occurs because the inner df.loc[...] already returns the filtered rows as a dataframe, and passing that dataframe back into df[...] is not a valid boolean condition. The idea is to build a boolean mask, then use that mask to index into the dataframe and retrieve the corresponding rows. First, generate a mask:
mask = df['date1'] > df['date2']
Now, use this mask to index df:
df = df.loc[mask]
This can be written in a single line:
df = df.loc[df['date1'] > df['date2']]
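As a minimal sketch of the steps above, here is the same filter run on a small made-up dataframe. Only the column names come from the question; the sample values and the name filtered are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "date1": pd.to_datetime(["2017-01-05", "2017-02-01", "2017-03-10"]),
    "date2": pd.to_datetime(["2017-01-01", "2017-02-15", "2017-03-10"]),
})

# Element-wise comparison yields a boolean Series aligned with df's index.
mask = df["date1"] > df["date2"]

# Keep only the rows where the mask is True.
filtered = df.loc[mask]
print(filtered)
```

Rows where the two dates are equal are dropped, since the comparison is strict; use >= if those rows should be kept.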
You do not need to perform another level of indexing after this; df now holds your final result. I recommend loc if you are planning to perform operations and reassignment on this filtered dataframe. Note that filtering with a boolean mask returns a copy of the matching rows with loc and with plain indexing alike, so if later assignments trigger a SettingWithCopyWarning, call .copy() explicitly on the result.
Below are some more methods of doing the same thing:
Option 1: df.query
df.query('date1 > date2')
Option 2: df.eval
df[df.eval('date1 > date2')]
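To check that these options select the same rows as the boolean mask, a quick comparison can be sketched as follows. The sample data and the names by_mask, by_query and by_eval are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "date1": pd.to_datetime(["2017-01-05", "2017-02-01", "2017-03-10"]),
    "date2": pd.to_datetime(["2017-01-01", "2017-02-15", "2017-03-10"]),
})

by_mask = df.loc[df["date1"] > df["date2"]]   # explicit boolean mask
by_query = df.query("date1 > date2")          # expression string, returns rows
by_eval = df[df.eval("date1 > date2")]        # eval returns the mask itself

# All three approaches should produce identical dataframes.
same = by_mask.equals(by_query) and by_mask.equals(by_eval)
print(same)
```

The practical difference is that query returns the filtered rows directly, while eval returns the boolean mask, which still has to be used for indexing.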
If your columns are not dates, you might as well cast them now. Use pd.to_datetime:
df.date1 = pd.to_datetime(df.date1)
df.date2 = pd.to_datetime(df.date2)
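A short sketch of that conversion, assuming the columns start out as date strings (the sample values are made up):

```python
import pandas as pd

# Hypothetical data where the dates were loaded as plain strings.
df = pd.DataFrame({
    "date1": ["2017-01-05", "2017-02-01"],
    "date2": ["2017-01-01", "2017-02-15"],
})

# Convert both columns to datetime so they compare chronologically.
df["date1"] = pd.to_datetime(df["date1"])
df["date2"] = pd.to_datetime(df["date2"])
print(df.dtypes)

# The comparison now operates on real timestamps.
later = df.loc[df["date1"] > df["date2"]]
print(later)
```

Bracket assignment (df["date1"] = ...) is used here instead of attribute assignment; both work for existing columns, but the bracket form also works for names that are not valid Python identifiers.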
Or, when loading your CSV, make sure to pass the parse_dates argument:
df = pd.read_csv(..., parse_dates=['date1', 'date2'])
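For a self-contained illustration of parse_dates, the sketch below reads from an in-memory string instead of a real file. The CSV contents and the name csv_text are assumptions. Note that each column name is its own string in the list.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a file on disk.
csv_text = (
    "date1,date2\n"
    "2017-01-05,2017-01-01\n"
    "2017-02-01,2017-02-15\n"
)

# parse_dates takes a list with one entry per column to parse.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date1", "date2"])
print(df.dtypes)

# The columns are already datetimes, so they can be compared directly.
result = df.loc[df["date1"] > df["date2"]]
print(result)
```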