Checking multiple fields (string fields and date fields) in a data-frame
I have a dataframe (df) that looks like:
Id Status Date of entry to current post Date of entry to current payband
1 NEW ENTRANT - EXTERNAL 1/1/2020 1/1/2019
2 CURRENT 1/1/2020 1/1/2020
I am trying to write a validation that returns any records where Date of entry to current post is before Date of entry to current payband and the Status field is a new entrant type (there are a few, hence the wildcard).
I have tried the following without success:
df['Date of entry to current post']>df['Date of entry to current payband'] & df['Status'] =='NEW ENTRANT*')
So in this example I would like returned:
Id Status Date of entry to current post Date of entry to current payband
1 NEW ENTRANT - EXTERNAL 1/1/2020 1/1/2019
How can I tackle this?
If you have datetime columns for your dates, this should work:
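import numpy as np
# Flag rows where both conditions hold. str.startswith replaces == 'NEW ENTRANT*',
# because == compares against the literal text, asterisk included.
df['Condition'] = np.where((df['Date of entry to current post'] > df['Date of entry to current payband']) & df['Status'].str.startswith('NEW ENTRANT'), 1, 0)
df = df.loc[df['Condition'] == 1]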
You are comparing to the string 'NEW ENTRANT*', meaning a string that actually contains the * character.
What you want is:
... & df['Status'].str.match('NEW ENTRANT'))
But if the date columns actually contain strings, you will compare them in lexicographic order, which is probably not what you want...
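To put the points above together, here is a minimal sketch built on the two sample rows from the question. It assumes the dates arrive as day/month/year strings (the sample values 1/1/2020 do not reveal which order is meant, so adjust the format as needed). The question's wording says "before", but its attempted code and expected output both use post > payband, so the sketch follows the code.

```python
import pandas as pd

# Sample data copied from the question; the real df is assumed to use the same column names.
df = pd.DataFrame({
    "Id": [1, 2],
    "Status": ["NEW ENTRANT - EXTERNAL", "CURRENT"],
    "Date of entry to current post": ["1/1/2020", "1/1/2020"],
    "Date of entry to current payband": ["1/1/2019", "1/1/2020"],
})

post_col = "Date of entry to current post"
payband_col = "Date of entry to current payband"

# Why conversion matters: as plain strings, dates compare character by character.
print("2/1/2019" > "10/1/2020")  # True, because "2" sorts after "1"

# Assumption: day/month/year strings; change `format` if the data is month-first.
for col in (post_col, payband_col):
    df[col] = pd.to_datetime(df[col], format="%d/%m/%Y")

# == looks for the literal text, asterisk included, so nothing matches.
literal_match = df["Status"] == "NEW ENTRANT*"

# str.match anchors the pattern at the start of each string, acting as a prefix test.
is_new_entrant = df["Status"].str.match("NEW ENTRANT")

# Wrap each comparison in parentheses: & binds more tightly than > and ==.
result = df[(df[post_col] > df[payband_col]) & is_new_entrant]
print(result["Id"].tolist())  # [1]
```

If the Status column can contain missing values, str.match returns NaN for those rows; passing na=False to str.match is one way to treat them as non-matches.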