Create new column based on condition of another column if value falls in a range
I have a df like this:
Hour
12:00pm
12:00am
3:00pm
2:00pm
11:00pm
Continued....
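I want to create a new column that assigns a segment based on the time: default if between 6:00am and 11:59am, timely if between 12:00pm and 3:59pm, late if between 4:00pm and 11:59pm, and nonvalid if between 12:00am and 5:59am.
I would like to use something like the code below: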
def func(row):
    if row['Hour'] >= 06:00am & < 12:00pm:
        return 'default'
    elif row['Hour'] >= 12:00pm & < 04:00pm:
        return 'timely'
    elif row['Hour'] >= 04:00pm & < 12:00am:
        return 'late'
    elif row['Hour'] >= 12:00am & < 06:00am:
        return 'nonvalid'
    else:
        return 'other'
df['Segment'] = df.apply(func, axis=1)
But the Hour column is not a datetime, so I am not sure it will read the ranges in my function.
Expected output:
Hour     Segment
12:00pm  timely
12:00am  nonvalid
3:00pm   timely
2:00pm   timely
11:00pm  late
I think you need to convert both the bins and the column values to datetimes and pass them to cut:
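import pandas as pd
# Parse the 12-hour strings; the date part defaults to 1900-01-01.
dates = pd.to_datetime(df['Hour'], format='%I:%M%p')
# Parse the bin edges the same way so they are comparable with `dates`.
b = pd.to_datetime(['12:00am', '06:00am', '12:00pm', '04:00pm', '11:59pm'], format='%I:%M%p')
l = ['nonvalid', 'Default', 'timely', 'late']
# right=False makes each bin left-closed, e.g. [06:00am, 12:00pm).
df['new'] = pd.cut(dates, bins=b, labels=l, right=False)
print(df)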
      Hour       new
0  12:00pm    timely
1  12:00am  nonvalid
2   3:00pm    timely
3   2:00pm    timely
4  11:00pm      late
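One edge case in these bins is worth noting: with right=False the last interval is [04:00pm, 11:59pm), so an Hour of exactly 11:59pm falls outside every bin and comes back as NaN. Here is a minimal sketch of one way around that, assuming it is acceptable to bin on minutes since midnight instead of on datetimes. The `minutes` and `edges` names and the two extra sample rows are illustrative, not from the original post.

```python
import pandas as pd

# Sample data; the last two rows are added here to exercise the bin edges.
df = pd.DataFrame({'Hour': ['12:00pm', '12:00am', '3:00pm', '2:00pm',
                            '11:00pm', '11:59pm', '6:00am']})

# Parse the 12-hour strings, then reduce each time to minutes since midnight.
dates = pd.to_datetime(df['Hour'], format='%I:%M%p')
minutes = dates.dt.hour * 60 + dates.dt.minute

# Left-closed bins: [0, 360), [360, 720), [720, 960), [960, 1440).
# The final edge is 24:00 (1440), so 11:59pm (1439) still lands in 'late'.
edges = [0, 6 * 60, 12 * 60, 16 * 60, 24 * 60]
labels = ['nonvalid', 'Default', 'timely', 'late']
df['Segment'] = pd.cut(minutes, bins=edges, labels=labels, right=False)
print(df)
```

Because the edges are plain integers, this variant also avoids depending on the placeholder date that to_datetime attaches to time-only strings.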
Testing with more times:
# Build one row per hour of the day, formatted as 12-hour strings.
df = pd.DataFrame({'Hour': pd.date_range('2020-01-01', periods=24, freq='H')})
df['Hour'] = df['Hour'].dt.strftime('%I:%M%p')
dates = pd.to_datetime(df['Hour'], format='%I:%M%p')
b = pd.to_datetime(['12:00am', '06:00am', '12:00pm', '04:00pm', '11:59pm'], format='%I:%M%p')
l = ['nonvalid', 'Default', 'timely', 'late']
df['new'] = pd.cut(dates, bins=b, labels=l, right=False)
print(df)
       Hour       new
0   12:00AM  nonvalid
1   01:00AM  nonvalid
2   02:00AM  nonvalid
3   03:00AM  nonvalid
4   04:00AM  nonvalid
5   05:00AM  nonvalid
6   06:00AM   Default
7   07:00AM   Default
8   08:00AM   Default
9   09:00AM   Default
10  10:00AM   Default
11  11:00AM   Default
12  12:00PM    timely
13  01:00PM    timely
14  02:00PM    timely
15  03:00PM    timely
16  04:00PM      late
17  05:00PM      late
18  06:00PM      late
19  07:00PM      late
20  08:00PM      late
21  09:00PM      late
22  10:00PM      late
23  11:00PM      late
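For comparison, the row-wise function from the question can also be made to work once each string is parsed into a time object. This is a minimal sketch, assuming every Hour string matches '%I:%M%p'. The function name `segment` and the sample frame are illustrative.

```python
from datetime import datetime, time

import pandas as pd


def segment(hour_str):
    """Map a 12-hour string such as '3:00pm' to a segment label."""
    t = datetime.strptime(hour_str, '%I:%M%p').time()
    if t < time(6, 0):
        return 'nonvalid'   # 12:00am to 5:59am
    elif t < time(12, 0):
        return 'Default'    # 6:00am to 11:59am
    elif t < time(16, 0):
        return 'timely'     # 12:00pm to 3:59pm
    return 'late'           # 4:00pm to 11:59pm


df = pd.DataFrame({'Hour': ['12:00pm', '12:00am', '3:00pm', '2:00pm', '11:00pm']})
df['Segment'] = df['Hour'].apply(segment)
print(df)
```

pd.cut is vectorized and will generally be faster on large frames, so a function like this is mainly useful when the rules are more irregular than simple ranges.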