Pandas - Assign value to row if matching condition and a time range defined in other column
Pandas: Assign value in column based on time range of the respective row (no header)
I'm converting a CSV file that has no header into a dataframe. I'm using usecols to select the columns I need and passing header=None.
import pandas as pd
path = r"data.csv"
data = pd.read_csv(path, usecols=[0, 1, 3, 4], header=None)
df = pd.DataFrame(data)
Sample data:
{0: {0: '2022-08-06',
1: '2022-08-06',
2: '2022-08-06',
3: '2022-08-06',
4: '2022-08-06'},
1: {0: '07:35:16',
1: '07:35:22',
2: '07:35:29',
3: '07:35:36',
4: '07:35:42'},
3: {0: 'OK', 1: 'OK', 2: 'OK', 3: 'OK', 4: 'OK'},
4: {0: 1.524, 1: 1.628, 2: 1.364, 3: 1.164, 4: 1.494}}
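With header=None, pandas labels columns by position, and usecols keeps those original positions, so the frame ends up with columns 0, 1, 3 and 4. That is why the sample data has no key 2. A minimal sketch, using a hypothetical two-row, five-column CSV held in memory (io.StringIO stands in for data.csv), also shows that read_csv already returns a DataFrame:

```python
import io

import pandas as pd

# Hypothetical CSV contents: five columns, no header row; column 2 is dropped by usecols
csv_text = """2022-08-06,07:35:16,X,OK,1.524
2022-08-06,07:35:22,X,OK,1.628
"""

df = pd.read_csv(io.StringIO(csv_text), usecols=[0, 1, 3, 4], header=None)
print(type(df).__name__)  # read_csv already returns a DataFrame
print(list(df.columns))   # original positions are kept as labels
```

Wrapping the result in pd.DataFrame(...) is therefore not needed.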
After that, I need to add a new blank column for the working shift data, which depends on the time data in column index [1].
Here is the relationship between time and working shift:
1950hrs to 0749hrs = 'PM' shift
0750hrs to 1949hrs = 'AM' shift
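The PM range crosses midnight, while the AM range does not. A minimal sketch of the mapping for a single time of day tests the AM range directly and treats everything else as PM. The helper name shift_for and the constants are hypothetical, with boundaries taken from the table above:

```python
from datetime import time

# Boundaries assumed from the mapping above: AM starts at 07:50:00, PM starts at 19:50:00
AM_START = time(7, 50)
PM_START = time(19, 50)


def shift_for(t: time) -> str:
    """Return 'AM' or 'PM' for a time of day (hypothetical helper).

    The AM range never crosses midnight, so it can be tested directly;
    every other time, including 00:00:00-07:49:59, belongs to the PM shift.
    """
    return "AM" if AM_START <= t < PM_START else "PM"


for t in (time(7, 35, 16), time(7, 50), time(19, 49, 59), time(19, 50), time(0, 0)):
    print(t, shift_for(t))
```

Using a half-open AM range (start inclusive, PM start exclusive) also keeps fractional seconds such as 19:49:59.5 on the AM side.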
The question is: how do I assign data to the working shift column using the time range? This is what I'm currently doing:
import pandas as pd
from datetime import datetime

ShiftID = pd.Series([], dtype=pd.StringDtype())
df[1] = pd.to_datetime(df[1])
start = datetime.strptime('19:50:00', '%H:%M:%S').time()
end = datetime.strptime('07:49:59', '%H:%M:%S').time()
for i in range(len(df)):
    if df[1].dt.time.between(start, end):  # <------- I'm confused right here
        ShiftID[i] = 'PM'
    else:
        ShiftID[i] = 'AM'
df.insert(2, "ShiftID", ShiftID)  # <--- insert 'working shift' column into index [2]
display(df)
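The loop stalls on two separate issues. First, an if statement given a whole boolean Series raises ValueError. Second, because the PM range 19:50:00 to 07:49:59 wraps past midnight, between(start, end) with start later than end is False for every row. A small sketch with three hypothetical timestamps shows both problems, followed by a vectorised wrap-around test:

```python
from datetime import time

import pandas as pd

# Hypothetical timestamps: early morning, late evening, midday
times = pd.Series(pd.to_datetime(["2022-08-06 07:35:16", "2022-08-06 20:00:00", "2022-08-06 12:00:00"]))
start, end = time(19, 50, 0), time(7, 49, 59)

# between(start, end) means start <= t <= end, which no time satisfies when start > end
mask = times.dt.time.between(start, end)
print(mask.tolist())

raised = False
try:
    if mask:  # a whole Series in an if statement has no single truth value
        pass
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Wrap-around test: at or after start OR at or before end, evaluated for all rows at once
is_pm = (times.dt.time >= start) | (times.dt.time <= end)
print(is_pm.tolist())
```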
My desired output looks like this:
{0: {0: '2022-08-06',
1: '2022-08-06',
2: '2022-08-06',
3: '2022-08-06',
4: '2022-08-06'},
1: {0: '07:35:16',
1: '07:35:22',
2: '07:35:29',
3: '07:35:36',
4: '07:35:42'},
2: {0: 'PM', 1: 'PM', 2: 'PM', 3: 'PM', 4: 'PM'},
3: {0: 'OK', 1: 'OK', 2: 'OK', 3: 'OK', 4: 'OK'},
4: {0: 1.524, 1: 1.628, 2: 1.364, 3: 1.164, 4: 1.494}}
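You should take a look at pandas' cut method.
As mentioned above, without the raw data it's hard to suggest exact code, but something like this should be close to your solution. Replace the values between <>. Bin edges must increase, so <begin> is the 07:50 AM start and <end> is the 19:50 PM start. This means the first and last bins are both PM, and a repeated label is only accepted with ordered=False:
df["Working Shift"] = pd.cut(df[1], bins=[<00:00>, <begin>, <end>, <23:59>], labels=["PM", "AM", "PM"], ordered=False)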
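A runnable version of the cut idea can be built under one assumption: the times are first converted to seconds since midnight so the bin edges are plain numbers. The times below are hypothetical and chosen to hit the boundaries. right=False makes each bin include its left edge, so 07:50:00 falls in AM and 19:50:00 falls in PM. Duplicate labels with ordered=False are accepted in pandas 1.1 and later:

```python
import pandas as pd

# Hypothetical time-of-day strings, including the shift boundaries
df = pd.DataFrame({1: ["00:00:00", "07:35:16", "07:50:00", "19:49:59", "19:50:00", "23:59:59"]})

# Seconds since midnight as a numeric column for pd.cut to bin
secs = pd.to_timedelta(df[1]).dt.total_seconds()

# Left-closed bins: [00:00, 07:50), [07:50, 19:50), [19:50, 24:00)
bins = [0, 7 * 3600 + 50 * 60, 19 * 3600 + 50 * 60, 24 * 3600]

# The PM label is used twice, which pd.cut only allows with ordered=False
df["Working Shift"] = pd.cut(secs, bins=bins, labels=["PM", "AM", "PM"], right=False, ordered=False)
print(df["Working Shift"].tolist())
```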
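Try this. The PM range wraps past midnight, so start is later than end and between(start, end) can never be True. Test "at or after start OR at or before end" instead:
import numpy as np
is_pm = (df[1].dt.time >= start) | (df[1].dt.time <= end)
values = np.where(is_pm, 'PM', 'AM')
ShiftID = pd.Series(values, df.index)
Or
ShiftID = pd.Series('PM', df.index)
ShiftID = ShiftID.where(is_pm, 'AM')  # where() returns a new Series rather than modifying in place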
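Both variants should produce the same column. The following self-contained check uses three hypothetical times around the boundaries. It relies on the wrap-around condition rather than between, because start is later than end:

```python
from datetime import time

import numpy as np
import pandas as pd

# Hypothetical rows: before the AM start, exactly at the AM start, exactly at the PM start
df = pd.DataFrame({1: pd.to_datetime(["2022-08-06 07:35:16", "2022-08-06 07:50:00", "2022-08-06 19:50:00"])})
start, end = time(19, 50, 0), time(7, 49, 59)
is_pm = (df[1].dt.time >= start) | (df[1].dt.time <= end)

# Variant 1: build the values with np.where, then wrap them in a Series
via_np = pd.Series(np.where(is_pm, "PM", "AM"), index=df.index)

# Variant 2: start from all 'PM' and replace rows where is_pm is False with 'AM'
via_where = pd.Series("PM", index=df.index).where(is_pm, "AM")

print(via_np.tolist())
print(via_where.tolist())
```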
Update
import pandas as pd
import numpy as np
df = pd.DataFrame.from_dict({0: {0: '2022-08-06',
1: '2022-08-06',
2: '2022-08-06',
3: '2022-08-06',
4: '2022-08-06'},
1: {0: '07:35:16',
1: '07:35:22',
2: '07:35:29',
3: '07:35:36',
4: '07:35:42'},
3: {0: 'OK', 1: 'OK', 2: 'OK', 3: 'OK', 4: 'OK'},
4: {0: 1.524, 1: 1.628, 2: 1.364, 3: 1.164, 4: 1.494}})
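# AM shift runs from 07:50:00 through 19:49:59 (between() is inclusive); everything else is PM
am_start = pd.Timedelta(hours=7, minutes=50)
am_stop = pd.Timedelta(hours=19, minutes=49, seconds=59)
s = pd.to_timedelta(df[1])  # '07:35:16' -> Timedelta('0 days 07:35:16')
df[2] = np.where(s.between(am_start, am_stop), 'AM', 'PM')
df.sort_index(axis=1, inplace=True)  # move the new column 2 between columns 1 and 3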
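As an alternative to assigning df[2] and then calling sort_index, DataFrame.insert(loc, column, value) can place the new column at position 2 in one step. This is the same call the question attempts with "ShiftID". A minimal sketch on a few hypothetical rows:

```python
import numpy as np
import pandas as pd

# Hypothetical rows with the same column labels as the sample data
df = pd.DataFrame({0: ["2022-08-06"] * 3,
                   1: ["07:35:16", "07:50:00", "19:50:00"],
                   3: ["OK"] * 3,
                   4: [1.524, 1.628, 1.364]})

am_start = pd.Timedelta(hours=7, minutes=50)
am_stop = pd.Timedelta(hours=19, minutes=49, seconds=59)
s = pd.to_timedelta(df[1])

# Insert a column labelled 2 at position 2, so no re-sorting of columns is needed
df.insert(2, 2, np.where(s.between(am_start, am_stop), "AM", "PM"))
print(list(df.columns))
print(df[2].tolist())
```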
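This also works:
import numpy as np
# Compare the time column df[1] (not the date column df[0]); the PM range wraps past
# midnight, so the two bounds are joined with | rather than &. The condition is a plain
# boolean Series: wrapping it in a list would make np.where return a 2-D array.
condition1 = (df[1].dt.time >= start) | (df[1].dt.time <= end)
df['ShiftID'] = np.where(condition1, "PM", "AM")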