Pandas: Assign value in column based on time range of the respective row (no header)

I am converting a CSV file that has no header into a DataFrame. I use usecols to select the columns I need and pass header=None.

import pandas as pd

path = r"data.csv"
data = pd.read_csv(path, usecols=[0, 1, 3, 4], header=None)   
df = pd.DataFrame(data)

Sample data:

{0: {0: '2022-08-06',
  1: '2022-08-06',
  2: '2022-08-06',
  3: '2022-08-06',
  4: '2022-08-06'},
 1: {0: '07:35:16',
  1: '07:35:22',
  2: '07:35:29',
  3: '07:35:36',
  4: '07:35:42'},
 3: {0: 'OK', 1: 'OK', 2: 'OK', 3: 'OK', 4: 'OK'},
 4: {0: 1.524, 1: 1.628, 2: 1.364, 3: 1.164, 4: 1.494}}

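After that, I need to add a new column for the working shift, whose value depends on the time in column index [1].

Here is the relationship between time and working shift:

1950hrs to 0749hrs = 'PM' shift
0750hrs to 1949hrs = 'AM' shift

The question is: how do I assign values in the working shift column based on the time range? This is what I am currently doing: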

ShiftID = pd.Series([], dtype=pd.StringDtype())

df[1] = pd.to_datetime(df[1])

start = datetime.strptime('19:50:00', '%H:%M:%S').time()
end = datetime.strptime('07:49:59', '%H:%M:%S').time()

for i in range(len(df)):
    if df[1].dt.time.between(start, end): # <------- I'm confused right here
        ShiftID[i]= 'PM'
    else:
        ShiftID[i]= 'AM'

df.insert(2, "ShiftID", ShiftID) # <--- insert 'working shift' column into index [2]

display(df)

The output I want looks like this:

{0: {0: '2022-08-06',
  1: '2022-08-06',
  2: '2022-08-06',
  3: '2022-08-06',
  4: '2022-08-06'},
 1: {0: '07:35:16',
  1: '07:35:22',
  2: '07:35:29',
  3: '07:35:36',
  4: '07:35:42'},
 2: {0: 'PM', 1: 'PM', 2: 'PM', 3: 'PM', 4: 'PM'},
 3: {0: 'OK', 1: 'OK', 2: 'OK', 3: 'OK', 4: 'OK'},
 4: {0: 1.524, 1: 1.628, 2: 1.364, 3: 1.164, 4: 1.494}}

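You should take a look at pandas' cut function.

As noted above, without the raw data it is hard to propose exact code, but something like this should be close to your solution (replace the values between <>, where <begin> and <end> are the start and end of the AM shift; ordered=False is needed because the labels repeat):

df["Working Shift"] = pd.cut(df[1], bins=[<00:00>, <begin>, <end>, <23:59>], labels=["PM", "AM", "PM"], ordered=False)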
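A minimal runnable sketch of the pd.cut idea, assuming the times are first converted to seconds since midnight so the bin edges are plain numbers. The sample times and the names am_start and pm_start are illustrative and not part of the original answer; ordered=False (available in pandas 1.1 and later) is what allows the repeated 'PM' label.

```python
import pandas as pd

# Illustrative times, including the shift boundaries.
times = pd.Series(['00:00:00', '07:35:16', '07:50:00', '12:00:00',
                   '19:49:59', '19:50:00', '23:59:59'])

# Seconds elapsed since midnight for each time string.
seconds = pd.to_timedelta(times).dt.total_seconds()

am_start = 7 * 3600 + 50 * 60    # 07:50:00
pm_start = 19 * 3600 + 50 * 60   # 19:50:00

# Left-closed bins: [00:00, 07:50), [07:50, 19:50), [19:50, 24:00)
shift = pd.cut(
    seconds,
    bins=[0, am_start, pm_start, 24 * 3600],
    labels=['PM', 'AM', 'PM'],
    right=False,
    ordered=False,
)
print(shift.tolist())
```

Using right=False makes each bin include its left edge, so 07:50:00 falls in 'AM' and 19:50:00 in 'PM'.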

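Try this (the PM shift wraps past midnight, so the two bounds are combined with OR rather than between):

values = np.where((df[1].dt.time >= start) | (df[1].dt.time <= end), 'PM', 'AM')
ShiftID = pd.Series(values, df.index)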

Or:

ShiftID = pd.Series('PM', df.index)
# where() returns a new Series, so assign the result back
ShiftID = ShiftID.where((df[1].dt.time >= start) | (df[1].dt.time <= end), 'AM')
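One thing to watch for with start = 19:50:00 and end = 07:49:59: because start is later than end, between(start, end) can never match any time. A small sketch of the difference, using made-up sample times and assuming the PM shift is meant to wrap past midnight:

```python
from datetime import time

import pandas as pd

# Illustrative rows; column 1 holds the time strings as in the question.
df = pd.DataFrame({1: ['07:35:16', '07:50:00', '19:49:59', '19:50:00', '23:10:00']})
t = pd.to_datetime(df[1], format='%H:%M:%S').dt.time

start = time(19, 50, 0)   # PM shift begins
end = time(7, 49, 59)     # PM shift ends (next morning)

# between() requires start <= value <= end, which is impossible here.
naive = t.between(start, end)

# A wrapped range is "at or after start" OR "at or before end".
wrapped = (t >= start) | (t <= end)

shift = wrapped.map({True: 'PM', False: 'AM'})
print(shift.tolist())
```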

Update

import pandas as pd
import numpy as np

df = pd.DataFrame.from_dict({0: {0: '2022-08-06',
  1: '2022-08-06',
  2: '2022-08-06',
  3: '2022-08-06',
  4: '2022-08-06'},
 1: {0: '07:35:16',
  1: '07:35:22',
  2: '07:35:29',
  3: '07:35:36',
  4: '07:35:42'},
 3: {0: 'OK', 1: 'OK', 2: 'OK', 3: 'OK', 4: 'OK'},
 4: {0: 1.524, 1: 1.628, 2: 1.364, 3: 1.164, 4: 1.494}})

am_start = pd.Timedelta(hours=7, minutes=50)
am_stop = pd.Timedelta(hours=19, minutes=49, seconds=59)  # include the whole 19:49 minute
s = pd.to_timedelta(df[1])
df[2] = np.where(s.between(am_start, am_stop), 'AM', 'PM')
df.sort_index(axis=1, inplace=True)
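The same timedelta approach could be wrapped in a small helper that inserts the column at a chosen position instead of sorting the columns afterwards. This is only a sketch: the function name add_shift_column, its parameters, and the sample rows are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd


def add_shift_column(df, time_col=1, position=2, name='ShiftID'):
    """Return a copy of df with an AM/PM shift column inserted at `position`."""
    elapsed = pd.to_timedelta(df[time_col])          # time since midnight
    am_start = pd.Timedelta(hours=7, minutes=50)     # 07:50:00
    pm_start = pd.Timedelta(hours=19, minutes=50)    # 19:50:00
    is_am = (elapsed >= am_start) & (elapsed < pm_start)
    out = df.copy()
    out.insert(position, name, np.where(is_am, 'AM', 'PM'))
    return out


# Illustrative data in the same layout as the question (no header).
sample = pd.DataFrame({
    0: ['2022-08-06'] * 3,
    1: ['07:35:16', '07:50:00', '19:49:59'],
    3: ['OK'] * 3,
    4: [1.524, 1.628, 1.364],
})
result = add_shift_column(sample)
print(result)
```

Using a half-open interval (at or after 07:50:00, before 19:50:00) avoids having to spell out the last second of the AM shift.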

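This also works:

import numpy as np

# The times are in column 1 (column 0 holds the date).
# The PM shift wraps past midnight, so combine the two bounds with OR.
condition1 = (df[1].dt.time >= start) | (df[1].dt.time <= end)
df['ShiftID'] = np.where(condition1, "PM", "AM")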
