How to avoid iterrows for this pandas dataframe
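I need your advice to improve the following code. Its main purpose is to compute the time until the machine starts, based on the MachineState column. The actual code takes about 16-18 minutes to iterate over a dataframe of roughly 100,000 rows × 700 columns, which is too long for me.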
ShutDownMask = df['MachineState'] == 'Shut Down'
ShutDownPos = np.flatnonzero(ShutDownMask)
# Create mask with Starting state, get index numbers/positions
StartingMask = df['MachineState'] == 'Starting'
# Index list
StartingPos = np.flatnonzero(StartingMask)
for index, row in df.iterrows():
    if row['MachineState'] == 'Shut Down':
        start = pd.to_datetime(row['Date'])
        try:
            idx = df.iloc[StartingPos].index[df.iloc[StartingPos].index.get_loc(start, method='backfill')]
            df.loc[index,'TimeToStart'] = idx - start
        except:
            print ('Something went wrong to find last index IDX') #For last set of record
            pass
I have already tried the recommended option, np.where, but without success, because I don't know how to implement the counter to the next index.
df['TimeToStart'] = np.where(df['MachineState'] == 'Shut Down',df.iloc[StartingPos].index[df.iloc[StartingPos].index.get_loc(pd.to_datetime(df['Date']), method='backfill')],pd.NA)
The dataframe looks like this:
Date | MachineState | TimeToStart |
---|---|---|
10/02/2021 10:30:00 | Shut Down | 0 days 00:30:00 |
10/02/2021 10:40:00 | Shut Down | 0 days 00:20:00 |
10/02/2021 10:50:00 | Shut Down | 0 days 00:10:00 |
10/02/2021 11:00:00 | Starting | |
You could try the following (df being your dataframe):
def time_to_start(sdf):
    if sdf.MachineState.iat[-1] == "Starting":
        sdf["TimeToStart"] = sdf.Date.iat[-1] - sdf.Date
    else:
        sdf["TimeToStart"] = pd.NaT
    return sdf

df.Date = pd.to_datetime(df.Date)  # Just to make sure
df = (
    df.groupby(
        df.MachineState.shift().eq("Starting").cumsum(),
        as_index=False,
        sort=False
    )
    .apply(time_to_start)
)
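To see why this grouping works, it can help to look at the key on its own. The sketch below builds an eight-row sample with the same states as the example in this answer (the names sample and key are mine, purely illustrative) and prints the group id assigned to each row. Every group ends on a Starting row, except possibly the last one, which is what time_to_start checks with iat[-1].

```python
import pandas as pd

# Illustrative sample: same dates and states as the example in this answer.
sample = pd.DataFrame({
    "Date": pd.to_datetime([
        "2021-10-02 10:30:00", "2021-10-02 10:40:00",
        "2021-10-02 10:50:00", "2021-10-02 11:00:00",
        "2021-10-02 11:10:00", "2021-10-02 11:30:00",
        "2021-10-02 12:00:00", "2021-10-02 12:40:00",
    ]),
    "MachineState": [
        "Shut Down", "Shut Down", "Shut Down", "Starting",
        "Shut Down", "Shut Down", "Starting", "Shut Down",
    ],
})

# shift() moves each state down one row, so the comparison is True on the
# row right after a "Starting" row; cumsum() turns those flags into group ids.
key = sample["MachineState"].shift().eq("Starting").cumsum()
print(key.tolist())  # [0, 0, 0, 0, 1, 1, 1, 2]
```

The trailing group (id 2 here) has no Starting row at its end, so it gets NaT.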
The result for
Date MachineState
0 10/02/2021 10:30:00 Shut Down
1 10/02/2021 10:40:00 Shut Down
2 10/02/2021 10:50:00 Shut Down
3 10/02/2021 11:00:00 Starting
4 10/02/2021 11:10:00 Shut Down
5 10/02/2021 11:30:00 Shut Down
6 10/02/2021 12:00:00 Starting
7 10/02/2021 12:40:00 Shut Down
is
Date MachineState TimeToStart
0 2021-10-02 10:30:00 Shut Down 0 days 00:30:00
1 2021-10-02 10:40:00 Shut Down 0 days 00:20:00
2 2021-10-02 10:50:00 Shut Down 0 days 00:10:00
3 2021-10-02 11:00:00 Starting 0 days 00:00:00
4 2021-10-02 11:10:00 Shut Down 0 days 00:50:00
5 2021-10-02 11:30:00 Shut Down 0 days 00:30:00
6 2021-10-02 12:00:00 Starting 0 days 00:00:00
7 2021-10-02 12:40:00 Shut Down NaT
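If groupby with apply is still too slow on around 100,000 rows, a fully vectorized variant might be worth trying. This is only a sketch of a different technique, not something taken from the question or the code above: it keeps Date on the Starting rows only, back-fills those timestamps upward so every row carries the time of the next Starting, and subtracts. The names sample and next_start are illustrative, and the sketch assumes the rows are already sorted by Date.

```python
import pandas as pd

# Illustrative sample: same dates and states as the example above.
sample = pd.DataFrame({
    "Date": pd.to_datetime([
        "2021-10-02 10:30:00", "2021-10-02 10:40:00",
        "2021-10-02 10:50:00", "2021-10-02 11:00:00",
        "2021-10-02 11:10:00", "2021-10-02 11:30:00",
        "2021-10-02 12:00:00", "2021-10-02 12:40:00",
    ]),
    "MachineState": [
        "Shut Down", "Shut Down", "Shut Down", "Starting",
        "Shut Down", "Shut Down", "Starting", "Shut Down",
    ],
})

# Keep Date only where the state is "Starting" (NaT elsewhere), then
# back-fill so each row holds the timestamp of the next "Starting" row.
is_starting = sample["MachineState"].eq("Starting")
next_start = sample["Date"].where(is_starting).bfill()

# Rows after the last "Starting" have nothing to back-fill from and stay NaT.
sample["TimeToStart"] = next_start - sample["Date"]
print(sample)
```

On this sample the TimeToStart column matches the output shown above, including 0 days on the Starting rows and NaT on the last row. If the Starting rows should be left empty instead, the result can be masked with where on the Shut Down condition.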