How to avoid iterrows for this pandas dataframe

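I need your advice on improving the code below. Its main purpose is to calculate the time until the machine starts, based on the MachineState column. The current code takes about 16-18 minutes to iterate over a dataframe of roughly 100,000 rows × 700 columns, which is too long for me.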

import numpy as np
import pandas as pd

# Create mask with Shut Down state, get index numbers/positions
ShutDownMask = df['MachineState'] == 'Shut Down'
ShutDownPos = np.flatnonzero(ShutDownMask)
# Create mask with Starting state, get index numbers/positions
StartingMask = df['MachineState'] == 'Starting'
# Index list
StartingPos = np.flatnonzero(StartingMask)
for index, row in df.iterrows():
    if row['MachineState'] == 'Shut Down':
        start = pd.to_datetime(row['Date'])
        try:
            # Index label of the next 'Starting' row at or after `start`.
            # Note: the `method` argument of Index.get_loc was removed in pandas 2.0.
            idx = df.iloc[StartingPos].index[df.iloc[StartingPos].index.get_loc(start, method='backfill')]
            df.loc[index, 'TimeToStart'] = idx - start
        except KeyError:
            # No later 'Starting' row exists (last set of records)
            print('Something went wrong to find last index IDX')

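I have already tried the recommended np.where option, but without success, because I don't know how to implement the lookup of the next 'Starting' index.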

df['TimeToStart'] = np.where(
    df['MachineState'] == 'Shut Down',
    df.iloc[StartingPos].index[df.iloc[StartingPos].index.get_loc(pd.to_datetime(df['Date']), method='backfill')],
    pd.NaT,
)

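The dataframe looks like this (TimeToStart is the desired output):

Date                 MachineState  TimeToStart
10/02/2021 10:30:00  Shut Down     0 days 00:30:00
10/02/2021 10:40:00  Shut Down     0 days 00:20:00
10/02/2021 10:50:00  Shut Down     0 days 00:10:00
10/02/2021 11:00:00  Starting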

You could try the following (df is your dataframe):

def time_to_start(sdf):
    if sdf.MachineState.iat[-1] == "Starting":
        sdf["TimeToStart"] = sdf.Date.iat[-1] - sdf.Date
    else:
        sdf["TimeToStart"] = pd.NaT
    return sdf

df.Date = pd.to_datetime(df.Date)  # Just to make sure
df = (
    df.groupby(
        df.MachineState.shift().eq("Starting").cumsum(),
        as_index=False,
        sort=False
      )
      .apply(time_to_start)
)
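
To see why the groupby key works, it may help to look at it on its own. The sketch below rebuilds only the MachineState column from the sample data (the names `states` and `group_id` are introduced here for illustration) and prints the group id assigned to each row. Every run of rows up to and including a "Starting" row shares one id, so inside each group the last row is the start the others count down to.

```python
import pandas as pd

# Illustrative copy of the MachineState column from the sample data
states = pd.Series(
    ["Shut Down", "Shut Down", "Shut Down", "Starting",
     "Shut Down", "Shut Down", "Starting", "Shut Down"],
    name="MachineState",
)

# shift() moves each state down one row, so eq("Starting") is True on the
# row right AFTER a "Starting" row; cumsum() turns those flags into ids.
group_id = states.shift().eq("Starting").cumsum()

print(group_id.tolist())  # [0, 0, 0, 0, 1, 1, 1, 2]
```

The trailing group (id 2 here) has no "Starting" row at its end, which is why `time_to_start` fills it with `pd.NaT`.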

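Result of the `time_to_start` approach: the first table is the input, the second is the output with the added TimeToStart column: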

                  Date MachineState
0  10/02/2021 10:30:00    Shut Down
1  10/02/2021 10:40:00    Shut Down
2  10/02/2021 10:50:00    Shut Down
3  10/02/2021 11:00:00     Starting
4  10/02/2021 11:10:00    Shut Down
5  10/02/2021 11:30:00    Shut Down
6  10/02/2021 12:00:00     Starting
7  10/02/2021 12:40:00    Shut Down

                 Date MachineState      TimeToStart
0 2021-10-02 10:30:00    Shut Down  0 days 00:30:00
1 2021-10-02 10:40:00    Shut Down  0 days 00:20:00
2 2021-10-02 10:50:00    Shut Down  0 days 00:10:00
3 2021-10-02 11:00:00     Starting  0 days 00:00:00
4 2021-10-02 11:10:00    Shut Down  0 days 00:50:00
5 2021-10-02 11:30:00    Shut Down  0 days 00:30:00
6 2021-10-02 12:00:00     Starting  0 days 00:00:00
7 2021-10-02 12:40:00    Shut Down              NaT
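
If the `groupby(...).apply(...)` step is still too slow on a large frame, a fully vectorized variant might be worth trying. This is a different technique from the one above, not part of the original answer: keep the Date only on "Starting" rows, back-fill it upwards, and subtract. The sketch assumes the rows are sorted by Date and that the dates are month-first, as in the output above. `is_start` and `next_start` are names made up for this example.

```python
import pandas as pd

# Sample data from the result tables above
df = pd.DataFrame({
    "Date": [
        "10/02/2021 10:30:00", "10/02/2021 10:40:00", "10/02/2021 10:50:00",
        "10/02/2021 11:00:00", "10/02/2021 11:10:00", "10/02/2021 11:30:00",
        "10/02/2021 12:00:00", "10/02/2021 12:40:00",
    ],
    "MachineState": [
        "Shut Down", "Shut Down", "Shut Down", "Starting",
        "Shut Down", "Shut Down", "Starting", "Shut Down",
    ],
})
# Assumption: month-first dates, matching the parsed output shown above
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y %H:%M:%S")

# Keep the timestamp only where the machine is starting (NaT elsewhere),
# then back-fill so each row sees the next "Starting" timestamp.
is_start = df["MachineState"].eq("Starting")
next_start = df["Date"].where(is_start).bfill()

# Rows after the last "Starting" have no next start and stay NaT.
df["TimeToStart"] = next_start - df["Date"]

print(df)
```

On the sample data this should give the same TimeToStart column as the groupby version, including 0 days on the "Starting" rows and NaT on the last row.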
