Incrementally add time to a dataframe column based on the column's first value

Question

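I have a scenario where I need to increment the timestamps in a dataframe column.

The dataframe consists of a column containing groups of identical area IDs and a "waterDuration" column.

I want to add this duration cumulatively to the timestamp given in the first row of each area, updating the remaining rows of each area ID step by step.

This is what my dataframe looks like:

Given the first timestamp of each areaId, I want to add the duration next to it to that initial value, and keep incrementing it for the rest of the rows, for example:

These are all the columns of my dataframe: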

scheduleId          int64
scheduleName       object
areaId             object
deviceId           object
stationDeviceId    object
evStatus           object
waterDuration      object
noOfCyles          object
startTime1         object
startTime2         object
startTime3         object
startTime4         object
waterPlanning      object
lastUpdatedTime    object
dtype: object

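I want all of these columns and their values in df to stay unchanged, apart from the updated values in startTime1.

The values of waterDuration can change, so I don't want to hard-code them in the solution. Any help would be great.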

Answer 1

So here is your dataframe:

import pandas as pd

data = {
    "areaID": [125659657, 125659657, 125659657, 125659657, 9876913, 9876913, 9876913, 9876913],
    "waterDuration": [15, 15, 15, 15, 10, 10, 10, 10],
    "startTime1": ["00:04:00", "00:00:00", "00:00:00", "00:00:00", "00:34:00", "00:00:00", "00:00:00", "00:00:00"]
}

df = pd.DataFrame(data)

To get the output you want, create a function to apply to the dataframe:

def add_from_last_row(row):
    # row.name is the row's index label (assumes the default RangeIndex 0..n-1)
    # The very first row has no predecessor: keep its start time
    if row.name == 0:
        return row.startTime1
    # If the previous row belongs to another area, this row is the area's
    # first row: keep its start time
    if row.areaID != df.loc[row.name-1, 'areaID']:
        return row.startTime1

    # Index of the area's first row, which holds the original start time
    min_index = df[df.areaID == row.areaID].index.min()
    # Parse the original start time as a datetime (the date defaults to 1900-01-01)
    default_time = pd.to_datetime(df.loc[min_index, 'startTime1'], format="%H:%M:%S")
    # Sum the durations from min_index+1 up to the current row (.loc slicing is inclusive);
    # pd.to_numeric guards against an object column of strings, which .sum() would concatenate
    seconds_to_add = pd.to_numeric(df.loc[min_index+1:row.name, 'waterDuration']).sum()
    # waterDuration is interpreted as seconds
    offset = pd.DateOffset(seconds=int(seconds_to_add))

    # Return only the last 8 characters, i.e. hh:mm:ss,
    # instead of the full YYYY-MM-DD hh:mm:ss
    return str(default_time + offset)[-8:]
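Put as a formula (a restatement of what the function above computes, not something stated in the question): for the row at position k within an area (k = 0 for the area's first row), with t_0 the area's original startTime1 and d_j the waterDuration of row j, read as seconds:

$$
t_k = t_0 + \sum_{j=1}^{k} d_j
$$

So the first row's own duration is never added; for example, the third row of area 125659657 (k = 2) gets 00:04:00 + 15 s + 15 s = 00:04:30.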

Then apply it row by row (axis=1):

df.apply(add_from_last_row, axis=1)

Result:

0    00:04:00
1    00:04:15
2    00:04:30
3    00:04:45
4    00:34:00
5    00:34:10
6    00:34:20
7    00:34:30
dtype: object
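A minimal vectorized sketch of the same idea, assuming (as above) that waterDuration is in seconds and that each area's first row holds its start time. It swaps the per-row filtering for pandas `groupby` with `cumsum` and `transform("first")`, does not depend on a RangeIndex, and assigns the result back to startTime1 so every other column stays unchanged, as the question asks. This is an alternative technique, not part of the original answer.

```python
import pandas as pd

# Sample data from the answer; waterDuration is assumed to be in seconds
df = pd.DataFrame({
    "areaID": [125659657, 125659657, 125659657, 125659657,
               9876913, 9876913, 9876913, 9876913],
    "waterDuration": [15, 15, 15, 15, 10, 10, 10, 10],
    "startTime1": ["00:04:00", "00:00:00", "00:00:00", "00:00:00",
                   "00:34:00", "00:00:00", "00:00:00", "00:00:00"],
})

groups = df["areaID"]
# Accept numbers stored as strings (the question's column has dtype object)
secs = pd.to_numeric(df["waterDuration"])
# Sum of durations from the area's second row up to the current row:
# the running total minus the first row's own duration (0 on the first row)
offset_s = secs.groupby(groups).cumsum() - secs.groupby(groups).transform("first")
# Each area's first start time, broadcast to every row of that area
first_start = pd.to_timedelta(df.groupby("areaID")["startTime1"].transform("first"))
new_time = first_start + pd.to_timedelta(offset_s, unit="s")
# Format back to hh:mm:ss (times past midnight wrap around, like the [-8:] trick)
df["startTime1"] = (new_time + pd.Timestamp("1900-01-01")).dt.strftime("%H:%M:%S")

print(df)
```

Because `cumsum` runs in row order within each group, the rows of an area do not need to be contiguous here, unlike the row-by-row function.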

Hope this helps.

Solution 1
0 2022-07-27 16:29:43