
Pandas DataFrame | Check if a value has been set for X minutes | Irregular (non-linear) timestamp index

I have a problem with my dataset.

Let's assume my dataset looks like this:

timestamp           | zone 
2022-06-01 05:00:06 | yellow
2022-06-01 05:01:07 | yellow
2022-06-01 05:02:10 | yellow
2022-06-01 05:03:05 | yellow
2022-06-01 05:07:04 | yellow
2022-06-01 05:10:05 | orange
2022-06-01 05:11:05 | orange
2022-06-01 05:12:05 | orange
2022-06-01 05:16:04 | orange
2022-06-01 05:17:04 | orange
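To make the example reproducible, the sample data can be rebuilt as a DataFrame with a DatetimeIndex. This is a sketch that assumes the timestamps are plain, time-zone-naive strings. The gaps between rows vary, which is why a fixed number of rows cannot stand in for "15 minutes".

```python
import pandas as pd

# Sample data from the table above, with the timestamps as a DatetimeIndex
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime([
            "2022-06-01 05:00:06", "2022-06-01 05:01:07", "2022-06-01 05:02:10",
            "2022-06-01 05:03:05", "2022-06-01 05:07:04", "2022-06-01 05:10:05",
            "2022-06-01 05:11:05", "2022-06-01 05:12:05", "2022-06-01 05:16:04",
            "2022-06-01 05:17:04",
        ]),
        "zone": ["yellow"] * 5 + ["orange"] * 5,
    }
).set_index("timestamp")

# Seconds between consecutive rows: not constant (55 s to 239 s)
gaps = df.index.to_series().diff().dt.total_seconds()
print(gaps.tolist()[1:])
```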

The timestamp column is the index. The yellow and orange values are the calculated zones.

Condition: a zone change can only happen if the previous zone has been set for at least X minutes (let's assume it's 15 minutes for this example).
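Stated as a recurrence, with notation introduced here only for illustration ($t_j$ is the timestamp of row $j$, $r_j$ the calculated zone, $z_j$ the zone that should be reported, and $s_j$ the time at which $z_j$ was set):

```latex
\[
(z_j, s_j) =
\begin{cases}
(r_j,\ t_j) & \text{if } j = 1 \text{ or } \bigl(r_j \neq z_{j-1} \text{ and } t_j - s_{j-1} \ge X\bigr),\\
(z_{j-1},\ s_{j-1}) & \text{otherwise.}
\end{cases}
\]
```

Each row depends on the zone reported for the previous row, so this is a sequential scan rather than an elementwise rule, which is likely why a single np.select call or a per-row .apply() cannot express it.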

Expected result:

timestamp           | zone 
2022-06-01 05:00:06 | yellow
2022-06-01 05:01:07 | yellow
2022-06-01 05:02:10 | yellow
2022-06-01 05:03:05 | yellow
2022-06-01 05:07:04 | yellow
2022-06-01 05:10:05 | yellow
2022-06-01 05:11:05 | yellow
2022-06-01 05:12:05 | yellow
2022-06-01 05:16:04 | yellow
2022-06-01 05:17:04 | orange

This is because the yellow zone was set at 05:00:06, so it must stay set for at least 15 minutes, regardless of which zone the calculation returns for the rows in between. The yellow zone therefore stays set until 05:16:04; from then on, the zone can be set to orange.
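A quick check of the elapsed times, using the start time and the 15-minute hold from the example: the hold ends at 05:15:06, so under a strict "at least 15 minutes" reading the first row allowed to switch is 05:16:04 (15 min 58 s after 05:00:06). That is one row earlier than the expected-result table above, which keeps 05:16:04 yellow.

```python
import pandas as pd

start = pd.Timestamp("2022-06-01 05:00:06")  # yellow first set here
hold = pd.Timedelta(minutes=15)               # X = 15 minutes
expires = start + hold                        # earliest moment a switch is allowed

rows = pd.to_datetime(["2022-06-01 05:12:05", "2022-06-01 05:16:04", "2022-06-01 05:17:04"])
elapsed = rows - start                        # TimedeltaIndex
may_switch = elapsed >= hold                  # NumPy boolean array

print(expires)                                # 2022-06-01 05:15:06
print(elapsed.total_seconds().tolist())       # [719.0, 958.0, 1018.0]
print(may_switch.tolist())                    # [False, True, True]
```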

I believe there are two ways to do this: check the elapsed time while the zone is being calculated, or change the zone after it has been calculated. Performance is my priority, because I plan to use this data in a dashboard. The zone itself is calculated with np.select.
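Since performance is the priority, one option for the "change the zone after it has been calculated" route is to avoid visiting every row in Python: from each accepted zone, jump with np.searchsorted to the first row at least X minutes later, and from there to the first row whose calculated zone differs. This is a sketch under stated assumptions (a sorted, time-zone-naive DatetimeIndex, and raw zones already computed, e.g. with np.select); `hold_zones_fast` is a hypothetical helper name and has not been benchmarked here.

```python
import numpy as np
import pandas as pd


def hold_zones_fast(zones: pd.Series, min_duration: pd.Timedelta) -> pd.Series:
    """Apply the minimum-hold rule by jumping from one accepted switch to the next.

    Assumes `zones` (the raw, already calculated zones) has a sorted,
    time-zone-naive DatetimeIndex.
    """
    t = zones.index.values          # datetime64 array
    raw = zones.to_numpy()
    n = len(raw)
    out = np.empty(n, dtype=object)
    if n == 0:
        return pd.Series(out, index=zones.index, name=zones.name)

    # run_end[k]: first row after k whose raw zone differs from raw[k] (n if none)
    change = np.flatnonzero(raw[1:] != raw[:-1]) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [n]))
    run_end = np.repeat(ends, ends - starts)

    d = min_duration.to_timedelta64()
    i = 0
    while i < n:
        z = raw[i]                  # zone accepted at row i
        # first row that is at least min_duration after row i (and after row i itself)
        k = max(int(np.searchsorted(t, t[i] + d, side="left")), i + 1)
        # if that row still has the same raw zone, skip to the end of its run
        if k < n and raw[k] == z:
            k = int(run_end[k])
        out[i:k] = z                # zone z is held for rows i .. k-1
        i = k                       # row k (if any) switches to raw[k]
    return pd.Series(out, index=zones.index, name=zones.name)


# Usage on the sample data from the question
idx = pd.to_datetime([
    "2022-06-01 05:00:06", "2022-06-01 05:01:07", "2022-06-01 05:02:10",
    "2022-06-01 05:03:05", "2022-06-01 05:07:04", "2022-06-01 05:10:05",
    "2022-06-01 05:11:05", "2022-06-01 05:12:05", "2022-06-01 05:16:04",
    "2022-06-01 05:17:04",
])
raw = pd.Series(["yellow"] * 5 + ["orange"] * 5, index=idx, name="zone")
held = hold_zones_fast(raw, pd.Timedelta(minutes=15))
print(held.tolist())
```

The Python-level loop runs once per accepted zone change instead of once per row, so it should pay off mostly when switches are rare compared with the number of rows; with very frequent switches it costs roughly as much as a plain loop.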

Imagine there are values that are compared against thresholds:

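import numpy as np

# df is assumed to have the columns value, threshold_red, threshold_orange and threshold_green
conditions = [
    (df.value <= df.threshold_red),                                       # red
    (df.value > df.threshold_red) & (df.value <= df.threshold_orange),    # orange
    (df.value > df.threshold_orange) & (df.value <= df.threshold_green),  # yellow
    (df.value > df.threshold_green),                                      # green
]
zones = ["red", "orange", "yellow", "green"]

# string default ("unknown" is a placeholder for rows matching no condition, e.g. NaN);
# recent NumPy versions reject mixing string choices with the integer default 0
df["zone"] = np.select(conditions, zones, default="unknown")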
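Because np.select returns the choice for the first condition that is True, the lower bounds in these conditions are implied by their order. The sketch below uses made-up values and constant thresholds (assumptions, not the real data) to check that a shorter form gives the same zones. This only holds if value is never NaN, because in the shorter form a NaN would fall through to the green default.

```python
import numpy as np
import pandas as pd

# Made-up values and constant thresholds, for illustration only
df = pd.DataFrame({
    "value": [5, 10, 15, 25, 30, 35],
    "threshold_red": 10,
    "threshold_orange": 20,
    "threshold_green": 30,
})
zones = ["red", "orange", "yellow", "green"]

# Conditions as written in the question: explicit lower and upper bounds
explicit = [
    df.value <= df.threshold_red,
    (df.value > df.threshold_red) & (df.value <= df.threshold_orange),
    (df.value > df.threshold_orange) & (df.value <= df.threshold_green),
    df.value > df.threshold_green,
]
# np.select picks the first True condition, so each lower bound is implied
ordered = [
    df.value <= df.threshold_red,
    df.value <= df.threshold_orange,
    df.value <= df.threshold_green,
]

a = np.select(explicit, zones, default="unknown")
b = np.select(ordered, zones[:3], default="green")  # everything above threshold_green

print(a.tolist())      # ['red', 'red', 'orange', 'yellow', 'yellow', 'green']
print((a == b).all())  # True
```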

How can I do this? I have tried .apply() with a lambda, but I could not get the expected result.

Thanks in advance for your help :)

Ocamond

You can do this with a loop:

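import pandas as pd

min_duration = pd.Timedelta(minutes=15)  # X minutes
result = []
ts = None             # time at which the current zone was accepted
assigned_zone = None  # zone currently in effect

for timestamp, zone in df["zone"].items():
    # accept a new zone only once the current one has been held for at least X minutes
    if assigned_zone is None or (zone != assigned_zone and timestamp - ts >= min_duration):
        ts = timestamp
        assigned_zone = zone
    result.append(assigned_zone)

df["zone"] = result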
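For reuse, the loop can be wrapped in a function that returns a Series aligned with the index. This sketch assumes a sorted DatetimeIndex and uses >= so that "at least X minutes" is met exactly; `hold_zones` and its `minutes` parameter are illustrative names, not part of any library.

```python
import pandas as pd


def hold_zones(zones: pd.Series, minutes: float = 15) -> pd.Series:
    """Row-by-row minimum-hold rule; `zones` must have a sorted DatetimeIndex."""
    min_duration = pd.Timedelta(minutes=minutes)
    result, ts, assigned_zone = [], None, None
    for timestamp, zone in zones.items():
        # accept a new zone only after the current one has lasted >= min_duration
        if assigned_zone is None or (zone != assigned_zone and timestamp - ts >= min_duration):
            ts, assigned_zone = timestamp, zone
        result.append(assigned_zone)
    return pd.Series(result, index=zones.index, name=zones.name, dtype=object)


# Usage on the sample data from the question
idx = pd.to_datetime([
    "2022-06-01 05:00:06", "2022-06-01 05:01:07", "2022-06-01 05:02:10",
    "2022-06-01 05:03:05", "2022-06-01 05:07:04", "2022-06-01 05:10:05",
    "2022-06-01 05:11:05", "2022-06-01 05:12:05", "2022-06-01 05:16:04",
    "2022-06-01 05:17:04",
])
raw = pd.Series(["yellow"] * 5 + ["orange"] * 5, index=idx, name="zone")
held = hold_zones(raw, minutes=15)

# Rows where the reported zone changes (handy for a dashboard)
switches = held[held != held.shift()]
print(switches)
```

On the sample data the reported zone stays yellow through 05:12:05 and switches to orange at 05:16:04.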
