
How to detect multiple plateaus, ascents and descents in time-series data using Python

I am analysing time-series data of bike trails and would like to know the time interval for each plateau, ascent and descent. A sample CSV file is uploaded here.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the sample and parse the timestamp column
df = pd.read_csv(r'C:\Data\Sample.csv', parse_dates=['dateTime'])
feature_used = 'Cycle_Alt'
print("Eliminating null values..")
df = df[df[feature_used].notnull()]

plt.figure(figsize=(8, 6))
x = df['dateTime']
y = df[feature_used]

# Plot altitude against time
plt.plot(x, y, c='b', linestyle=':', label="Altitude")
plt.xticks(rotation='vertical')
plt.gcf().autofmt_xdate()
plt.legend(loc='best', bbox_to_anchor=(1, 0.5))
plt.show()

This plot gives me a cross-profile like this: (image)

What could be done to classify the time-series data so that each plateau, ascent and descent is detected, assuming that there may be more variables than those presented in the sample?


If you are only interested in identifying the plateaus, ascents, and descents in a series, the easy way is to use numpy.diff to calculate the first discrete difference between consecutive values (its default, n=1). You can then use numpy.sign to map each difference to 1 (ascent), 0 (plateau), or -1 (descent).

An example:

# fixed sample values so that the outputs shown below are reproducible
a = np.array([1, 1, 1, 1, 3, 4, 2, 2, 2, 2])

diff = np.diff(a)
#array([ 0,  0,  0,  2,  1, -2,  0,  0,  0])

gradient = np.sign(diff)
#array([ 0,  0,  0,  1,  1, -1,  0,  0,  0])

Note that the final array gradient will have one fewer element than the original array, because the numpy.diff function will return (n-1) differences for an array of length n.
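To get from the per-step signs to actual time intervals, runs of equal signs can be merged into segments. The following is a minimal sketch, not part of the original answer: the function name label_segments and the tol noise threshold are hypothetical, the column names are taken from the question, and the demo timestamps are made up for illustration.

```python
import numpy as np
import pandas as pd


def label_segments(df, time_col="dateTime", value_col="Cycle_Alt", tol=0.0):
    """Return one row per run of consecutive ascent/plateau/descent steps.

    tol is an assumed noise threshold: steps with |difference| <= tol
    are counted as plateau.
    """
    values = df[value_col].to_numpy(dtype=float)
    times = df[time_col].to_numpy()
    diff = np.diff(values)
    # Treat small changes as flat, then take the sign of each step.
    sign = np.sign(np.where(np.abs(diff) <= tol, 0.0, diff)).astype(int)
    if sign.size == 0:
        return pd.DataFrame(columns=["kind", "start", "end"])
    # A new run starts wherever the sign differs from the previous step.
    starts = np.flatnonzero(np.r_[True, sign[1:] != sign[:-1]])
    ends = np.r_[starts[1:], sign.size]  # exclusive end index into `sign`
    names = {1: "ascent", 0: "plateau", -1: "descent"}
    return pd.DataFrame(
        {
            "kind": [names[int(s)] for s in sign[starts]],
            # Step i spans samples i and i + 1, so a run over steps
            # [start, end) spans samples start .. end.
            "start": times[starts],
            "end": times[ends],
        }
    )


# Made-up demo data: the array from the example above, one sample per minute.
demo = pd.DataFrame(
    {
        "dateTime": pd.date_range("2024-01-01", periods=10, freq="min"),
        "Cycle_Alt": [1, 1, 1, 1, 3, 4, 2, 2, 2, 2],
    }
)
segments = label_segments(demo)
print(segments)
```

With real altitude data a tol above zero is probably needed, since exact zero differences are rare in noisy measurements.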

Not exactly what was asked but Google suggests this when searching for a plateau-finding algorithm so I'll leave this here for reference.

When just looking for plateaus, using the diff - cumsum combo to group the data can be very useful, especially when the given values contain some amount of noise:

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

if __name__ == '__main__':
    # example data
    df = pd.DataFrame(
        {
            'time': np.arange(0, 8),
            'data': [1, 1.01, 2.0, 2.01, 2.5, 2.7, 3.1, 3.101]}
    )
    plt.plot(
        df['time'], df['data'], label="original data",
        marker='x', lw=0.5, ms=2.0, color="black",
    )

    # filter and group plateaus
    max_difference = 0.02
    min_number_points = 2
    # group by maximum difference
    group_ids = (abs(df['data'].diff(1)) > max_difference).cumsum()
    plateau_idx = 0
    for group_idx, group_data in df.groupby(group_ids):
        # filter non-plateaus by min number of points
        if len(group_data) < min_number_points:
            continue
        plateau_idx += 1
        plt.plot(
            group_data['time'], group_data['data'], label=f"Plateau-{plateau_idx}",
            marker='x', lw=1.5, ms=5.0,
        )
        _time = group_data['time'].mean()
        _value = group_data['data'].mean()
        plt.annotate(
            f"Plateau-{plateau_idx}", (_time, _value), ha="center",
        )
    plt.legend()
    plt.show()

(image: plateau grouping)

A plateau is defined as a run of consecutive points in which each point differs from the previous one by at most max_difference, and which contains at least min_number_points points.
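If the plateaus are wanted as a table of intervals rather than a plot, the same grouping can be aggregated directly. This is a sketch under the same definition, reusing the example data; the function name find_plateaus and the output column names are hypothetical.

```python
import pandas as pd


def find_plateaus(df, time_col="time", value_col="data",
                  max_difference=0.02, min_number_points=2):
    """Return start, end, mean value and point count for each plateau."""
    # A new group starts wherever the jump from the previous point is too big.
    jumps = df[value_col].diff().abs() > max_difference
    group_ids = jumps.cumsum().rename("group")
    summary = df.groupby(group_ids).agg(
        start=(time_col, "first"),
        end=(time_col, "last"),
        mean_value=(value_col, "mean"),
        n_points=(value_col, "count"),  # counts non-null values
    )
    # Drop groups that are too short to count as a plateau.
    summary = summary[summary["n_points"] >= min_number_points]
    return summary.reset_index(drop=True)


df = pd.DataFrame(
    {
        "time": range(8),
        "data": [1, 1.01, 2.0, 2.01, 2.5, 2.7, 3.1, 3.101],
    }
)
plateaus = find_plateaus(df)
print(plateaus)
```

For the example data this yields three plateaus, covering times 0 to 1, 2 to 3, and 6 to 7.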

