
How to find all local maxima and minima in a Python Pandas series without knowing the frequency of the window

As background to my question, please allow me to explain the problem I am trying to solve. I have a sensor that is collecting pressure data, and I am collecting this data into a pandas dataframe structured like this:

DateTime                             Transmission Line PSI                
2021-02-18 11:55:34                  3.760
2021-02-18 11:55:49                  3.359
2021-02-18 11:56:04                  3.142
2021-02-18 11:56:19                  3.009
2021-02-18 11:56:34                  2.938
...                                    ...
2021-02-19 12:05:06                  3.013
2021-02-19 12:05:21                  3.011
2021-02-19 12:05:36                  3.009
2021-02-19 12:05:51                  3.009
2021-02-19 12:06:06                  3.007

I can plot the dataframe with pyplot and see visually when the compressor that feeds the system is running, how often it runs, and how long it takes to pressurize the system. Plot of pressure data:

[Image: plot of pressure data]

As is evident from the plot, the cycles on the left side are radically shorter than those on the right.

The problem I am trying to solve is to programmatically calculate the maximum pressure, minimum pressure, period length, and duty cycle of the last complete on-off cycle. A bonus would be to programmatically calculate the compressor's total run time over a 24-hour period.
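For reference, duty cycle is conventionally the fraction of each period during which the device is on. Taking one cycle to run from a compressor start-up to the next start-up (a framing assumed here; the post only says "on-off cycle"), the period and duty cycle of cycle $k$ are:

$$
T_k = t^{\mathrm{start}}_{k+1} - t^{\mathrm{start}}_{k}, \qquad
D_k = \frac{t^{\mathrm{on}}_k}{T_k}
$$

where $t^{\mathrm{on}}_k$ is the compressor run time within cycle $k$. The maximum and minimum pressure are the extremes of the pressure series over $[t^{\mathrm{start}}_k, t^{\mathrm{start}}_{k+1}]$, and the 24-hour bonus figure is the sum of the run times that fall inside that window.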

I figured that I would need to take the derivative of the pressure series, and I am using the solution found at "python pandas: how to calculate derivative/gradient".
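The answer further down computes the derivative as a backward difference (`diff()` divided by the elapsed seconds). As a hedged alternative, `numpy.gradient` takes central differences and accepts the sample times as coordinates, so irregular sampling is handled too. A minimal sketch on the first five sample rows from the table above; the column name `dpsi_dt` is made up for illustration:

```python
import numpy as np
import pandas as pd

# First five rows of the sample table above
df = pd.DataFrame(
    {"Transmission Line PSI": [3.760, 3.359, 3.142, 3.009, 2.938]},
    index=pd.to_datetime([
        "2021-02-18 11:55:34", "2021-02-18 11:55:49", "2021-02-18 11:56:04",
        "2021-02-18 11:56:19", "2021-02-18 11:56:34",
    ]),
)
df.index.name = "DateTime"

# Seconds elapsed since the first sample; works for irregular spacing too
elapsed_s = (df.index - df.index[0]).total_seconds().to_numpy()

# Central differences in the interior, one-sided at the ends; units are PSI/s
df["dpsi_dt"] = np.gradient(df["Transmission Line PSI"].to_numpy(), elapsed_s)
print(df)
```

Central differences average the slopes on either side of each sample, so the values differ slightly from the diff-based version; for spotting when pressure starts rising, either should work.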

Plot of the derivative series:

[Image: plot of the derivative series]

The derivative series then shows numerically when the compressor is running (positive values) and when it is not (zero or negative values). I was thinking that I could then find all of the maxima and minima of the individual peaks and, from there, get the timedeltas between them.
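The idea that the derivative's sign tells you whether the compressor is running can also be used directly, with no peak finding and no window size: threshold the derivative, then look for the samples where the on/off state flips. This sketch is an alternative to the approach taken in this thread, not the asker's method; `run_intervals`, the toy data, and the 0.01 PSI/s threshold are assumptions for illustration. On noisy real data a single threshold may flicker around the boundary, so a hysteresis band or a minimum run length may be needed.

```python
import pandas as pd


def run_intervals(deltas: pd.Series, threshold: float = 0.0) -> pd.DataFrame:
    """Start/stop timestamps of every run where deltas > threshold.

    Hypothetical helper. Assumes `deltas` is the derivative series with a
    sorted DatetimeIndex; a run still in progress at the end gets stop = NaT.
    """
    running = deltas > threshold
    prev = running.shift(1, fill_value=False)
    starts = deltas.index[(running & ~prev).to_numpy()]  # off -> on
    stops = deltas.index[(~running & prev).to_numpy()]   # on -> off
    stop_list = list(stops) + [pd.NaT] * (len(starts) - len(stops))
    return pd.DataFrame({"start": starts, "stop": stop_list})


# Toy derivative series (PSI/s) sampled every 15 s, made up for illustration
idx = pd.date_range("2021-02-18 12:00:00", periods=8, freq="15s")
deltas = pd.Series([0.0, 0.02, 0.03, -0.001, -0.001, 0.02, 0.02, 0.0], index=idx)

runs = run_intervals(deltas, threshold=0.01)
runs["on_time"] = runs["stop"] - runs["start"]
runs["period"] = runs["start"].diff()  # start-to-start spacing
print(runs)
```

The timestamps are only as precise as the sampling interval (15 s in the question's data), since the state is evaluated once per sample.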

However, the problem I'm running into is that every solution I've found so far requires me to know in advance how large a window to use (for example, the order argument of SciPy's argrelextrema: https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.argrelextrema.html).

But my data series features cycles as short as a few minutes, while ideally (if we didn't have leaks) cycles should stretch to hours or longer. Using short windows gives me false maxima and minima in the longer cycles, and longer windows make me miss many maxima and minima in the shorter ones.
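The trade-off described above is easy to reproduce with `scipy.signal.argrelextrema`. In this made-up toy signal, two peaks sit close together (a short cycle), a tiny bump sits in a long quiet stretch, and one lone peak marks a long cycle. With `order=1` the bump is reported as a maximum; with `order=3` the two close peaks are lost:

```python
import numpy as np
from scipy.signal import argrelextrema

# Made-up signal: peaks at 1 and 3 (short cycle), a tiny bump at 6,
# and a lone peak at 10 (long cycle)
x = np.array([0, 2, 0, 2, 0, 0, 0.1, 0, 0, 0, 2, 0], dtype=float)

small = argrelextrema(x, np.greater, order=1)[0]  # small window
large = argrelextrema(x, np.greater, order=3)[0]  # larger window
print(small)  # includes the spurious bump at index 6
print(large)  # drops the closely spaced peaks at indices 1 and 3
```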

Any ideas for detecting programmatically what is plain to the eye in the plot above?

Mr.T's comment above had my answer: using scipy.signal.find_peaks allowed me to do what I needed. Posting the code below.

import pandas as pd
import matplotlib.pyplot as plt
import scipy.signal as sig

# names to assign to the CSV columns
col_names = ['DateTime', 'Transmission Line PSI']

plt.rcParams["figure.figsize"] = [16.0, 9.0]
fig, ax = plt.subplots()
df = pd.read_csv(r'\\192.168.1.1\raid\graphdata.csv', names=col_names)

# convert imported date/time information to real datetimes and set as index
df['DateTime'] = pd.to_datetime(df['DateTime'])
df = df.set_index('DateTime')

# take first derivative of pressure data to show when pressure is rising or falling
df['deltas'] = df['Transmission Line PSI'].diff() / df.index.to_series().diff().dt.total_seconds()
df['deltas'] = df['deltas'].fillna(0)

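# local maxima of the derivative of at least 0.01 PSI/s (pressure rising quickly)
peaks, _ = sig.find_peaks(df['deltas'], height=0.01)
# local minima: peaks of the negated series, i.e. pressure falling at least 0.01 PSI/s
neg_peaks, _ = sig.find_peaks(-df['deltas'], height=0.01)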

# plotting peaks and neg_peaks against first derivative
plt.scatter(df.iloc[peaks].index, df.iloc[peaks]['deltas'])
plt.scatter(df.iloc[neg_peaks].index, df.iloc[neg_peaks]['deltas'])
plt.plot(df['deltas'])
plt.show()

# find timedeltas between all positive peaks - these are the cycle periods, in minutes
# (.dt.total_seconds() rather than .dt.seconds, which drops whole days)
cycle_times = df.iloc[peaks].index.to_series().diff().dt.total_seconds().div(60).fillna(0)

# plot periods
plt.plot(cycle_times)
plt.show()

Resulting plot of peaks against the first derivative:

[Image: plot of peaks against the first derivative]

Sample of cycle_times:

>>> cycle_times
DateTime
2021-02-18 11:59:04     0.000000
2021-02-18 12:04:04     5.000000
2021-02-18 12:09:35     5.516667
2021-02-18 12:16:05     6.500000
2021-02-18 12:21:35     5.500000
                         ...    
2021-02-19 08:54:09    17.016667
2021-02-19 09:27:56    33.783333
2021-02-19 10:15:44    47.800000
2021-02-19 11:24:19    68.583333
2021-02-19 12:02:36    38.283333
Name: DateTime, Length: 267, dtype: float64

Plot of cycle times:

[Image: plot of cycle times]
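The code above stops at the cycle periods. For the other quantities the question asks about (maximum and minimum pressure, period, and duty cycle of the last complete cycle, plus the 24-hour run time), here is a minimal sketch under stated assumptions: a cycle runs from one compressor start-up to the next, and the compressor counts as "on" over any sample interval in which pressure rises faster than a threshold in PSI/s. The function names, the threshold value, and the toy data are hypothetical; the threshold would need tuning against the real sensor data.

```python
import numpy as np
import pandas as pd


def forward_rate(psi: pd.Series) -> tuple[pd.Series, pd.Series]:
    """Interval lengths and forward pressure rates (PSI/s), one per sample."""
    dt = psi.index.to_series().diff().shift(-1)  # time to the next sample
    rate = psi.diff().shift(-1) / dt.dt.total_seconds()
    return dt, rate


def last_cycle_stats(psi: pd.Series, threshold: float) -> dict:
    """Stats for the last complete cycle (start-up to next start-up).

    Assumes a sorted DatetimeIndex and that the compressor is "on" over any
    interval whose forward rate exceeds `threshold` PSI/s.
    """
    dt, rate = forward_rate(psi)
    on = (rate > threshold).to_numpy()
    prev_on = np.concatenate(([False], on[:-1]))
    starts = psi.index[on & ~prev_on]  # first sample of each run
    if len(starts) < 2:
        raise ValueError("need at least two start-ups for a complete cycle")
    t0, t1 = starts[-2], starts[-1]
    # pressure extremes over [t0, t1], both troughs included
    in_cycle = (psi.index >= t0) & (psi.index <= t1)
    # on-time: intervals starting in [t0, t1) during which pressure rose
    on_time = dt[on & (psi.index >= t0) & (psi.index < t1)].sum()
    period = t1 - t0
    return {
        "start": t0,
        "end": t1,
        "max_psi": psi[in_cycle].max(),
        "min_psi": psi[in_cycle].min(),
        "period": period,
        "on_time": on_time,
        "duty_cycle": on_time / period,
    }


def run_time_last_24h(psi: pd.Series, threshold: float) -> pd.Timedelta:
    """Total "on" time in the 24 hours ending at the last sample."""
    dt, rate = forward_rate(psi)
    recent = psi.index >= psi.index[-1] - pd.Timedelta(hours=24)
    return dt[(rate > threshold).to_numpy() & recent].sum()


# Toy data, made up: one-minute samples, start-ups at 12:00 and 12:06
idx = pd.date_range("2021-02-18 12:00", periods=10, freq="min")
psi = pd.Series([3.0, 3.5, 4.0, 3.9, 3.8, 3.7, 3.6, 4.1, 4.0, 3.9], index=idx)
print(last_cycle_stats(psi, threshold=0.005))
print(run_time_last_24h(psi, threshold=0.005))
```

A forward difference is used here so that each rising interval is attributed to the sample where it begins, which makes the start-up sample coincide with the pressure trough.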

https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks.html
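A small illustration of why `find_peaks` avoids the window problem: it first collects every local maximum and then filters by properties of each peak, such as `height` (used in the code above) or `prominence` (how far a peak stands out above the higher of its two surrounding bases), rather than by a fixed neighborhood size. On a made-up toy signal with two closely spaced peaks (indices 1 and 3), a tiny bump (index 6), and a lone peak (index 10), both filters keep the three real peaks and drop the bump:

```python
import numpy as np
from scipy.signal import find_peaks, peak_prominences

# Made-up signal: short cycle (peaks at 1 and 3), tiny bump at 6,
# long cycle (peak at 10)
x = np.array([0, 2, 0, 2, 0, 0, 0.1, 0, 0, 0, 2, 0], dtype=float)

# Height threshold, as in the answer's code: no window size needed
by_height, _ = find_peaks(x, height=1.0)

# Prominence threshold: independent of the signal's absolute level
by_prom, props = find_peaks(x, prominence=1.0)

print(by_height, by_prom, props["prominences"])
print(peak_prominences(x, [6])[0])  # the tiny bump has low prominence
```

Prominence may be the more robust filter if the baseline of the derivative drifts, since it measures each peak relative to its surroundings rather than to zero.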

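Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.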
