
How does pandas rolling_mean() work?

I need to use a moving average to smooth my data, so I wrote a function using convolution. But the results are shifted to the left compared to my raw data. So I used the built-in rolling_mean() from pandas, and it works just fine. The problem is that I don't want to depend on pandas, so I'm trying to rewrite this function myself, but the source code does not explain how it works (or maybe it's just me).

My original function was

import numpy as np

def moving_average(data, window):
    return np.convolve(data, np.ones(window)/window, mode='valid')

Source code of pandas rolling_mean() is:

def f(arg, window, min_periods=None, freq=None, center=False, how=how,
      **kwargs):
    def call_cython(arg, window, minp, args=(), kwargs={}, **kwds):
        minp = check_minp(minp, window)
        return func(arg, window, minp, **kwds)
    return _rolling_moment(arg, window, call_cython, min_periods, freq=freq,
                           center=center, how=how, **kwargs)

The key is the argument "center", but I don't know how it works. [Image: what I mean] Blue is the raw data, green is my attempt, and red is the (correct) version from pandas.

There isn't one correct way to smooth data, and even if you're using the mean there's still a lot of variation. Shifting is a very common result from simple rolling means though.
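To illustrate where the shift comes from: np.convolve with mode='valid' returns only len(data) - window + 1 values, so if you plot them against the original x-axis, each average lands at the left edge of its window instead of at its middle. Below is a minimal pure-Python sketch of a centered moving average, assuming that what you want is each average placed at the midpoint of its window (which is my understanding of what center=True does). The function name centered_moving_average is hypothetical, and I restrict it to odd windows because the midpoint of an even window needs a convention.

```python
import math


def centered_moving_average(data, window):
    """Moving average with each value placed at the midpoint of its window.

    Positions where the full window does not fit are filled with NaN,
    so the output has the same length as the input.
    Assumption: window is odd, so the midpoint is unambiguous.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = (window - 1) // 2
    n = len(data)
    result = [math.nan] * n
    # Only positions with `half` neighbours on both sides get a value.
    for i in range(half, n - half):
        chunk = data[i - half : i + half + 1]
        result[i] = sum(chunk) / window
    return result


raw = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
smoothed = centered_moving_average(raw, 3)
# smoothed has the same length as raw; the first and last entries are NaN
```

The same alignment can be applied to the output of your np.convolve version by padding it with (window - 1) // 2 NaNs on each side, or by plotting it against x[half:-half].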

The bit of code you posted from pandas.rolling_mean doesn't show the actual operation. For example, where it specifies how=how, it is passing along a parameter that isn't defined in your snippet to determine what method it uses, and the func it calls isn't shown either. It also references Cython, so I assume the guts of the command are compiled to C rather than written in pure Python (common because it's a lot faster).

I didn't go hunting for the underlying code because rolling_mean doesn't have much documentation and is deprecated to boot. Instead, take a look at the documentation for rolling in the latest version of pandas, which tells you what types of smoothers it can do. You might try passing those parameters into the rolling function and seeing which one does what you want; then you can look up the math behind it from a source of your choice and reproduce it elsewhere.
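As a quick experiment along those lines, here is a small sketch comparing Series.rolling with and without center=True on made-up data. It assumes pandas and NumPy are installed; the variable names are mine. For an odd window, the centered result looks like the default (trailing) result moved back by (window - 1) // 2 positions, which is the kind of shift described in the question.

```python
import numpy as np
import pandas as pd

data = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])  # made-up sample data
window = 3  # odd window, so the midpoint is unambiguous

# Default: each average is labelled at the right edge of its window.
trailing = data.rolling(window).mean()

# center=True: each average is labelled at the middle of its window.
centered = data.rolling(window, center=True).mean()

# Shifting the trailing result back by half a window reproduces the centered one.
shifted = trailing.shift(-((window - 1) // 2))

# The non-NaN part matches the 'valid' convolution from the question.
valid = np.convolve(data.to_numpy(), np.ones(window) / window, mode="valid")
```

Printing trailing, centered and valid side by side makes the alignment difference easy to see before you reproduce it without pandas.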

I don't know the original poster's level of experience, but for anyone reading this who might not be well versed in signal processing or data smoothing, separating noise from trends is a huge area of research. Be very careful when you do it though, because the result is very sensitive to the method. For a few others, in addition to all the rolling functions Pandas offers, take a look at Holt-Winters, Baxter-King or Hodrick-Prescott. They all approach the problem differently, with very different results, strengths and weaknesses.
