
Pandas rolling window function offsets data

I am attempting to use the Pandas rolling_window function with win_type = 'gaussian' or win_type = 'general_gaussian'. I have a time-series dataset, indexed by datetime, and I need a smoothing function to reduce noise. I would like to avoid the boxcar and instead use a Gaussian weighting. I have experimented with many ranges of window size and std (for gaussian), and window size, power, and width values (for general Gaussian), and I consistently get the same result: the smoothed output is offset lower than the original input data. This is the same issue that was asked, but remains unanswered, here.

The specific line of code I am trying to use for this is:

dNorth_smooth = pd.rolling_window(s, window=40, win_type='gaussian', std=30, center=True, freq='15S')

Where 's' is a single column of data in a datetime-indexed Pandas dataframe. In this case, 's' is position in meters, at 15-second time intervals, so my window of 40 rows spans 40 * 15 = 600 sec = 10 min. It is not clear what exactly the std argument refers to, but I assume it controls the shape of the Gaussian curve and should be some value smaller than the window size (regardless, I have experimented with many std values; if std is very large, then no offset occurs, but this is because the Gaussian curve becomes so wide compared to the window that you are essentially using a boxcar). The 'center' and 'freq' arguments do not appear to change the output either way. Other optional arguments also seem irrelevant.
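For anyone reading this later: pd.rolling_window was removed in newer pandas releases in favour of the Rolling API, where std (in samples) is passed to .mean() rather than to the rolling call itself, and the weights are normalized. A sketch of the equivalent call, using a made-up 15-second position series in place of my 's' (the std=4 value is illustrative only; win_type support requires SciPy):

```python
import numpy as np
import pandas as pd

# hypothetical 15-second position series standing in for 's'
idx = pd.date_range("2020-01-01", periods=400, freq="15s")
rng = np.random.default_rng(0)
s = pd.Series(100.0 + rng.normal(0.0, 0.5, len(idx)), index=idx)

# modern equivalent of pd.rolling_window(..., win_type='gaussian'):
# window-shape parameters such as std go to .mean(), not .rolling()
dNorth_smooth = s.rolling(window=40, win_type='gaussian', center=True).mean(std=4)
```

Because the Gaussian weights are normalized before averaging, the smoothed series stays centred on the input data rather than being offset lower.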

Once I have higher reputation I can post a plot to help explain. But see the plot at the linked question above, as this is the exact same problem I have. Also to note: the boxcar window (which is equivalent to rolling_mean) does not have this offset problem. It does, however, seem to exist with all other window weighting functions (triang, blackman, etc).

As there has been no specific Pandas solution posted for this question (or the similar linked question), I am posting a solution using standard NumPy and SciPy functions. It produces a smoothed curve using Gaussian weighting, works for data of any magnitude, and does not have the offset issue.

import numpy as np
import scipy.signal

def smooth_gaussian(data, window, std):
    g = scipy.signal.windows.gaussian(window, std, sym=True)  # Gaussian weights
    con = np.convolve(g / g.sum(), data, mode='valid')        # weights normalized to sum to 1
    pad = window // 2  # integer NaN padding so output aligns with input
    con_shift = np.r_[np.full(pad, np.nan), con, np.full(pad, np.nan)]
    return con_shift

The valid part of the convolution is shorter than the input dataset by window - 1 points, as the first and last "smoothed" data points are centred about window // 2 samples in from either end. The returned variable con_shift accounts for this by NaN-padding both ends, centring the smoothed data with respect to the input data so that they are the same length and can be plotted together. The window argument is the size of the moving window, and std is the standard deviation (in samples), controlling the width of the Gaussian curve (I set mine to 0.1 * window). Note that for con_shift to be symmetric, the window size must be an odd integer.
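To illustrate the alignment behaviour described above, a quick sketch on synthetic data (the signal and constants are arbitrary; the function is repeated with its imports so the snippet runs on its own):

```python
import numpy as np
import scipy.signal

def smooth_gaussian(data, window, std):
    # same function as above, with NaN padding of window // 2 on each side
    g = scipy.signal.windows.gaussian(window, std, sym=True)
    con = np.convolve(g / g.sum(), data, mode='valid')
    pad = window // 2
    return np.r_[np.full(pad, np.nan), con, np.full(pad, np.nan)]

# synthetic noisy-ish signal oscillating around a mean of 5.0
x = np.linspace(0, 4 * np.pi, 201)
data = 5.0 + np.sin(x)

window = 41                                    # odd, so the padding is symmetric
smoothed = smooth_gaussian(data, window, 0.1 * window)

print(len(smoothed) == len(data))              # True: output aligns with input
print(np.isnan(smoothed[:window // 2]).all())  # True: NaN padding at the ends
```

Because the weights are divided by g.sum() before convolving, the smoothed values remain centred on the input values, with no vertical offset.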
