
Simple trend analysis algorithm

Original post: 2013-09-06 21:05:14. Tags: algorithm, statistics, data-analysis, trend

OK, so you have some historic data in the form of (say) an array of integers. This could, for example, represent free space on a server HDD over a two-year period, with each array element representing a daily sample.

The data (free space in this example) has a downward trend, but also has periodic positive spikes where files have been removed, compressed, etc.

How would you go about identifying the overall trend for the two-year period, i.e. ironing out the peaks and troughs in the data?

Now, I did A-level statistics and then a stats module in my degree, but I've slept over 7,000 times since then, and, well, it's leaked out of my brain.

I'm not after a bit of code as such, more a description of how you'd approach this problem...

Thanks in advance!

2 answers

You'll get many different answers, and the one you choose really depends on any more specific requirements you may have. Examples:

1. Low-pass filter, or any other spectral analysis technique, using the low frequencies to determine the trend.
2. Linear regression (time/value) to find the slope of the fitted line and "r" (the correlation between time and the value).
3. Moving average of the last "n" samples. If "n" is large enough, this is my favorite: it is often sufficient and is very easy to code. It's a sort of approximation to #1 above.
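
As a rough illustration of the three approaches listed above, here is a minimal Python sketch using only the standard library. Everything named here is an assumption made for the example rather than part of the answer: the function names, the smoothing factor `alpha`, the 60-day window, and the synthetic `free_space` data. The low-pass filter shown is a simple single-pole (exponential smoothing) filter, which is just one of many possible low-pass designs.

```python
import math


def low_pass(samples, alpha=0.05):
    """Single-pole low-pass filter (exponential smoothing).

    alpha is an assumed smoothing factor in (0, 1]; smaller values smooth more.
    """
    if not samples:
        return []
    smoothed = [float(samples[0])]
    for x in samples[1:]:
        # Move a fraction alpha of the way from the previous output to the new sample.
        smoothed.append(smoothed[-1] + alpha * (x - smoothed[-1]))
    return smoothed


def linear_trend(samples):
    """Least-squares fit of value against sample index.

    Returns (slope, intercept, r), where r is the correlation coefficient.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = (n - 1) / 2
    mean_y = sum(samples) / n
    s_tt = sum((t - mean_t) ** 2 for t in range(n))
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(samples))
    s_yy = sum((y - mean_y) ** 2 for y in samples)
    slope = s_ty / s_tt
    intercept = mean_y - slope * mean_t
    # r is undefined for constant data; report 0.0 in that case.
    r = s_ty / math.sqrt(s_tt * s_yy) if s_yy > 0 else 0.0
    return slope, intercept, r


def moving_average(samples, n):
    """Trailing mean of the last n samples.

    The first n - 1 outputs average only the samples seen so far.
    """
    out = []
    running = 0.0
    for i, x in enumerate(samples):
        running += x
        if i >= n:
            running -= samples[i - n]  # drop the sample that left the window
        out.append(running / min(i + 1, n))
    return out


# Hypothetical data: free space falls by 1 unit a day over two years,
# with a clean-up spike of 40 units every 30 days.
free_space = [1000 - day + (40 if day % 30 == 0 else 0) for day in range(730)]

slope, intercept, r = linear_trend(free_space)  # slope = change per day
smoothed = moving_average(free_space, 60)       # 60-day window is an assumption
filtered = low_pass(free_space)
```

With daily samples, `slope` is the average change in free space per day, which could also be used to estimate when the disk would fill up. The moving average and the low-pass output both lag behind the raw data, and a larger window or smaller `alpha` increases that lag.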

I'm sure there'll be others.

If I were doing this to produce a line through the points for me to look at, I would probably use some variant of Loess, described at http://en.wikipedia.org/wiki/Local_regression and http://stat.ethz.ch/R-manual/R-patched/library/stats/html/loess.html. Basically, you find the smoothed value at any particular point by doing a weighted regression on the data points near that point, with the nearest points given the most weight.
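
The weighted local regression described above can be sketched in a few lines of plain Python. This is a hand-rolled illustration, not R's `loess` implementation: the function names, the `span` parameter (the fraction of points used for each local fit), the choice of tricube weights, and the use of a local straight-line fit are all assumptions made for the example.

```python
import math


def loess_point(xs, ys, x0, span=0.3):
    """Smoothed value at x0 from a weighted straight-line fit to nearby points."""
    n = len(xs)
    # Number of neighbours used in the local fit (at least 2, at most n).
    k = min(n, max(2, math.ceil(span * n)))
    nearest = sorted(range(n), key=lambda i: abs(xs[i] - x0))[:k]
    max_dist = max(abs(xs[i] - x0) for i in nearest)
    if max_dist == 0:
        # All neighbours sit exactly at x0: fall back to their plain mean.
        return sum(ys[i] for i in nearest) / k

    # Tricube weights: close to 1 near x0, falling to 0 at the farthest neighbour.
    weights = [(1 - (abs(xs[i] - x0) / max_dist) ** 3) ** 3 for i in nearest]
    total = sum(weights)
    mean_x = sum(w * xs[i] for w, i in zip(weights, nearest)) / total
    mean_y = sum(w * ys[i] for w, i in zip(weights, nearest)) / total
    s_xx = sum(w * (xs[i] - mean_x) ** 2 for w, i in zip(weights, nearest))
    s_xy = sum(w * (xs[i] - mean_x) * (ys[i] - mean_y)
               for w, i in zip(weights, nearest))
    if s_xx == 0:
        # Degenerate fit (only one point carries weight): use the weighted mean.
        return mean_y
    slope = s_xy / s_xx
    return mean_y + slope * (x0 - mean_x)


def loess(xs, ys, span=0.3):
    """Smoothed value at every x in xs."""
    return [loess_point(xs, ys, x, span) for x in xs]


# Hypothetical data: two years of daily free-space samples with periodic spikes.
days = list(range(730))
free_space = [1000 - d + (40 if d % 30 == 0 else 0) for d in days]
trend = loess(days, free_space, span=0.3)
```

This version re-sorts the data for every point, so it is quadratic in the number of samples; that is fine for a couple of years of daily data but would need rethinking for much larger series. A larger `span` gives a smoother, stiffer line, while a smaller one follows the spikes more closely.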