Finding similarities in a multidimensional array

Consider a sales department that sets a sales goal for each day. The total goal isn't important, but the overage or underage is. For example, if Monday of week 1 has a goal of 50 and we sell 60, that day gets a score of +10. On Tuesday, our goal is 48 and we sell 46, for a score of -2. At the end of the week, we score the week like this:

[0,0]=10,[0,1]=-2,[0,2]=1,[0,3]=7,[0,4]=6

In this example, Monday (0,0), Thursday (0,3) and Friday (0,4) are "hot".

If we look at the results from week 2, we see:

[1,0]=-4,[1,1]=2,[1,2]=-1,[1,3]=4,[1,4]=5

For week 2, the end of the week is hot, and Tuesday is warm.

Next, if we compare weeks one and two, we see that the end of the week tends to be better than the first part of the week. So, now let's add weeks 3 and 4:

[0,0]=10,[0,1]=-2,[0,2]=1,[0,3]=7,[0,4]=6
[1,0]=-4,[1,1]=2,[1,2]=-1,[1,3]=4,[1,4]=5
[2,0]=-8,[2,1]=-2,[2,2]=-1,[2,3]=2,[2,4]=3
[3,0]=2,[3,1]=3,[3,2]=4,[3,3]=7,[3,4]=9

From this, we see that the "end of the week is better" theory holds true. But we also see that the end of the month is better than the start. Of course, we would next want to compare this month with next month, or compare a group of months for quarterly or annual results.

I'm not a math or stats guy, but I'm pretty sure there are algorithms designed for this type of problem. Since I don't have a math background (and don't remember any algebra from my earlier days), where would I look for help? Does this type of "hotspot" logic have a name? Are there formulas or algorithms that can slice and dice and compare multidimensional arrays?

Any help, pointers or advice is appreciated!

This data isn't really multidimensional; it's just a simple time series, and there are many ways to analyse it. I'd suggest you start with the Fourier Transform. It detects "rhythms" in a series, so this data would show a spike at 7 days, and also around thirty, and if you extended the data set to a few years it would show a one-year spike for seasons and holidays. That should keep you busy for a while, until you're ready to use real multidimensional data, say by adding in weather information, stock market data, results of recent sports events and so on.
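As a rough illustration of the Fourier idea, here is a minimal Python sketch using NumPy's `np.fft.rfft`. The helper name `dominant_periods` is made up for this example, and the series is just the 20 scores from the question. Note that because the question records five business days per week, a weekly rhythm would show up at a period of 5 samples rather than 7, and four weeks is far too little data to draw real conclusions from.

```python
import numpy as np

# The 20 daily scores from the question, week after week (5 business days each).
scores = np.array([10, -2, 1, 7, 6,
                   -4, 2, -1, 4, 5,
                   -8, -2, -1, 2, 3,
                   2, 3, 4, 7, 9], dtype=float)


def dominant_periods(series, top=3):
    """Return (period_in_samples, amplitude) pairs for the strongest rhythms.

    Hypothetical helper for illustration: it removes the mean, takes the
    real FFT, and ranks the non-zero frequencies by amplitude.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                        # drop the constant offset
    amplitudes = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0)  # cycles per sample
    # Skip bin 0 (zero frequency), then sort the rest by amplitude, largest first.
    order = np.argsort(amplitudes[1:])[::-1][:top] + 1
    return [(1.0 / freqs[i], float(amplitudes[i])) for i in order]


for period, amplitude in dominant_periods(scores):
    print(f"period of about {period:.1f} days, amplitude {amplitude:.2f}")
```

A period equal to the full length of the series (20 samples here) mostly reflects a trend across the month rather than a repeating cycle, so it should be read with care.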

The following might be relevant to you: stochastic oscillators in technical analysis, which are used to determine whether a stock has been overbought or oversold.

I'm oversimplifying here, but essentially you have two moving calculations:

  • 14-day stochastic: 100 * (today's closing price - low of last 14 days) / (high of last 14 days - low of last 14 days)
  • 3-day stochastic: same calculation, but relative to 3 days.

The 14-day and 3-day stochastics will have a tendency to follow the same curve. With the factor of 100 in the formula above, your stochastics will fall somewhere between 0 and 100 (or between 0.0 and 1.0 if you leave the factor out). Stochastics above 80 are considered overbought or bearish; below 20 indicates oversold or bullish. More specifically, when your 3-day stochastic "crosses" the 14-day stochastic in one of those regions, you have a predictor of the momentum of the prices.
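The simplified formula above can be sketched in a few lines of Python. This follows the answer's own simplification (the same calculation over two window lengths), not the full definition used by trading software, and the function name `stochastic` is invented for this example. It returns `None` where there is not yet enough history, or where the high and low of the window coincide.

```python
def stochastic(values, window):
    """100 * (current - low) / (high - low) over a trailing window.

    Hypothetical helper: yields None until `window` values are available,
    and None when high == low (the ratio is undefined there).
    """
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
            continue
        recent = values[i + 1 - window : i + 1]
        low, high = min(recent), max(recent)
        if high == low:
            result.append(None)
        else:
            result.append(100.0 * (values[i] - low) / (high - low))
    return result


# The 20 daily scores from the question, used here in place of closing prices.
scores = [10, -2, 1, 7, 6, -4, 2, -1, 4, 5,
          -8, -2, -1, 2, 3, 2, 3, 4, 7, 9]

slow = stochastic(scores, 14)  # 14-day stochastic
fast = stochastic(scores, 3)   # 3-day stochastic
for day, (s, f) in enumerate(zip(slow, fast)):
    if s is not None and f is not None:
        print(day, round(s, 1), round(f, 1))
```

Applying this to daily sales scores rather than prices is an assumption made for the sake of the example; the thresholds quoted above come from price charts and may not carry over.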

Although some people consider technical analysis to be voodoo, empirical evidence indicates that it has some predictive power. For what it's worth, a stochastic is a very easy and efficient way to visualize the momentum of prices over time.

What you want to do is quite simple: you just have to calculate the autocorrelation of your data and look at the correlogram. From the correlogram you can see "hidden" periods in your data, and then you can use this information to analyze the periods.

Here is the result: your numbers and their normalized autocorrelation.

value   autocorrelation
   10    1.000
   -2    0.097
    1   -0.121
    7    0.084
    6    0.098
   -4    0.154
    2   -0.082
   -1   -0.550
    4   -0.341
    5   -0.027
   -8   -0.165
   -2   -0.212
   -1   -0.555
    2   -0.426
    3   -0.279
    2    0.195
    3    0.000
    4   -0.795
    7   -1.000
    9

I used Excel to get the values. Put the sequence in column A, add the formula =CORREL($A$1:$A$20;$A1:$A20) to cell B1, and then copy it down to B19. If you then add a line chart, you can nicely see the structure of the data.
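For anyone without Excel, the same recipe can be sketched in Python with NumPy. This assumes the spreadsheet formula effectively correlates the series with a copy of itself shifted by the lag, using only the rows where both have a value; the function name `lagged_correlations` is made up for this example.

```python
import numpy as np

# The 20 daily scores, in the same order as column A.
scores = np.array([10, -2, 1, 7, 6, -4, 2, -1, 4, 5,
                   -8, -2, -1, 2, 3, 2, 3, 4, 7, 9], dtype=float)


def lagged_correlations(series, max_lag):
    """Pearson correlation between the series and itself shifted by each lag.

    Only the overlapping part is used, so higher lags rest on fewer pairs
    (lag 18 on a 20-value series uses just two pairs).
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    result = []
    for lag in range(max_lag + 1):
        head = x[: n - lag]   # values at positions 0 .. n-lag-1
        tail = x[lag:]        # the same values, `lag` steps later
        result.append(float(np.corrcoef(head, tail)[0, 1]))
    return result


correlogram = lagged_correlations(scores, max_lag=18)
for lag, r in enumerate(correlogram):
    print(f"lag {lag:2d}: {r:6.3f}")
```

Because the last few lags are computed from only two to four pairs, their values are not very meaningful; the early lags are the ones worth reading.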

It seems to me that an OLAP approach (like a pivot table in MS Excel) fits this problem very well.

You can already make reasonable guesses about the periods of patterns: you're looking at things like weekly and monthly. To look for weekly patterns, for example, just average all the Mondays together, and so on. The same goes for days of the month and for months of the year.
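With the question's four weeks laid out as a 4 x 5 array, this kind of averaging is a one-liner per direction. A minimal NumPy sketch (the variable names are just for illustration):

```python
import numpy as np

# Rows are weeks 1-4, columns are Monday..Friday, as in the question.
scores = np.array([[10, -2,  1, 7, 6],
                   [-4,  2, -1, 4, 5],
                   [-8, -2, -1, 2, 3],
                   [ 2,  3,  4, 7, 9]], dtype=float)

weekday_means = scores.mean(axis=0)  # average all Mondays, all Tuesdays, ...
week_means = scores.mean(axis=1)     # average each week as a whole

print(weekday_means)  # Mon 0.0, Tue 0.25, Wed 0.75, Thu 5.0, Fri 5.75
print(week_means)     # week 1: 4.4, week 2: 1.2, week 3: -1.2, week 4: 5.0
```

The weekday averages make the "end of the week is better" observation explicit: Thursday and Friday average around 5 to 6, while Monday through Wednesday stay below 1.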

Sure, you could use a complex algorithm to find out that there's a weekly pattern, but you already know to expect that. If you think there really may be patterns buried there that you'd never suspect (say, there's a strange community of people who use a 5-day week and frequent your business), by all means use a strong tool. But if you know what kinds of things to look for, there's really no need.

Daniel had the right idea when he suggested correlation, but I don't think autocorrelation is what you want. Instead, I would suggest correlating each week with every other week. Peaks in your correlation (that is, values close to 1) suggest that the values of the weeks resemble each other (i.e. are periodic) for that particular shift.

For example, when you cross-correlate

0 0 1 2 0 0

with

0 0 0 1 1 0

the result would be

 2 0 0 0 1 3

where each position is the number of places the second array was circularly shifted to the right, starting from 0. The highest value is 3, which corresponds to shifting the second array right by 5 (equivalently, left by 1)

0 0 0 1 1 0 -->  0 0 1 1 0 0

and then multiplying component-wise

0   0   1   2   0   0
0   0   1   1   0   0
----------------------
0 + 0 + 1 + 2 + 0 + 0 = 3
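The shift-and-multiply procedure can be sketched in Python as follows. This assumes a circular (wrap-around) shift, implemented with `np.roll`, where entry k of the result is the sum of products after shifting the second array right by k positions; the function name is invented for this example.

```python
import numpy as np


def circular_cross_correlation(a, b):
    """Sum of component-wise products of `a` with `b` rolled right by k, for each k.

    Hypothetical helper: uses wrap-around shifts, so both arrays must have
    the same length and the result has one entry per possible shift.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both arrays must have the same length")
    return np.array([np.dot(a, np.roll(b, k)) for k in range(len(a))])


first = [0, 0, 1, 2, 0, 0]
second = [0, 0, 0, 1, 1, 0]

shifts = circular_cross_correlation(first, second)
best = int(np.argmax(shifts))
print(shifts)
print("best right shift:", best, "->", np.roll(second, best))
```

For real sales data you would typically subtract each array's mean and divide by its standard deviation first, so that the peak values are comparable between weeks.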

Note that when you correlate, you can create your own "fake" week and cross-correlate it with all your real weeks. The idea is that you are looking for "shapes" in your weekly values that correspond to the shape of your fake week, by looking for peaks in the correlation result.

So if you are interested in finding weeks that are strong near the end of the week, you could use the "fake" week

 -1 -1 -1 -1  1  1

and if you get a high response in the first value of the correlation, this means that the real week you correlated with has roughly this shape.
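Here is a minimal sketch of that idea on the question's data. Since the question's weeks have five days, it uses a five-element template, `-1 -1 -1 1 1`; that shortened template is an assumption made for this example. Only the zero-shift response (the "first value" of the correlation) is computed for each week.

```python
import numpy as np

# Rows are weeks 1-4, columns are Monday..Friday, as in the question.
weeks = np.array([[10, -2,  1, 7, 6],
                  [-4,  2, -1, 4, 5],
                  [-8, -2, -1, 2, 3],
                  [ 2,  3,  4, 7, 9]], dtype=float)

# Assumed 5-day "fake" week: low at the start of the week, high at the end.
template = np.array([-1, -1, -1, 1, 1], dtype=float)

# Zero-shift correlation: multiply component-wise and sum, once per week.
responses = weeks @ template

for number, response in enumerate(responses, start=1):
    print(f"week {number}: response {response:.0f}")
# week 1: 4, week 2: 12, week 3: 16, week 4: 7
```

Note that the raw response rewards weak early days as much as strong late days, which is why week 3 scores highest even though its Thursday and Friday are modest. Normalizing each week before correlating would compare shapes rather than magnitudes.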

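This may be beyond the scope of what you're looking for, but a technical approach that would let you forecast and look at things like statistical significance would be ARIMA or a similar Box-Jenkins model.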
