
Cluster your time-series data

I have time-series data for 12 consumers. The data for the 12 consumers (named a ... l) is shown below:

[Image: time series of consumers a ... l]

I want to cluster these consumers so that I can find out which of them have the most similar consumption behavior. To that end, I found the clustering method pamk, which automatically estimates the number of clusters in the input data.
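pamk (from the R package fpc) runs PAM over a range of cluster numbers and, with its default criterion, keeps the number whose partition has the highest average silhouette width. The sketch below reproduces that idea in plain Python under a few assumptions: the input is a precomputed distance matrix, the medoids are found by exhaustive search instead of PAM's BUILD/SWAP heuristic (only feasible for small n such as 12 consumers), and the k range is an arbitrary choice. All function names are hypothetical.

```python
from itertools import combinations


def kmedoids_exhaustive(dist, k):
    """Best k-medoids partition of a precomputed distance matrix.

    Tries every set of k medoids and keeps the one with the smallest total
    distance of points to their nearest medoid. Only feasible for small n;
    PAM approximates this search with its BUILD/SWAP heuristic.
    """
    n = len(dist)
    best_cost, best_medoids = float("inf"), None
    for medoids in combinations(range(n), k):
        cost = sum(min(dist[i][m] for m in medoids) for i in range(n))
        if cost < best_cost:
            best_cost, best_medoids = cost, medoids
    # Assign every point to the index (0..k-1) of its nearest medoid.
    labels = [min(range(k), key=lambda c: dist[i][best_medoids[c]]) for i in range(n)]
    return best_medoids, labels


def average_silhouette(dist, labels):
    """Mean silhouette width s(i) = (b - a) / max(a, b); needs >= 2 clusters."""
    n = len(dist)
    clusters = set(labels)
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not own:  # singleton cluster: s(i) = 0 by convention
            continue
        a = sum(dist[i][j] for j in own) / len(own)  # mean distance within own cluster
        b = min(  # mean distance to the closest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n


def choose_k(dist, k_range=range(2, 6)):
    """Pick the k whose partition has the largest average silhouette width."""
    scores = {}
    for k in k_range:
        _, labels = kmedoids_exhaustive(dist, k)
        scores[k] = average_silhouette(dist, labels)
    return max(scores, key=scores.get), scores


# Toy example (not the consumers' data): two well-separated groups on a line.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in points] for p in points]
best_k, scores = choose_k(dist)
print(best_k)  # 2
```

Because the silhouette is undefined for a single cluster, this criterion on its own can never answer "there is only one group"; it always picks some k of at least 2.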

I assume I have only two options for computing the distance between any two time series, i.e., Euclidean and DTW. I tried both, and I do get different clusters. Now the question is: which one should I rely on, and why?
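The two distances measure different things. Euclidean distance compares the series point by point at the same time stamps, so two consumers with the same profile but a peak one step apart count as dissimilar; DTW first lets one series be stretched or shifted in time, so it treats them as similar. A minimal pure-Python sketch of both, assuming equal-length numeric series for Euclidean and an unconstrained DTW with squared point costs (library implementations often add a warping window); the toy profiles are made up for illustration.

```python
import math


def euclidean(x, y):
    """Lock-step distance: compares x[t] only with y[t]; needs equal lengths."""
    if len(x) != len(y):
        raise ValueError("Euclidean distance needs series of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dtw(x, y):
    """Classic dynamic time warping, no window, squared point cost.

    D[i][j] = cost(i, j) + min(D[i-1][j], D[i][j-1], D[i-1][j-1])
    """
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])


# Same consumption peak, shifted by one time step.
x = [0, 0, 1, 5, 1, 0, 0, 0]
y = [0, 0, 0, 1, 5, 1, 0, 0]
print(round(euclidean(x, y), 3))  # 5.831
print(dtw(x, y))                  # 0.0
```

Neither answer is "correct" in general: if the timing of consumption is part of what you mean by similar behavior, the lock-step distance reflects that; if only the shape matters, DTW does.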

Using Euclidean distance, I got the following clusters:

[Image: pamk clusters with Euclidean distance]

And using DTW distance, I got:

[Image: pamk clusters with DTW distance]
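To put a number on how far two partitions of the same consumers disagree, one common option is the adjusted Rand index (ARI): 1 for identical partitions (regardless of how clusters are numbered), about 0 for chance-level agreement, and possibly negative. The sketch below is a plain-Python version of the Hubert and Arabie formula; the label vectors are hypothetical, since the actual cluster assignments only survive as images. Note that the ARI says how much two results agree, not which of them is right.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two flat clusterings."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have equal length")
    n = len(labels_a)
    pairs = Counter(zip(labels_a, labels_b))  # contingency table cells n_ij
    sum_ij = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both all singletons
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Hypothetical label vectors for six series under two distance measures.
euclid_labels = [0, 0, 1, 1, 2, 2]
dtw_labels = [0, 0, 0, 1, 1, 1]
print(round(adjusted_rand_index(euclid_labels, dtw_labels), 3))  # 0.242
```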

Conclusion: how would you decide which clustering approach is best in this case?

Note: I have also asked the same question on Cross Validated.

  1. None of the time series above look similar to me. Do you see any pattern? Maybe there is no pattern.

  2. The clustering visualizations also indicate that there are no clusters. b and l appear to be the most unusual outliers, followed by d, e and h, but there are no clusters there.

  3. Also try hierarchical clustering; the dendrogram may be easier to interpret.
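In R this would normally be hclust on a dist object, and in Python scipy.cluster.hierarchy.linkage, both with method "average". To show what a dendrogram actually encodes, here is a naive average-linkage (UPGMA) agglomeration in plain Python: it repeatedly merges the two clusters with the smallest mean pairwise distance and records the merge height. The function name and the toy distance matrix are assumptions for illustration; in practice the matrix would come from Euclidean or DTW distances between the consumers' series.

```python
def average_linkage(dist, names):
    """Naive average-linkage agglomeration on a precomputed distance matrix.

    Returns the merges in order as (left_members, right_members, height),
    which is the information a dendrogram plots. O(n^3) per merge step is
    fine for a dozen series.
    """
    clusters = {i: [i] for i in range(len(names))}  # cluster id -> member indices
    merges = []
    next_id = len(names)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, p in enumerate(ids):
            for q in ids[pos + 1:]:
                a, b = clusters[p], clusters[q]
                # Average linkage: mean distance over all cross-cluster pairs.
                d = sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, p, q)
        height, p, q = best
        left, right = clusters.pop(p), clusters.pop(q)
        merges.append(([names[i] for i in left], [names[i] for i in right], height))
        clusters[next_id] = left + right
        next_id += 1
    return merges


# Toy data (not the consumers' series): two groups on a line.
points = [0, 1, 2, 10, 11, 12]
names = [str(p) for p in points]
dist = [[abs(p - q) for q in points] for p in points]
merges = average_linkage(dist, names)
for left, right, h in merges:
    print(f"{h:5.1f}  {left} + {right}")
```

A common heuristic is to look for a large jump in merge height: here the heights go 1.0, 1.0, 1.5, 1.5 and then 10.0, which suggests two groups. If the heights instead rise gradually with no clear jump, that is consistent with there being no real clusters.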

Either way, there may be no clusters. You need to be prepared for this outcome and consider it a valid hypothesis. Double-check any result. As you have seen, pam will always return a result, and you have absolutely no means of deciding which result is more "correct" than the other (most likely, neither is correct, and, to answer your question, you should rely on neither).
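One way to treat "no clusters" as a real outcome is to attach a verdict to the average silhouette width of whatever partition pam returns (in R this is the silinfo$avg.width component of a pam result). The sketch below encodes the commonly cited Kaufman and Rousseeuw rule of thumb; the cut-offs are heuristics rather than a statistical test, and the function name is hypothetical.

```python
def interpret_silhouette(avg_width):
    """Rule-of-thumb reading of a partition's average silhouette width."""
    if not -1.0 <= avg_width <= 1.0:
        raise ValueError("silhouette width must lie in [-1, 1]")
    if avg_width > 0.70:
        return "strong structure"
    if avg_width > 0.50:
        return "reasonable structure"
    if avg_width > 0.25:
        return "weak structure, could be artificial"
    return "no substantial structure"


print(interpret_silhouette(0.18))  # no substantial structure
```

A low value for both the Euclidean and the DTW partitions would support the reading above: the algorithm returned clusters because it always does, not because the data contain them.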
