
Comparing and quantifying similarity in a set of non-linear data

I have two lists of data. They are a batch of sequential data (so the data cannot be sorted) from a larger database, as follows:

a = [0.8, 0.9, 0.4, -0.4, 1.12, 1.16, 1.08, 1.22]
b = [0.85, 0.96, 0.4, -0.4, 1.15, 1.18, 1.1, 1.92]

The data may not be linear in nature, so a typical correlation won't serve the purpose.

I wish to compare a and b (as a line graph) and assign a similarity score to them.

I've tried implementing linear correlation from the stats library, but the results are not convincing.

Is there any way to do this with another statistical function, one that is better suited to non-linear data?

Also, is there any supporting function available in scikit-learn?

There is no single clear-cut way to compare time series. I'd say you need to think about what kind of information is not important to you, and then pick an algorithm that omits that information and focuses on the information that is important to you. There are two main distinctions:

a) Direct comparison: Compare the data directly. This can, e.g., be just the norm of the difference of the entries, so ||a - b||, or some algorithm like dynamic time warping, or a correlation score.
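As a rough illustration of the direct-comparison options named above, here is a minimal pure-Python sketch applied to the two lists from the question. The function names (`euclidean_distance`, `pearson`, `dtw_distance`) are made up for this example, and the choice of absolute difference as the local cost in the dynamic time warping part is an assumption, not something the question prescribes.

```python
import math

a = [0.8, 0.9, 0.4, -0.4, 1.12, 1.16, 1.08, 1.22]
b = [0.85, 0.96, 0.4, -0.4, 1.15, 1.18, 1.1, 1.92]


def euclidean_distance(x, y):
    """Norm of the element-wise difference, ||x - y|| (0 means identical)."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def pearson(x, y):
    """Pearson correlation coefficient, between -1 and 1."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)


def dtw_distance(x, y):
    """Dynamic time warping distance, using |xi - yj| as the local cost
    (an assumption for this sketch). Runs in O(len(x) * len(y))."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # x advances, y[j-1] is reused
                cost[i][j - 1],      # y advances, x[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


print("norm of difference:", euclidean_distance(a, b))
print("Pearson correlation:", pearson(a, b))
print("DTW distance:", dtw_distance(a, b))
```

The two distances are 0 for identical sequences and grow with dissimilarity, while the correlation is 1 for perfectly linearly related sequences, so they are not directly comparable to each other. Which one is appropriate depends on whether shifts in time or in scale should count as differences.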

b) Model the time series and compare the models: E.g., do linear regression on both of them and compare how different the parameters are.
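The model-based idea could be sketched like this, assuming each series is regressed against its own index 0..n-1 (the question gives no explicit time axis). `fit_line` and `model_distance` are hypothetical helper names, and combining the slope and intercept differences into one Euclidean number is just one possible choice.

```python
import math

a = [0.8, 0.9, 0.4, -0.4, 1.12, 1.16, 1.08, 1.22]
b = [0.85, 0.96, 0.4, -0.4, 1.15, 1.18, 1.1, 1.92]


def fit_line(y):
    """Least-squares fit of y against its index 0..n-1.

    Returns (slope, intercept).
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (yi - y_mean) for i, yi in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def model_distance(y1, y2):
    """Euclidean distance between the fitted (slope, intercept) pairs."""
    slope1, intercept1 = fit_line(y1)
    slope2, intercept2 = fit_line(y2)
    return math.hypot(slope1 - slope2, intercept1 - intercept2)


print("model for a:", fit_line(a))
print("model for b:", fit_line(b))
print("parameter distance:", model_distance(a, b))
```

Slope and intercept live on different scales, so in practice you may want to compare them separately or weight them. For data that really is non-linear, a straight line is probably too crude a model, and the same pattern would apply with whatever model fits the data better.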

It depends on what is important to you, what the data represents, etc. So maybe elaborate more on why the results you have right now are not convincing, and on what you mean by "as a line graph". Also, maybe a statistics/math forum is better suited for this question?
