
Comparing and quantifying similarity in a set of nonlinear data

I have two lists of data, each a batch of SEQUENTIAL data (so the data cannot be sorted) taken from a larger database:

a = [0.8, 0.9, 0.4, -0.4, 1.12, 1.16, 1.08, 1.22]
b = [0.85, 0.96, 0.4, -0.4, 1.15, 1.18, 1.1, 1.92]

The data may not be linear in nature, so a typical correlation won't serve the purpose.

I wish to compare a and b (as a line graph) and assign a similarity score to them.

I've tried the linear correlation from the stats library, but the results are not convincing.

Is there a way to do this with another statistical function, one that accounts for nonlinear data?

Also, is there a supporting function available in scikit-learn?

There is no single clear-cut way to compare time series. I'd say you need to think about which kinds of information are not important to you, and then pick an algorithm that omits that information and focuses on the information that is important. There are two main approaches:

a) Direct comparison: Compare the data directly. This can be, e.g., just the norm of the difference of the entries, so ||a - b||, or an algorithm like dynamic time warping, or a correlation score.

b) Model the time series and compare the models: e.g., do a linear regression on both of them and compare how different the parameters are.
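As a rough illustration of approach (a), here is a minimal sketch that computes a few direct-comparison scores for the two lists from the question using only NumPy. The helper names (`rank_correlation`, `dtw_distance`) are made up for this example, the rank correlation assumes there are no tied values (true for this data; `scipy.stats.spearmanr` handles ties properly), and the dynamic time warping routine is the plain textbook recurrence with an absolute-difference cost, not a tuned implementation.

```python
import numpy as np

a = np.array([0.8, 0.9, 0.4, -0.4, 1.12, 1.16, 1.08, 1.22])
b = np.array([0.85, 0.96, 0.4, -0.4, 1.15, 1.18, 1.1, 1.92])

# Euclidean norm of the element-wise difference, ||a - b|| (0 means identical).
l2_distance = np.linalg.norm(a - b)

# Pearson correlation: measures how well a straight line relates a and b.
pearson = np.corrcoef(a, b)[0, 1]


def rank_correlation(x, y):
    """Spearman-style correlation: Pearson correlation of the ranks.

    Assumption: no tied values in x or y (argsort-based ranks ignore ties).
    """
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]


def dtw_distance(x, y):
    """Dynamic time warping distance with |x_i - y_j| as the local cost."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i, j] = local + min(cost[i - 1, j],
                                     cost[i, j - 1],
                                     cost[i - 1, j - 1])
    return cost[n, m]


spearman = rank_correlation(a, b)
dtw = dtw_distance(a, b)

print("L2 distance:", l2_distance)
print("Pearson:", pearson)
print("Rank correlation:", spearman)
print("DTW distance:", dtw)
```

Each score ignores something different: the norm penalizes every pointwise offset, the rank correlation only looks at whether the two series rise and fall in the same order, and dynamic time warping tolerates shifts along the sequence axis. Which one counts as "similarity" depends on which of those differences matter for your data.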

It depends on what is important to you, what the data represents, etc. So, maybe elaborate on why the results you have right now are not convincing, and on what you mean by "as a line graph". Also, maybe a statistics/math forum is better suited for this question?
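For approach (b), a minimal sketch could look like the following. It assumes the samples are equally spaced (so the index is used as the time axis), fits a straight line to each series with `np.polyfit`, and takes the distance between the fitted parameters as the dissimilarity. The helper name `line_params` is made up for this example, and a straight line is only a stand-in: a different model may suit the data better.

```python
import numpy as np

a = np.array([0.8, 0.9, 0.4, -0.4, 1.12, 1.16, 1.08, 1.22])
b = np.array([0.85, 0.96, 0.4, -0.4, 1.15, 1.18, 1.1, 1.92])


def line_params(y):
    """Least-squares straight-line fit; returns array([slope, intercept]).

    Assumption: samples are equally spaced, so the index serves as time.
    """
    t = np.arange(len(y))
    return np.polyfit(t, y, 1)


params_a = line_params(a)
params_b = line_params(b)

# Distance between the two parameter vectors: 0 means identical fitted lines.
param_distance = np.linalg.norm(params_a - params_b)

print("a: slope, intercept =", params_a)
print("b: slope, intercept =", params_b)
print("parameter distance:", param_distance)
```

Note that slope and intercept live on different scales, so a weighted comparison may be more appropriate than a plain norm.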


 