
Python: how to find the offset that minimizes the Euclidean distance between two series?

I have two non-identical series, one of which lags the other. I want to find the x-axis offset that minimizes the Euclidean distance between them.

import pandas as pd

df = pd.DataFrame({'a': [1, 4, 5, 10, 9, 3, 2, 6, 8, 4], 'b': [1, 7, 3, 4, 1, 10, 5, 4, 7, 4]})

[Image: plot of the two series]

I am using Dynamic Time Warping modules in Python, which give me the minimum distance, but I am not sure how to get the offset.

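import numpy as np
import matplotlib.pyplot as plt
from dtw import accelerated_dtw

d1 = df['a'].to_numpy()
d2 = df['b'].to_numpy()

# d: DTW distance, cost_matrix: pairwise local costs,
# acc_cost_matrix: accumulated costs, path: indices of the optimal warping path
d, cost_matrix, acc_cost_matrix, path = accelerated_dtw(d1, d2, dist='euclidean')

# Show the accumulated cost matrix with the optimal path drawn in white
plt.imshow(acc_cost_matrix.T, origin='lower', cmap='gray', interpolation='nearest')
plt.plot(path[0], path[1], 'w')
plt.xlabel('a')
plt.ylabel('b')
plt.title(f'DTW Minimum Path with minimum distance: {np.round(d, 2)}')
plt.show()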

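[Image: the accumulated cost matrix with the DTW minimum path overlaid; the title reports a minimum distance of 15]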

I am not sure how to interpret the distance of 15 shown at the top of the cost-matrix plot. Is it the minimum distance between the already-offset series, or is it the offset that produces the minimum distance between the two series?

Thank you in advance!

It seems you have a misunderstanding of how dynamic time warping (DTW) works. DTW finds the lowest-cost matching between two time series (in your case, with Euclidean distance as the cost). The core feature of the algorithm, however, is that this matching is NON-LINEAR, hence the "warping" in the name: the two time series are warped, or stretched, to find the best fit. So DTW doesn't really give you an optimal offset, because it doesn't shift the whole time series by a fixed amount; it operates point by point.
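A minimal from-scratch sketch of classic DTW may make the point-by-point behaviour concrete. It assumes 1-D series, so the Euclidean local cost reduces to |x_i - y_j|, and the function name `dtw_distance` is hypothetical; this is not the `dtw` package's code. Each cell of the accumulated matrix extends the cheapest of three moves, and two of those moves repeat an index, which is what lets one point be matched to several points.

```python
import numpy as np


def dtw_distance(x, y):
    """Classic DTW on 1-D series (hypothetical helper, local cost |x_i - y_j|).

    Returns (total cost, warping path as a list of 0-based (i, j) pairs).
    """
    n, m = len(x), len(y)
    # acc[i, j] = cheapest cost of aligning x[:i] with y[:j]; row/column 0 are padding
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # diagonal step matches a new pair; the other two repeat x_i or y_j
            acc[i, j] = cost + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])

    # Backtrack from the last pair to the first to recover the optimal path
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        # dict order puts the diagonal first, so ties prefer a diagonal step
        steps = {
            (i - 1, j - 1): acc[i - 1, j - 1],
            (i - 1, j): acc[i - 1, j],
            (i, j - 1): acc[i, j - 1],
        }
        i, j = min(steps, key=steps.get)
        path.append((i - 1, j - 1))
    return acc[n, m], path[::-1]
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` returns a distance of 0 with the path `[(0, 0), (1, 1), (1, 2), (2, 3)]`: the `2` in the first series is matched to both `2`s in the second, something no fixed offset could do.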

Notice how the DTW matching is not linear (and some points are matched to more than one point):

[Image: DTW matching lines between the points of the two series]
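One way to see this numerically is to look at the shift `j - i` implied by each matched pair `(i, j)` on the warping path: a single global offset would make it constant. A minimal sketch, assuming (as the plotting code in the question suggests) that `path[0]` holds the indices into `a` and `path[1]` the indices into `b`; the helper names are hypothetical.

```python
import numpy as np


def implied_lags(path_a, path_b):
    """Shift (index in b minus index in a) for every matched pair of a warping path.

    Hypothetical helper: path_a and path_b are equal-length index sequences.
    """
    return np.asarray(path_b) - np.asarray(path_a)


def is_rigid_shift(path_a, path_b):
    # A single global offset would give the same lag for every matched pair
    lags = implied_lags(path_a, path_b)
    return bool(np.all(lags == lags[0]))
```

With the question's variables this would be `implied_lags(path[0], path[1])`; a spread of values rather than one repeated number is the non-linear warping described above.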

As for the distance, it is the accumulated cost, i.e. the total Euclidean distance of the optimal DTW matching.
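In the classic formulation, the bottom-right cell of the accumulated cost matrix equals the sum of the local costs over every matched pair on the optimal path. The sketch below (hypothetical helper `path_cost`, 1-D series so the Euclidean cost is |x_i - y_j|) computes that sum for any given path; applied to the optimal path it should reproduce the reported distance, unless the implementation normalizes the total, which some DTW libraries do.

```python
def path_cost(x, y, path):
    """Sum of local costs |x[i] - y[j]| over the matched pairs (i, j) of a warping path."""
    return sum(abs(x[i] - y[j]) for i, j in path)
```

For instance, `path_cost([1, 4, 5], [1, 7, 3], [(0, 0), (1, 1), (2, 2)])` is 0 + 3 + 2 = 5, while the path `[(0, 0), (1, 0), (2, 1), (2, 2)]` costs 7; DTW reports the cheapest such total.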

Another thing worth mentioning about DTW is that one of its default constraints is that every point in each time series must be matched. In your case, however, you are trying to offset the entire series, so some points at the ends will not be matched. There are ways to relax this constraint and to impose others on the matching (for example, matching each point only once, which forces DTW into a linear offset), but doing so requires a deep understanding of the algorithm and fairly complicated configuration.
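The default constraints are commonly stated as a boundary condition (the path starts at the first pair of points and ends at the last pair) plus a step condition (each step advances one or both indices by exactly one). A small checker, with hypothetical helper names, shows why a rigid shift is not a valid DTW path: shifting by k != 0 leaves points at the ends of each series unmatched, so the boundary condition fails.

```python
def is_valid_warping_path(path, n, m):
    """Check the classic DTW path constraints for series of lengths n and m (hypothetical helper)."""
    # Boundary condition: the first and last points of both series must be matched
    if path[0] != (0, 0) or path[-1] != (n - 1, m - 1):
        return False
    # Step condition: advance i, j, or both by exactly one (monotonic and continuous)
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        if (i1 - i0, j1 - j0) not in {(1, 0), (0, 1), (1, 1)}:
            return False
    return True


def fixed_offset_path(n, m, k):
    # Hypothetical helper: pair x[t] with y[t + k], i.e. a rigid shift by k
    return [(t, t + k) for t in range(n) if 0 <= t + k < m]
```

`fixed_offset_path(10, 10, 0)` passes the check, but `fixed_offset_path(10, 10, 2)` starts at `(0, 2)` and ends at `(7, 9)`, so it is rejected.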

In short, I don't think DTW is the right algorithm for your case. You could instead write a script that computes the Euclidean distance for each candidate offset, which shouldn't be hard since you're dealing with a single fixed offset.
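A minimal sketch of that brute-force search, assuming lags are limited to a chosen `max_lag` (the name `best_offset` and the scoring choice are illustrative, not from the original). Only the overlapping samples can be compared at each lag, and since the overlap shrinks as |k| grows, the raw Euclidean distance tends to favour large shifts; this sketch therefore divides by the square root of the overlap length, i.e. it uses the root-mean-square difference.

```python
import numpy as np


def best_offset(a, b, max_lag):
    """Brute-force search for the lag k that best aligns a[t] with b[t + k].

    A positive k means b is delayed relative to a. Each candidate is scored by
    the root-mean-square difference over the overlapping samples.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    best_lag, best_score = None, np.inf
    for k in range(-max_lag, max_lag + 1):
        # Indices t for which both a[t] and b[t + k] exist
        start = max(0, -k)
        stop = min(len(a), len(b) - k)
        if stop <= start:
            continue  # no overlap at this lag
        diff = a[start:stop] - b[start + k:stop + k]
        score = np.sqrt(np.mean(diff ** 2))
        if score < best_score:
            best_lag, best_score = k, score
    return best_lag, best_score
```

With the question's data this would be called as `best_offset(df['a'], df['b'], max_lag=3)`; the bound of 3 is an arbitrary choice and should be small enough that the overlap stays meaningful. Replacing the score with `np.linalg.norm(diff)` gives the plain Euclidean distance, at the cost of the bias toward short overlaps mentioned above.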

You can read more about DTW here: https://towardsdatascience.com/dynamic-time-warping-3933f25fcdd#:~:text=Dynamic%20Time%20Warping%20is%20used,time%20series%20with%20different%20length.&text=How%20to%20do%20that%3F,total%20distance%20of%20each%20component
