The wasserstein_distance function requires that its inputs be "values observed in the (empirical) distribution".
My data arrays range between -4 and 8:
x = np.array([0.12,-1.29,-3.23,-3.21,-0.13, 1.52, 4.45, 6.45, 5.17, 0.11, 3.48, 5.98, 7.55])
y = np.array([3.54, 2.42,-4.43,-3.76, 0.43, 0.45, 2.56, 7.61, 4.47, 1.36, 2.34, 7.78, 7.13])
How can I create an empirical distribution of x and y?
I tried:
from scipy.stats import wasserstein_distance
from statsmodels.distributions.empirical_distribution import ECDF
ecdf_x = ECDF(x)
x_ecdf = ecdf_x.y
ecdf_y = ECDF(y)
y_ecdf = ecdf_y.y
wasserstein_distance(x_ecdf, y_ecdf)
Would x_ecdf and y_ecdf be valid inputs to the function?
I think you do not need to convert x and y to ECDFs. The function expects the observed values themselves, so you can pass the arrays directly:
import numpy as np
import scipy.stats
x = np.array([0.12,-1.29,-3.23,-3.21,-0.13, 1.52, 4.45, 6.45, 5.17, 0.11, 3.48, 5.98, 7.55])
y = np.array([3.54, 2.42,-4.43,-3.76, 0.43, 0.45, 2.56, 7.61, 4.47, 1.36, 2.34, 7.78, 7.13])
scipy.stats.wasserstein_distance(x, y)
# 1.0376923076923077
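To see why the ECDF heights are not suitable inputs, here is a minimal sketch. It assumes that the `.y` attribute of an ECDF holds the cumulative probabilities 0, 1/n, ..., 1, and it rebuilds those heights with NumPy instead of calling statsmodels (the helper `ecdf_heights` is hypothetical, for illustration only). For two samples of the same size with no repeated values, the heights are identical, so the distance between them says nothing about the data:

```python
import numpy as np
from scipy.stats import wasserstein_distance

x = np.array([0.12, -1.29, -3.23, -3.21, -0.13, 1.52, 4.45, 6.45, 5.17, 0.11, 3.48, 5.98, 7.55])
y = np.array([3.54, 2.42, -4.43, -3.76, 0.43, 0.45, 2.56, 7.61, 4.47, 1.36, 2.34, 7.78, 7.13])

def ecdf_heights(sample):
    """Hypothetical helper: step heights 0, 1/n, ..., 1 of an ECDF (assumes no ties)."""
    n = len(sample)
    return np.concatenate(([0.0], np.arange(1, n + 1) / n))

# Distance between the two sets of cumulative probabilities (not the data)
d_heights = wasserstein_distance(ecdf_heights(x), ecdf_heights(y))
# Distance between the observed values, which is what the function expects
d_raw = wasserstein_distance(x, y)

print(d_heights)
print(d_raw)
```

Under that assumption, the first number depends only on the sample sizes, while the second one reflects the actual observations.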
For reference, the function signature is:
scipy.stats.wasserstein_distance(u_values, v_values, u_weights=None, v_weights=None)
Parameters: u_values, v_values : array_like. Values observed in the (empirical) distribution.
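With the weights left at None and two samples of equal size, the result can be cross-checked by hand: in that case the 1-D Wasserstein distance reduces to the mean absolute difference between the sorted samples. A small sketch of that check (the names `manual` and `library` are just for illustration):

```python
import numpy as np
from scipy.stats import wasserstein_distance

x = np.array([0.12, -1.29, -3.23, -3.21, -0.13, 1.52, 4.45, 6.45, 5.17, 0.11, 3.48, 5.98, 7.55])
y = np.array([3.54, 2.42, -4.43, -3.76, 0.43, 0.45, 2.56, 7.61, 4.47, 1.36, 2.34, 7.78, 7.13])

# Pair the i-th smallest value of x with the i-th smallest value of y
manual = np.mean(np.abs(np.sort(x) - np.sort(y)))
library = wasserstein_distance(x, y)

print(manual, library)
```

This shortcut only applies to unweighted samples of the same length; for anything else, rely on the library function.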
Examples from the SciPy documentation:
from scipy.stats import wasserstein_distance
wasserstein_distance([0, 1, 3], [5, 6, 8])
#5.0
wasserstein_distance([0, 1], [0, 1], [3, 1], [2, 2])
#0.25
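The second example passes weights as the third and fourth arguments (u_weights and v_weights). One way to read integer weights, sketched below, is as repeat counts: giving the value 0 a weight of 3 and the value 1 a weight of 1 should match passing the sample [0, 0, 0, 1] without weights.

```python
from scipy.stats import wasserstein_distance

# Weighted form, as in the documentation example
weighted = wasserstein_distance([0, 1], [0, 1], u_weights=[3, 1], v_weights=[2, 2])

# Same distributions written out as repeated observations
expanded = wasserstein_distance([0, 0, 0, 1], [0, 0, 1, 1])

print(weighted, expanded)
```

This can be handy when the data arrive as a histogram (values plus counts) rather than as raw observations.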