
empirical distribution from data - python

The wasserstein_distance function requires that its inputs be "Values observed in the (empirical) distribution".

My data arrays range between -4 and 8:

x = np.array([0.12,-1.29,-3.23,-3.21,-0.13, 1.52, 4.45, 6.45, 5.17, 0.11, 3.48, 5.98, 7.55])
y = np.array([3.54, 2.42,-4.43,-3.76, 0.43, 0.45, 2.56, 7.61, 4.47, 1.36, 2.34, 7.78, 7.13])

How can I create an empirical distribution of x and y?

I tried:

from scipy.stats import wasserstein_distance
from statsmodels.distributions.empirical_distribution import ECDF

ecdf_x = ECDF(x)
x_ecdf = ecdf_x.y

ecdf_y = ECDF(y)
y_ecdf = ecdf_y.y

wasserstein_distance(x_ecdf, y_ecdf)

Would x_ecdf and y_ecdf be valid inputs to the function?

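I think you do not need to convert x and y to ECDFs. The observed samples themselves define the empirical distribution, so you can pass the raw arrays directly. The .y attribute of an ECDF object holds cumulative probabilities (values between 0 and 1), not your data, so passing it would compare those step heights rather than the observations.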

import numpy as np
import scipy.stats

x = np.array([0.12,-1.29,-3.23,-3.21,-0.13, 1.52, 4.45, 6.45, 5.17, 0.11, 3.48, 5.98, 7.55])
y = np.array([3.54, 2.42,-4.43,-3.76, 0.43, 0.45, 2.56, 7.61, 4.47, 1.36, 2.34, 7.78, 7.13])

scipy.stats.wasserstein_distance(x, y)
# 1.0376923076923077
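As a sanity check on that number, here is a minimal sketch assuming both samples have the same size and no weights. In that special case the 1-D Wasserstein distance reduces to the mean absolute difference between the sorted samples. The names manual and library are only illustrative.

```python
import numpy as np
from scipy.stats import wasserstein_distance

x = np.array([0.12, -1.29, -3.23, -3.21, -0.13, 1.52, 4.45, 6.45, 5.17, 0.11, 3.48, 5.98, 7.55])
y = np.array([3.54, 2.42, -4.43, -3.76, 0.43, 0.45, 2.56, 7.61, 4.47, 1.36, 2.34, 7.78, 7.13])

# Equal sample sizes, no weights: pair the i-th smallest x with the
# i-th smallest y and average the absolute gaps.
manual = float(np.mean(np.abs(np.sort(x) - np.sort(y))))

# The library call on the raw samples should agree with the manual value.
library = float(wasserstein_distance(x, y))

print(manual, library)
```

This shortcut only holds for equal-sized, unweighted samples; for anything else, rely on the library function.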

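The signature in the SciPy documentation is:

scipy.stats.wasserstein_distance(u_values, v_values, u_weights=None, v_weights=None)

Parameters: u_values, v_values : array_like. Values observed in the (empirical) distribution, i.e. the raw samples.

Examples from the SciPy documentation: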

from scipy.stats import wasserstein_distance
wasserstein_distance([0, 1, 3], [5, 6, 8])
#5.0

wasserstein_distance([0, 1], [0, 1], [3, 1], [2, 2])
#0.25
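To see where those two results come from, the following rough sketch computes the distance as the area between the two empirical CDFs. It is not SciPy's implementation, and w1_from_cdfs is a hypothetical helper written only for illustration. The third and fourth arguments are weights: in the second example, [3, 1] puts probability 0.75 on 0 and 0.25 on 1, while [2, 2] puts 0.5 on each.

```python
import numpy as np

def w1_from_cdfs(u_values, v_values, u_weights=None, v_weights=None):
    """Hypothetical helper: area between two weighted empirical CDFs."""
    u_values = np.asarray(u_values, dtype=float)
    v_values = np.asarray(v_values, dtype=float)
    # Default to equal weights, then normalize so each set sums to 1.
    u_w = np.ones_like(u_values) if u_weights is None else np.asarray(u_weights, dtype=float)
    v_w = np.ones_like(v_values) if v_weights is None else np.asarray(v_weights, dtype=float)
    u_w = u_w / u_w.sum()
    v_w = v_w / v_w.sum()

    # Both CDFs are step functions that only change at observed values.
    grid = np.sort(np.concatenate([u_values, v_values]))
    widths = np.diff(grid)

    # CDF height on each interval [grid[i], grid[i+1]).
    u_cdf = np.array([u_w[u_values <= t].sum() for t in grid[:-1]])
    v_cdf = np.array([v_w[v_values <= t].sum() for t in grid[:-1]])

    return float(np.sum(np.abs(u_cdf - v_cdf) * widths))

unweighted = w1_from_cdfs([0, 1, 3], [5, 6, 8])
weighted = w1_from_cdfs([0, 1], [0, 1], [3, 1], [2, 2])
print(unweighted, weighted)
```

For the weighted case the CDFs differ by 0.75 - 0.5 = 0.25 over the interval from 0 to 1, which gives the 0.25 shown above.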
