
Python: weighted percentile for each row of array

I would like to calculate the weighted median of each row of a pandas dataframe.

I found this nice function (https://stackoverflow.com/a/29677616/10588967), but I can't seem to pass it a 2D array.

import numpy

def weighted_quantile(values, quantiles, sample_weight=None, values_sorted=False, old_style=False):
    """ Very close to numpy.percentile, but supports weights.
    NOTE: quantiles should be in [0, 1]!
    :param values: numpy.array with data
    :param quantiles: array-like with many quantiles needed
    :param sample_weight: array-like of the same length as `values`
    :param values_sorted: bool, if True, then will avoid sorting of initial array
    :param old_style: if True, will correct output to be consistent with numpy.percentile.
    :return: numpy.array with computed quantiles.
    """
    values = numpy.array(values)
    quantiles = numpy.array(quantiles)
    if sample_weight is None:
        sample_weight = numpy.ones(len(values))
    sample_weight = numpy.array(sample_weight)
    assert numpy.all(quantiles >= 0) and numpy.all(quantiles <= 1), 'quantiles should be in [0, 1]'

    if not values_sorted:
        sorter = numpy.argsort(values)
        values = values[sorter]
        sample_weight = sample_weight[sorter]

    weighted_quantiles = numpy.cumsum(sample_weight) - 0.5 * sample_weight
    if old_style:
        # To be consistent with numpy.percentile
        weighted_quantiles -= weighted_quantiles[0]
        weighted_quantiles /= weighted_quantiles[-1]
    else:
        weighted_quantiles /= numpy.sum(sample_weight)
    return numpy.interp(quantiles, weighted_quantiles, values)
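For reference, the default branch (old_style=False) of the function can be written out as below. The notation is introduced here, not taken from the linked answer: x_(1) ≤ … ≤ x_(n) are the sorted values, w_k their weights, p_k the normalized midpoint cumulative weights, and Q(q) is the linear interpolation that numpy.interp performs (clamped to the end values outside [p_1, p_n]).

```latex
S_k = \sum_{j=1}^{k} w_j - \tfrac{1}{2} w_k, \qquad
p_k = \frac{S_k}{\sum_{j=1}^{n} w_j}, \qquad k = 1, \dots, n

Q(q) =
\begin{cases}
x_{(1)} & q \le p_1 \\
x_{(k)} + \dfrac{q - p_k}{p_{k+1} - p_k}\,\bigl(x_{(k+1)} - x_{(k)}\bigr) & p_k \le q \le p_{k+1} \\
x_{(n)} & q \ge p_n
\end{cases}
```

With equal weights, p_k = (k - 1/2)/n, so for odd n the 0.5 quantile lands exactly on the middle sorted value; for [1, 2, 9, 3.2, 4] that is 3.2. With old_style=True the positions are instead rescaled as (S_k - S_1)/(S_n - S_1), which maps the smallest and largest values to 0 and 1 as numpy.percentile does.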

Using the code from the link, the following works:

weighted_quantile([1, 2, 9, 3.2, 4], [0.0, 0.5, 1.])

However, this does not work:

values = numpy.random.randn(10, 5)
quantiles = [0.0, 0.5, 1.]
sample_weight = numpy.random.rand(10, 5)  # weights should be non-negative, so rand rather than randn
weighted_quantile(values, quantiles, sample_weight)

I receive the following error:

weighted_quantiles = np.cumsum(sample_weight) - 0.5 * sample_weight

ValueError: operands could not be broadcast together with shapes (250,) (10,5,5)
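To see where the shapes (250,) and (10,5,5) come from, the first steps of the function can be replayed on a 2D input. This is a minimal sketch; the random inputs are only illustrative, and any (10, 5) arrays give the same shapes.

```python
import numpy as np

values = np.random.randn(10, 5)
sample_weight = np.random.rand(10, 5)

# argsort works along the last axis by default, so sorter has shape (10, 5)
sorter = np.argsort(values)
print(sorter.shape)                            # (10, 5)

# A single integer index array only indexes axis 0: each entry of sorter
# picks a whole row, so the "sorted" arrays grow to shape (10, 5, 5)
print(values[sorter].shape)                    # (10, 5, 5)
print(sample_weight[sorter].shape)             # (10, 5, 5)

# Without an axis argument, cumsum flattens its input: 10 * 5 * 5 = 250
print(np.cumsum(sample_weight[sorter]).shape)  # (250,)

# Sorting each row on its own keeps the original (10, 5) shape
print(np.take_along_axis(values, sorter, axis=1).shape)  # (10, 5)
```

The subtraction on the failing line then pairs a (250,) array with a (10, 5, 5) array, which cannot be broadcast.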

Question: Is it possible to apply this weighted quantile function in a vectorized manner on a DataFrame, or can I only achieve this using .apply()?
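A vectorized alternative to .apply() looks possible with NumPy alone. The sketch below is a hypothetical helper, weighted_quantile_rows (not part of the linked answer), that mirrors the default old_style=False branch row by row: np.argsort(..., axis=1) with np.take_along_axis sorts each row, np.cumsum(..., axis=1) keeps the running weights within each row, and since np.interp only handles 1D inputs, the interpolation is done by hand. It assumes non-negative weights with a positive total in every row.

```python
import numpy as np


def weighted_quantile_rows(values, quantiles, sample_weight=None):
    """Weighted quantiles of every row of a 2D array (hypothetical helper).

    Follows the old_style=False branch of weighted_quantile, but sorts,
    accumulates and interpolates along axis 1.
    Returns an array of shape (n_rows, len(quantiles)).
    """
    values = np.asarray(values, dtype=float)
    quantiles = np.asarray(quantiles, dtype=float)
    if sample_weight is None:
        sample_weight = np.ones_like(values)
    sample_weight = np.asarray(sample_weight, dtype=float)
    assert np.all((quantiles >= 0) & (quantiles <= 1)), 'quantiles should be in [0, 1]'

    # Sort each row on its own and reorder the weights the same way
    sorter = np.argsort(values, axis=1)
    values = np.take_along_axis(values, sorter, axis=1)
    sample_weight = np.take_along_axis(sample_weight, sorter, axis=1)

    # Midpoint cumulative weights, normalized by each row's total weight
    cw = np.cumsum(sample_weight, axis=1) - 0.5 * sample_weight
    cw /= sample_weight.sum(axis=1, keepdims=True)

    # Row-wise equivalent of np.interp(quantiles, cw[i], values[i]):
    # k[i, j] = number of points in row i with cw <= quantiles[j]
    n_cols = values.shape[1]
    k = (cw[:, :, None] <= quantiles[None, None, :]).sum(axis=1)
    lo = np.clip(k - 1, 0, n_cols - 1)  # left bracketing point
    hi = np.clip(k, 0, n_cols - 1)      # right bracketing point
    x0 = np.take_along_axis(cw, lo, axis=1)
    x1 = np.take_along_axis(cw, hi, axis=1)
    y0 = np.take_along_axis(values, lo, axis=1)
    y1 = np.take_along_axis(values, hi, axis=1)

    # Linear interpolation; outside [cw[0], cw[-1]] lo == hi, so the
    # result is clamped to the end values, as np.interp does
    denom = x1 - x0
    t = np.divide(quantiles - x0, denom, out=np.zeros_like(denom), where=denom > 0)
    return y0 + t * (y1 - y0)
```

On the question's 1D example, passed as a single row [[1, 2, 9, 3.2, 4]] with quantiles [0.0, 0.5, 1.0], this returns 1.0, 3.2 and 9.0, the same as weighted_quantile. For a DataFrame df with a weights frame of the same shape (both names hypothetical), weighted_quantile_rows(df.to_numpy(), [0.5], weights.to_numpy())[:, 0] gives the weighted median of each row, and pd.Series(..., index=df.index) restores the row labels. The comparison array holds n_rows × n_cols × n_quantiles booleans, which is cheap for a few quantiles but grows with all three.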

Many thanks for your time!

np.cumsum(sample_weight)

returns a 1D array: without an axis argument, cumsum flattens its input (here all 250 elements). So you would want to reshape it back to (10,5,5) using

np.cumsum(sample_weight).reshape(10,5,5)
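One caveat, sketched below with random positive weights standing in for sample_weight[sorter] (an assumption for illustration): the reshape makes the subtraction broadcast, but the flattened cumsum keeps running across rows, so only the first row matches a per-row cumulative sum. Passing axis=-1 to np.cumsum keeps the totals within each row. Separately, np.interp is documented to take 1D xp and fp, so the last line of weighted_quantile still needs either one row at a time or a vectorized replacement.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.random((10, 5, 5))  # stands in for sample_weight[sorter]

flat = np.cumsum(w).reshape(10, 5, 5)  # running total over all 250 entries
per_row = np.cumsum(w, axis=-1)        # running total within each innermost row

# Shapes now match, so the subtraction in weighted_quantile broadcasts...
print((flat - 0.5 * w).shape)                   # (10, 5, 5)
# ...but only the first row agrees; later rows carry earlier totals
print(np.allclose(flat[0, 0], per_row[0, 0]))   # True
print(np.allclose(flat, per_row))               # False
```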
