Different results with NumPy percentile and TensorFlow percentile for "nearest" interpolation method
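I have noticed that even though NumPy's numpy.percentile and TensorFlow Probability's tfp.stats.percentile give the same docstring explanation for their "nearest" interpolation method:
This optional parameter specifies the interpolation method to use when the desired percentile lies between two data points i < j: ...
'nearest': i or j, whichever is nearest.
they give different results. Below is a minimal working example of what I mean.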
$ "$(which python3)" --version
Python 3.7.5
$ python3 -m venv "${HOME}/.venvs/question"
$ . "${HOME}/.venvs/question/bin/activate"
(question) $ cat requirements.txt
numpy~=1.18
tensorflow~=2.1
tensorflow-probability~=0.9
black
(question) $ python -m pip install -r requirements.txt
# question.py
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp


def main():
    a = np.array([[10.0, 7.0, 4.0], [3.0, 2.0, 1.0]])
    q = 50
    print(f"Flattened array: {a.flatten()}")
    print("NumPy:")
    print(f"\t{q}th percentile (linear): {np.percentile(a, q, interpolation='linear')}")
    print(
        f"\t{q}th percentile (nearest): {np.percentile(a, q, interpolation='nearest')}"
    )
    b = tf.convert_to_tensor(a)
    print("TensorFlow:")
    print(
        f"\t{q}th percentile (linear): {tfp.stats.percentile(b, q, interpolation='linear')}"
    )
    print(
        f"\t{q}th percentile (nearest): {tfp.stats.percentile(b, q, interpolation='nearest')}"
    )


if __name__ == '__main__':
    main()
When run, the "nearest" interpolation method gives different results:
(question) $ python question.py
Flattened array: [10. 7. 4. 3. 2. 1.]
NumPy:
50th percentile (linear): 3.5
50th percentile (nearest): 3.0
TensorFlow:
50th percentile (linear): 3.5
50th percentile (nearest): 4.0
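After looking through the NumPy v1.18.2 source for the function that numpy.percentile is calling, I am still confused as to why. It seems that this is due to a rounding decision (given that NumPy uses numpy.around and TFP uses tf.round).
Can someone explain to me what is causing the difference? I would like to make a shim for these functions, but I need to understand the return behavior.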
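A quick way to probe the rounding hypothesis is to look at the tie itself. The sketch below is an illustration, not either library's code: it uses Python's built-in round() as a stand-in, on the assumption that it follows the same round-half-to-even rule that the NumPy and TensorFlow documentation describe for numpy.around and tf.round.

```python
# Stand-in check: Python's round() rounds exact halves to the even neighbour,
# the same rule documented for numpy.around and tf.round.
n = 6    # number of elements in the flattened array
q = 50   # requested percentile

position = q / 100 * (n - 1)  # fractional index into the sorted data
print(position)               # 2.5, exactly halfway between indices 2 and 3
print(round(position))        # 2, because the tie goes to the even index
print(round(3.5))             # 4, again the even neighbour
```

If both libraries round 2.5 to index 2, the rounding rule on its own would not explain why they return different elements.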
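Stepping through the source of both, it seems that it is not a rounding issue as I first thought. Instead, numpy.percentile does the final evaluation on an ndarray sorted in ascending order, while tfp.stats.percentile does it on a tensor sorted in descending order.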
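The effect of the sort order can be reproduced without either library. The following is a minimal pure-Python sketch under the assumption that the index arithmetic matches the steps traced in answer.py; the variable names are illustrative only.

```python
data = [10.0, 7.0, 4.0, 3.0, 2.0, 1.0]
q = 50
n = len(data)

ascending = sorted(data)                 # [1.0, 2.0, 3.0, 4.0, 7.0, 10.0]
descending = sorted(data, reverse=True)  # [10.0, 7.0, 4.0, 3.0, 2.0, 1.0]

# NumPy-like: index counted from the minimum of the ascending data.
index_from_min = round(q / 100 * (n - 1))        # round(2.5) -> 2
# TFP-like: index counted from the maximum of the descending data.
index_from_max = round((1 - q / 100) * (n - 1))  # round(2.5) -> 2

numpy_like = ascending[index_from_min]
tfp_like = descending[index_from_max]
print(index_from_min, index_from_max)  # 2 2
print(numpy_like, tfp_like)            # 3.0 4.0
```

Both fractional indices are 2.5 and both round to 2, but index 2 counted from the minimum is 3.0 while index 2 counted from the maximum is 4.0. The tie is broken toward opposite ends of the data.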
# answer.py
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow_probability.python.internal import tensorshape_util
from tensorflow_probability.python.internal import distribution_util


def numpy_src(input, q, axis=0, out=None):
    a = input
    q = np.true_divide(q, 100)  # 0.5
    q = np.asanyarray(q)  # array(0.5)
    q = q[None]  # array([0.5])
    ap = a.flatten()  # array([10., 7., 4., 3., 2., 1.])
    Nx = ap.shape[axis]  # 6
    indices = q * (Nx - 1)  # array([2.5])
    indices = np.around(indices).astype(np.intp)  # array([2])
    ap.partition(indices, axis=axis)  # array([ 1., 2., 3., 4., 7., 10.])
    indices = indices[0]  # 2
    r = np.take(ap, indices, axis=axis, out=out)  # 3.0
    print(f"Result of np.percentile source: {r}")


def tensorflow_src(input, q=50, axis=None):
    x = input
    name = "percentile"
    interpolation = "nearest"
    q = tf.cast(q, tf.float64)  # tf.Tensor(50.0, shape=(), dtype=float64)
    if axis is None:
        y = tf.reshape(
            x, [-1]
        )  # tf.Tensor([10. 7. 4. 3. 2. 1.], shape=(6,), dtype=float64)
    frac_at_q_or_above = 1.0 - q / 100.0  # tf.Tensor(0.5, shape=(), dtype=float64)
    # _sort_tensor(y)
    # N.B. Here is the difference. Note the sort order is never changed
    sorted_y, _ = tf.math.top_k(
        y, k=tf.shape(y)[-1]
    )  # tf.Tensor([10. 7. 4. 3. 2. 1.], shape=(6,), dtype=float64), _
    tensorshape_util.set_shape(
        sorted_y, y.shape
    )  # tf.Tensor([10. 7. 4. 3. 2. 1.], shape=(6,), dtype=float64)
    d = tf.cast(tf.shape(y)[-1], tf.float64)  # tf.Tensor(6.0, shape=(), dtype=float64)
    # _get_indices(interpolation)
    indices = tf.round(
        (d - 1) * frac_at_q_or_above
    )  # tf.Tensor(2.0, shape=(), dtype=float64)
    indices = tf.clip_by_value(
        tf.cast(indices, tf.int32), 0, tf.shape(y)[-1] - 1
    )  # tf.Tensor(2, shape=(), dtype=int32)
    # N.B. The sort order here is descending, causing a difference
    gathered_y = tf.gather(
        sorted_y, indices, axis=-1
    )  # tf.Tensor(4.0, shape=(), dtype=float64)
    result = distribution_util.rotate_transpose(gathered_y, tf.rank(q))  # 4.0
    print(f"Result of tf.percentile source: {result}")


def main():
    np_in = np.array([[10.0, 7.0, 4.0], [3.0, 2.0, 1.0]])
    numpy_src(np_in, q=50)
    tf_in = tf.convert_to_tensor(np_in)
    tensorflow_src(tf_in, q=50)


if __name__ == "__main__":
    main()
which when run gives
$ python answer.py
Result of np.percentile source: 3.0
Result of tf.percentile source: 4.0
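If instead the following is added to TensorFlow Probability's percentile, so that the evaluation is done in ascending sort order
sorted_y = tf.reverse(
    sorted_y, [-1]
)  # tf.Tensor([ 1. 2. 3. 4. 7. 10.], shape=(6,), dtype=float64)
then the two results will be the same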
$ python answer.py
Result of np.percentile source: 3.0
Result of tf.percentile source: 3.0
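Given that TensorFlow Probability's docstring says
Given a vector x, the q-th percentile of x is the value q / 100 of the way from the minimum to the maximum in a sorted copy of x.
this seems to be wrong, as the implementation does the opposite and counts from the maximum. I have opened TensorFlow Probability Issue 864 to discuss this.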
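For the shim mentioned in the question, one possible starting point is a small reference function that makes the counting direction explicit. This is a sketch only: nearest_percentile and its convention argument are hypothetical names, it works on flat Python sequences rather than arrays or tensors, and it assumes the two conventions behave as traced in answer.py.

```python
def nearest_percentile(values, q, convention="numpy"):
    """Nearest-rank percentile of a flat sequence (hypothetical helper).

    convention="numpy": index counted from the minimum of ascending data.
    convention="tfp":   index counted from the maximum of descending data.
    """
    if not 0 <= q <= 100:
        raise ValueError("q must be in the range [0, 100]")
    last = len(values) - 1
    if convention == "numpy":
        ordered = sorted(values)
        index = round(q / 100 * last)
    elif convention == "tfp":
        ordered = sorted(values, reverse=True)
        index = round((1 - q / 100) * last)
    else:
        raise ValueError(f"unknown convention: {convention!r}")
    return ordered[index]


data = [10.0, 7.0, 4.0, 3.0, 2.0, 1.0]
print(nearest_percentile(data, 50, "numpy"))  # 3.0
print(nearest_percentile(data, 50, "tfp"))    # 4.0
print(nearest_percentile(data, 25, "numpy"))  # 2.0
print(nearest_percentile(data, 25, "tfp"))    # 2.0
```

The two conventions agree whenever the fractional index is not an exact half (q = 25 here) and can disagree on ties such as q = 50, so a shim would have to pick one convention and test the tie cases against the library versions it targets.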