
Middle point of each pair of a numpy.array

I have an array of the form:

x = np.array([ 1230., 1230., 1227., 1235., 1217., 1153., 1170.])

and I would like to produce another array where the values are the mean of each pair of adjacent values within my original array:

xm = np.array([ 1230., 1228.5, 1231., 1226., 1185., 1161.5])

Does anyone know the easiest and fastest way to do this without using loops?

Even shorter, slightly sweeter:

(x[1:] + x[:-1]) / 2
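As a quick check, the slicing expression can be run against the array from the question. The names `x` and `xm` come from the question; `midpoints` is just an illustrative name.

```python
import numpy as np

x = np.array([1230., 1230., 1227., 1235., 1217., 1153., 1170.])
xm = np.array([1230., 1228.5, 1231., 1226., 1185., 1161.5])

# x[1:] drops the first element and x[:-1] drops the last, so adding them
# pairs every element with its right-hand neighbour.
midpoints = (x[1:] + x[:-1]) / 2

print(midpoints)
print(np.array_equal(midpoints, xm))  # True
```

The result has one element fewer than `x`, since an array of n values has n - 1 adjacent pairs.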

  • This is faster:

        >>> python -m timeit -s "import numpy; x = numpy.random.random(1000000)" "x[:-1] + numpy.diff(x)/2"
        100 loops, best of 3: 6.03 msec per loop

        >>> python -m timeit -s "import numpy; x = numpy.random.random(1000000)" "(x[1:] + x[:-1]) / 2"
        100 loops, best of 3: 4.07 msec per loop

  • This is perfectly accurate:

    Consider each element in x[1:] + x[:-1]. So consider x₀ and x₁, the first and second elements.

    x₀ + x₁ is calculated to perfect precision and then rounded, in accordance with IEEE 754. It would therefore be the correct answer if that were all that was needed.

    (x₀ + x₁) / 2 is just half of that value. This can almost always be done by reducing the exponent by one, except in two cases:

    • x₀ + x₁ overflows. This will result in an infinity (of either sign). That's not what is wanted, so the calculation will be wrong.

    • x₀ + x₁ underflows. As the size is reduced, rounding will be perfect and thus the calculation will be correct.

    In all other cases, the calculation will be correct.


    Now consider x[:-1] + numpy.diff(x) / 2. By inspection of the source, this evaluates directly to

    x[:-1] + (x[1:] - x[:-1]) / 2

    and so consider again x₀ and x₁.

    x₁ - x₀ will have severe "problems" with underflow for many values. It will also lose precision with large cancellations. It's not immediately clear that this doesn't matter if the signs are the same, though, as the error effectively cancels out on addition. What does matter is that rounding occurs.

    (x₁ - x₀) / 2 will be no less rounded, but then x₀ + (x₁ - x₀) / 2 involves another rounding. This means that errors will creep in. Proof:

        import numpy

        wins = draws = losses = 0

        for _ in range(100000):
            a = numpy.random.random()
            b = numpy.random.random() / 0.146

            x = (a+b)/2
            y = a + (b-a)/2

            # Zero for a perfect midpoint: distance to a minus distance to b.
            error_mine   = (a-x) - (x-b)
            error_theirs = (a-y) - (y-b)

            if x != y:
                if abs(error_mine) < abs(error_theirs):
                    wins += 1
                elif abs(error_mine) == abs(error_theirs):
                    draws += 1
                else:
                    losses += 1
            else:
                draws += 1

        wins / 1000
        #>>> 12.44

        draws / 1000
        #>>> 87.56

        losses / 1000
        #>>> 0.0

    This shows that for the carefully chosen constant (0.146 in the code above), a full 12-13% of answers are wrong with the diff variant! As expected, my version is always right.

    Now consider underflow. Although my variant has overflow problems, these are much less of a concern than cancellation problems. It should be obvious why the double rounding from the above logic is very problematic. Proof:

        ...
            a = numpy.random.random()
            b = -numpy.random.random()
        ...

        wins / 1000
        #>>> 25.149

        draws / 1000
        #>>> 74.851

        losses / 1000
        #>>> 0.0

    Yeah, it gets 25% wrong!

    In fact, it doesn't take much pruning to get this up to 50%:

        ...
            a = numpy.random.random()
            b = -a + numpy.random.random()/256
        ...

        wins / 1000
        #>>> 49.188

        draws / 1000
        #>>> 50.812

        losses / 1000
        #>>> 0.0

    Well, it's not that bad. I think it's only ever off by 1 least-significant bit, as long as the signs are the same.
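The experiments above can also be sketched in vectorized form. This is a variation, not the original script: it uses exact rational arithmetic (`fractions.Fraction`) as the reference instead of the `(a-x) - (x-b)` error measure, and the seed, sample size, and helper name `abs_error` are assumptions chosen for illustration.

```python
import numpy as np
from fractions import Fraction

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily
a = rng.random(10_000)
b = -rng.random(10_000)          # opposite signs, as in the second experiment

mid_sum = (a + b) / 2            # sum form
mid_diff = a + (b - a) / 2       # diff form, written out

def abs_error(result, lo, hi):
    """Exact |result - (lo + hi)/2|, computed with rational arithmetic."""
    exact = (Fraction(float(lo)) + Fraction(float(hi))) / 2
    return abs(Fraction(float(result)) - exact)

# Only positions where the two forms disagree can have a winner.
differ = np.flatnonzero(mid_sum != mid_diff)
sum_better = sum(
    abs_error(mid_sum[i], a[i], b[i]) < abs_error(mid_diff[i], a[i], b[i])
    for i in differ
)

print(f"{len(differ)} of {len(a)} results differ")
print(f"sum form is strictly closer in {sum_better} of those")
```

The exact counts depend on the seed and the NumPy version, so none are quoted here.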


So there you have it. My answer is the best unless you're finding the average of two values whose sum exceeds 1.7976931348623157e+308 or is smaller than -1.7976931348623157e+308.
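The overflow caveat can be illustrated with a small sketch. The test array is made up for the purpose, and `np.errstate` is used only to silence the overflow warning that the sum form is expected to raise.

```python
import numpy as np

big = np.finfo(np.float64).max          # 1.7976931348623157e+308
x = np.array([big, big, 1.0])           # illustrative values, not from the question

with np.errstate(over="ignore"):        # silence the expected overflow warning
    mid_sum = (x[1:] + x[:-1]) / 2      # big + big overflows to inf

mid_diff = x[:-1] + np.diff(x) / 2      # the differences stay in range here

print(mid_sum)    # first entry is inf
print(mid_diff)   # first entry is still big
```

With opposite signs the roles swap: the difference `x₁ - x₀` can overflow while the sum cannot, so neither form is safe for every input near the limits of the floating-point range.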

Short and sweet:

x[:-1] + np.diff(x)/2

That is, take each element of x except the last, and add one-half of the difference between it and the subsequent element.
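Spelled out on the array from the question, the two steps look roughly like this. `steps` and `midpoints` are illustrative names.

```python
import numpy as np

x = np.array([1230., 1230., 1227., 1235., 1217., 1153., 1170.])

steps = np.diff(x)                 # same values as x[1:] - x[:-1]
midpoints = x[:-1] + steps / 2     # start of each pair plus half the step

print(steps)
print(midpoints)
```

For these inputs every intermediate value is exactly representable, so the result matches the `xm` array from the question.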

Try this:

midpoints = x[:-1] + np.diff(x)/2

It's pretty easy and should be fast.

If speed matters, use multiplication instead of division, following Veedrac's answer:

    0.5 * (x[:-1] + x[1:])

Profiling results:

    >>> python -m timeit -s "import numpy; x = numpy.random.random(1000000)" "0.5 * (x[:-1] + x[1:])"
    100 loops, best of 3: 4.20 msec per loop

    >>> python -m timeit -s "import numpy; x = numpy.random.random(1000000)" "(x[:-1] + x[1:]) / 2"
    100 loops, best of 3: 5.10 msec per loop
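The same comparison can be run from inside Python with the standard `timeit` module. This is a sketch: the function names and the repeat counts are arbitrary choices, and absolute timings depend on the machine and NumPy version.

```python
import timeit
import numpy as np

x = np.random.random(1_000_000)

def by_multiplication():
    return 0.5 * (x[:-1] + x[1:])

def by_division():
    return (x[:-1] + x[1:]) / 2

# Best of 3 repeats, 10 calls each, reported as time per call.
for func in (by_multiplication, by_division):
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print(f"{func.__name__}: {best * 1e3:.2f} ms per call")
```

Both forms return identical values, since multiplying by 0.5 and dividing by 2 round the same way; only the speed may differ.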
>>> x = np.array([ 1230., 1230., 1227., 1235., 1217., 1153., 1170.])

>>> (x+np.concatenate((x[1:], np.array([0]))))/2
array([ 1230. ,  1228.5,  1231. ,  1226. ,  1185. ,  1161.5,   585. ])

Now you can just strip the last element, if you want.
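Stripping the last element is one more slice. A minimal sketch, where `padded` and `midpoints` are illustrative names:

```python
import numpy as np

x = np.array([1230., 1230., 1227., 1235., 1217., 1153., 1170.])

# Pair each element with its successor; the last element is paired with a 0 pad.
padded = (x + np.concatenate((x[1:], np.array([0])))) / 2

# Drop the trailing value, which is only the average of x[-1] and the pad.
midpoints = padded[:-1]

print(midpoints)
```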

I end up using this operation a lot on multi-dimensional arrays, so I'll post my solution (inspired by the source code for np.diff()):

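import numpy as np

def zcen(a, axis=0):
    """Midpoints of adjacent elements of `a` along `axis`."""
    a = np.asarray(a)
    nd = a.ndim
    slice1 = [slice(None)]*nd
    slice2 = [slice(None)]*nd
    slice1[axis] = slice(1, None)    # a[1:] along the chosen axis
    slice2[axis] = slice(None, -1)   # a[:-1] along the chosen axis
    # Index with tuples: recent NumPy versions reject a list of slices.
    return (a[tuple(slice1)] + a[tuple(slice2)]) / 2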

>>> a = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]]
>>> zcen(a)
array([[  5.5,  11. ,  16.5,  22. ,  27.5]])
>>> zcen(a, axis=1)
array([[  1.5,   2.5,   3.5,   4.5],
       [ 15. ,  25. ,  35. ,  45. ]])
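An alternative sketch of the same idea uses `np.moveaxis` instead of building slice lists. This is a different technique from the one in the answer, and the function name `midpoints_along_axis` is made up for illustration.

```python
import numpy as np

def midpoints_along_axis(a, axis=0):
    """Average adjacent elements of `a` along `axis`."""
    a = np.asarray(a)
    moved = np.moveaxis(a, axis, 0)       # bring the target axis to the front
    mid = (moved[1:] + moved[:-1]) / 2    # plain 1-D style slicing now works
    return np.moveaxis(mid, 0, axis)      # restore the original axis order

a = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]]
print(midpoints_along_axis(a))
print(midpoints_along_axis(a, axis=1))
```

For the sample input this gives the same values as the `zcen` outputs shown above.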
