Numpy: Beginner nditer

I am trying to learn nditer for possible use in speeding up my application. Here, I try to make a facetious reshape program that will take a size-20 array and reshape it to a 5x4 array:

import numpy as np

myArray = np.arange(20)
def fi_by_fo_100(array):
    offset = np.array([0, 4, 8, 12, 16])
    it = np.nditer([offset, None],
                      flags=['reduce_ok'],
                      op_flags=[['readonly'],
                                ['readwrite','allocate']],
                      op_axes=[None, [0,1,-1]],
                      itershape=(-1, 4, offset.size))

    while not it.finished:
        indices = np.arange(it[0],(it[0]+4), dtype=int)
        info = array.take(indices)
        '''Just for fun, we'll perform an operation on data.\
           Let's shift it to 100'''
        info = info + 81
        it.operands[1][...]=info
        it.iternext()
    return it.operands[1]

test = fi_by_fo_100(myArray)
>>> test
array([[ 97,  98,  99, 100]])

Clearly the program is overwriting each result into one row. So I tried using the indexing functionality of nditer, but still no dice.

flags=['reduce_ok','c_index'] --> it.operands[1][it.index][...]=info =
IndexError: index out of bounds

flags=['reduce_ok','c_index'] --> it.operands[1][it.iterindex][...]=info =
IndexError: index out of bounds

flags=['reduce_ok','multi_index'] --> it.operands[1][it.multi_index][...]=info =
IndexError: index out of bounds

it[0][it.multi_index[1]][...]=info =
IndexError: 0-d arrays can't be indexed

...and so on. What am I missing? Thanks in advance.

Bonus Question

I just happened across this nice article on nditer. I may be new to Numpy, but this is the first time I've seen Numpy speed benchmarks this far behind. It's my understanding that people choose Numpy for its numerical speed and prowess, but iteration is a part of that, no? What is the point of nditer if it's so slow?

It really helps to break things down by printing out what's going on along the way.

First, let's replace your whole loop with this:

i = 0
while not it.finished:
    i += 1
    it.iternext()  # advance the iterator, otherwise the loop never ends
print(i)

It'll print 20, not 5. That's because you're doing a 5x4 iteration, not 5x1.

So, why is this even close to working? Well, let's look at the loop more carefully:

while not it.finished:
    print('>', it.operands[0], it[0])
    indices = np.arange(it[0],(it[0]+4), dtype=int)
    info = array.take(indices)
    info = info + 81
    it.operands[1][...]=info
    print('<', it.operands[1], it[1])
    it.iternext()  # advance the iterator, otherwise the loop never ends

You'll see that the first five loops go through [0 4 8 12 16] five times, generating [[81 82 83 84]], then [[85 86 87 88]], etc. And then the next five loops do the same thing, and again and again.

This is also why your c_index solutions didn't work: it.index is going to range from 0 to 19, and you don't have 20 of anything in it.operands[1].
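To see that range for yourself, here is a minimal sketch, assuming a plain 5x4 array (the name grid is just a stand-in, not your iterator setup). It records it.index at every step and reads the total step count from it.itersize:

```python
import numpy as np

# Hypothetical stand-in: a plain 5x4 array, not the asker's operands.
grid = np.arange(20).reshape(5, 4)

# 'c_index' asks nditer to track a flat C-order index for each step.
it = np.nditer(grid, flags=['c_index'])

flat_indices = []
while not it.finished:
    flat_indices.append(it.index)
    it.iternext()

# itersize is the total number of steps the iterator takes.
total_steps = it.itersize

print(flat_indices[0], flat_indices[-1], total_steps)
```

The index is a position within the whole 20-step iteration, so it cannot be used directly as a row number for a 5-row output.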

If you did the multi_index right and ignored the columns, you could make this work… but still, you'd be doing a 5x4 iteration, just to repeat each step 4 times, instead of doing the 5x1 iteration you want.
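A small sketch of that repetition, again assuming a plain 5x4 array as a stand-in (grid and visits_per_row are illustrative names). It counts how many times each row index shows up in it.multi_index:

```python
import numpy as np
from collections import Counter

# Hypothetical stand-in: a plain 5x4 array, not the asker's operands.
grid = np.arange(20).reshape(5, 4)

# 'multi_index' makes nditer track a (row, col) tuple for each step.
it = np.nditer(grid, flags=['multi_index'])

visits_per_row = Counter()
while not it.finished:
    row, col = it.multi_index
    visits_per_row[row] += 1  # ignore col, count visits per row
    it.iternext()

print(dict(visits_per_row))
```

Every row is visited once per column, so any row-level work keyed on the row index would run 4 times.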

Your it.operands[1][...]=info replaces the entire output with a single 4-element row each time through the loop. Generally, you shouldn't ever have to do anything to it.operands[1]; the whole point of nditer is that you just take care of each it[1], and the final it.operands[1] is the result.

Of course a 5x4 iteration over rows makes no sense. Either do a 5x4 iteration over individual values, or a 5x1 iteration over rows.

If you want the former, the easiest way to do it is to reshape the input array, then just iterate that:

it = np.nditer([array.reshape(5, -1), None],
               op_flags=[['readonly'],
                         ['readwrite','allocate']])
for a, b in it:
    b[...] = a + 81
return it.operands[1]

But of course that's silly. It's just a slower and more complicated way of writing:

return array+81

And it would be a bit silly to suggest that "the way to write your own reshape is to first call reshape, and then…"

So, you want to iterate over rows, right?

Let's simplify things a bit by getting rid of the allocate and explicitly creating a 5x4 array to start with:

outarray = np.zeros((5,4), dtype=array.dtype)
offset = np.array([0, 4, 8, 12, 16])
it = np.nditer([offset, outarray],
               flags=['reduce_ok', 'c_index'],  # c_index is needed for it.index below
               op_flags=[['readonly'],
                         ['readwrite']],
               op_axes=[None, [0]],
               itershape=[5])

while not it.finished:
    indices = np.arange(it[0],(it[0]+4), dtype=int)
    info = array.take(indices)
    '''Just for fun, we'll perform an operation on data.\
       Let's shift it to 100'''
    info = info + 81
    it.operands[1][it.index][...]=info
    it.iternext()
return it.operands[1]

This is a bit of an abuse of nditer, but at least it does the right thing.

Since you're just doing a 1D iteration over the source and basically ignoring the second operand, there's really no good reason to use nditer here. If you need to do lockstep iteration over multiple arrays, for a, b in nditer([x, y], …) is cleaner than iterating over x and using the index to access y, just like for a, b in zip(x, y) outside of numpy. And if you need to iterate over multi-dimensional arrays, nditer is usually cleaner than the alternatives. But here, all you're really doing is iterating over [0, 4, 8, 12, 16], doing something with the result, and copying it into another array.
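For comparison, here is a rough sketch of the same row-by-row job without nditer at all. The function name shift_rows and its row_len and shift parameters are made up for illustration; the 81 shift and the 4-wide rows come from your code:

```python
import numpy as np

def shift_rows(array, row_len=4, shift=81):
    """Copy `array` row by row into a 2D output, adding `shift` to each value."""
    # Start offset of each row: 0, 4, 8, 12, 16 for a size-20 input.
    offsets = np.arange(0, array.size, row_len)
    out = np.empty((offsets.size, row_len), dtype=array.dtype)
    for i, start in enumerate(offsets):
        out[i] = array[start:start + row_len] + shift
    return out

result = shift_rows(np.arange(20))
print(result)
```

This assumes the input size is an exact multiple of row_len. It is still a Python-level loop, so it is only meant to show that the iteration itself is one-dimensional.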

Also, as I mentioned in the comments, if you find yourself using iteration in numpy, you're usually doing something wrong. All of the speed benefits of numpy come from letting it execute the tight loops in native C/Fortran or lower-level vector operations. Once you're looping over arrays, you're effectively just doing slow Python numerics with a slightly nicer syntax:

import numpy as np
import timeit

def add10_numpy(array):
    # Vectorized: the loop runs inside numpy's compiled code
    return array + 10

def add10_nditer(array):
    # One Python-level step per element, via nditer
    it = np.nditer([array, None], [],
                   [['readonly'], ['writeonly', 'allocate']])
    for a, b in it:
        np.add(a, 10, out=b)
    return it.operands[1]

def add10_py(array):
    # Plain nested Python loops over indices
    x, y = array.shape
    outarray = array.copy()
    for i in range(x):
        for j in range(y):
            outarray[i, j] = array[i, j] + 10
    return outarray

myArray = np.arange(100000).reshape(250,-1)

for f in add10_numpy, add10_nditer, add10_py:
    print('%12s: %s' % (f.__name__, timeit.timeit(lambda: f(myArray), number=1)))

On my system, this prints:

 add10_numpy: 0.000458002090454
add10_nditer: 0.292730093002
    add10_py: 0.127345085144

That shows you the cost of using nditer unnecessarily.
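If you do end up needing nditer, one way to cut the per-element overhead is to let it hand you 1D chunks instead of single values. The sketch below follows the external_loop pattern from the NumPy iteration guide; add10_external is a made-up name, and I haven't timed it against the functions in the benchmark, so treat it as an illustration rather than a result:

```python
import numpy as np

def add10_external(array):
    # 'external_loop' yields 1D chunks rather than 0-d elements, so the
    # Python-level loop body runs far fewer times.
    it = np.nditer([array, None],
                   flags=['external_loop', 'buffered'],
                   op_flags=[['readonly'],
                             ['writeonly', 'allocate', 'no_broadcast']])
    with it:
        for a, b in it:
            b[...] = a + 10  # vectorized add on each chunk
        return it.operands[1]

sample = np.arange(100000).reshape(250, -1)
result = add10_external(sample)
print(result.shape)
```

The with block needs a reasonably recent NumPy (1.15 or later, as far as I know). Each chunk is still processed by a vectorized operation, so this mostly shows that nditer is for controlling how arrays are traversed, not for replacing array + 10.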
