Appending to numpy arrays

I'm trying to construct a numpy array, and then append integers and another array to it. I tried doing this:

xyz_list = frag_str.split()
nums = numpy.array([])
coords = numpy.array([])
for i in range(int(len(xyz_list)/4)):
    numpy.append(nums, xyz_list[i*4])
    numpy.append(coords, xyz_list[i*4+1:(i+1)*4])
print(nums)
print(coords)

Printing the output only gives me empty arrays. Why is that? In addition, how can I rewrite coords so that I end up with a 2D array like this: array([[0,0,0],[0,0,1],[0,0,-1]])?

numpy.append, unlike Python's list.append, does not operate in place. You therefore need to assign the result back to a variable, as below.

import numpy

xyz_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
nums = numpy.array([])
coords = numpy.array([])

for i in range(int(len(xyz_list)/4)):
    nums = numpy.append(nums, xyz_list[i*4])
    coords = numpy.append(coords, xyz_list[i*4+1:(i+1)*4])

print(nums)    # [ 1.  5.  9.]
print(coords)  # [  2.   3.   4.   6.   7.   8.  10.  11.  12.]

You can reshape coords as follows:

coords = coords.reshape(3, 3)

# array([[  2.,   3.,   4.],
#        [  6.,   7.,   8.],
#        [ 10.,  11.,  12.]])
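If the number of rows is not known in advance, passing -1 lets reshape infer it. The sketch below assumes the flat array holds three values per row. It also shows np.append with axis=0 as an alternative that keeps the array 2D throughout; the empty (0, 3) starting array is an assumption made for illustration, not something from the question.

```python
import numpy as np

flat = np.array([2., 3., 4., 6., 7., 8., 10., 11., 12.])

# -1 asks NumPy to infer the row count from the total length.
coords = flat.reshape(-1, 3)
print(coords.shape)  # (3, 3)

# Alternative: start from an empty (0, 3) array and append whole rows.
# The appended value must itself be 2D, hence the double brackets.
rows = np.empty((0, 3))
rows = np.append(rows, [[0, 0, 1]], axis=0)
rows = np.append(rows, [[0, 0, -1]], axis=0)
print(rows.shape)  # (2, 3)
```

The axis=0 form still copies the whole array on every call, so it is only reasonable for a handful of rows.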

More details on numpy.append behaviour

Documentation:

Returns: A copy of arr with values appended to axis. Note that append does not occur in-place: a new array is allocated and filled.
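A small check of that behaviour, as a sketch: the original array is left untouched and the returned array lives in separate memory.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)   # returns a new array; a is not modified

print(a)  # [1 2 3]
print(b)  # [1 2 3 4]
print(np.shares_memory(a, b))  # False
```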

If you know the shape of your numpy array output beforehand, it is efficient to instantiate via np.zeros(n) and fill it with results later.
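A minimal sketch of that pre-allocation approach, reusing the sample xyz_list from above and assuming every record is one number followed by three coordinates:

```python
import numpy as np

xyz_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
n = len(xyz_list) // 4  # number of records, known before the loop

# Allocate both outputs once, then fill them by index.
nums = np.zeros(n, dtype=int)
coords = np.zeros((n, 3), dtype=int)

for i in range(n):
    nums[i] = xyz_list[i * 4]
    coords[i] = xyz_list[i * 4 + 1:(i + 1) * 4]

print(nums)    # [1 5 9]
print(coords)
```

No array is reallocated inside the loop, and coords is already 2D, so no reshape is needed afterwards.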

Another option: if your calculations make heavy use of inserting elements to the left of an array, consider using collections.deque from the standard library.
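For illustration, a sketch of the deque route with made-up values: collect with appendleft, then convert to an array once at the end.

```python
from collections import deque

import numpy as np

d = deque()
for value in [3, 2, 1]:
    d.appendleft(value)  # constant-time insertion at the left end

# Convert once, after all insertions are done.
arr = np.array(list(d))
print(arr)  # [1 2 3]
```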

np.append is not a clone of list.append. It is a clumsy wrapper around np.concatenate, and it is better to learn to use np.concatenate correctly.

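import numpy as np

xyz_list = frag_str.split()
nums = []
coords = []
for i in range(int(len(xyz_list)/4)):
    nums.append(xyz_list[i*4])
    coords.append(xyz_list[i*4+1:(i+1)*4])
# The lists hold strings, so build the arrays with np.array;
# np.concatenate rejects a list of scalars.
nums = np.array(nums)      # 1d
coords = np.array(coords)  # 2d, one row per record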

List append is faster, and easier to initialize. np.concatenate works fine with a list of arrays. np.append uses concatenate, but only accepts two inputs. np.array is needed if the list contains numbers or strings.
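A short sketch of those differences, using made-up arrays:

```python
import numpy as np

pieces = [np.array([1, 2]), np.array([3, 4]), np.array([5, 6])]

# One call joins every array in the list.
joined = np.concatenate(pieces)
print(joined)  # [1 2 3 4 5 6]

# np.append takes exactly two inputs, so three arrays need two calls.
appended = np.append(np.append(pieces[0], pieces[1]), pieces[2])
print(appended)  # [1 2 3 4 5 6]

# A list of plain numbers or strings goes through np.array instead.
labels = np.array(['1', '5'])
print(labels.shape)  # (2,)
```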


You don't give an example of frag_str, but the name and the use of split suggest it is a string. I don't think anything else has a split method.

In [74]: alist = 'one two three four five six seven eight'.split()

That's a list of strings. Using your indexing I can construct 2 lists:

In [76]: [alist[i*4] for i in range(2)]
Out[76]: ['one', 'five']

In [77]: [alist[i*4+1:(i+1)*4] for i in range(2)]
Out[77]: [['two', 'three', 'four'], ['six', 'seven', 'eight']]

And I can make arrays from each of those lists:

In [78]: np.array(Out[76])
Out[78]: array(['one', 'five'], dtype='<U4')
In [79]: np.array(Out[77])
Out[79]: 
array([['two', 'three', 'four'],
       ['six', 'seven', 'eight']], dtype='<U5')

In the first case the array is 1d, in the second, 2d.

If the string contains digits, we can make an integer array by specifying dtype.

In [80]: alist = '1 2 3 4 5 6 7 8'.split()
In [81]: np.array([alist[i*4] for i in range(2)])
Out[81]: array(['1', '5'], dtype='<U1')
In [82]: np.array([alist[i*4] for i in range(2)], dtype=int)
Out[82]: array([1, 5])
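Putting this together, the loop can be dropped entirely. The sketch below assumes frag_str is whitespace-separated and that every record is one integer followed by three coordinates; the sample string is hypothetical.

```python
import numpy as np

# Hypothetical input: each record is "<number> <x> <y> <z>".
frag_str = '1 0 0 0 8 0 0 1 8 0 0 -1'

# Convert every token at once, then view the result as rows of four values.
table = np.array(frag_str.split(), dtype=float).reshape(-1, 4)

nums = table[:, 0].astype(int)  # first column
coords = table[:, 1:]           # remaining three columns, already 2D

print(nums)    # [1 8 8]
print(coords)
```

If the first field of each record were a non-numeric label, the dtype=float conversion would fail, and the list-based approach above would be the safer choice.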

As stated above, numpy.append does not append items in place, but the reason why is important. You must assign the array returned by numpy.append back to the original variable, or else your code will not work. That being said, you should likely rethink your logic.

NumPy uses C-style arrays internally, which are arrays in contiguous memory without leading or trailing unused elements. In order to append an item to an array, NumPy must allocate a buffer one element larger than the array, copy all the data over, and add the appended element.

In pseudo-C code, this comes to the following:

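#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of arr with element added at the end. */
int* numpy_append(int* arr, size_t size, int element)
{
    int* new_arr = malloc(sizeof(int) * (size + 1));
    if (new_arr == NULL) return NULL;
    memcpy(new_arr, arr, sizeof(int) * size);
    new_arr[size] = element;
    return new_arr;
}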

This is extremely inefficient, since a new array must be allocated each time (memory allocation is slow), all the elements must be copied over, and the new element added to the end of the new array.

In comparison, Python lists over-allocate: they reserve spare capacity beyond the current length, and when that capacity is used up they grow by an amount proportional to their size. Appending to the end is therefore amortized constant time, which is much more efficient than reallocating the entire buffer on every append.
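The over-allocation can be observed in CPython with sys.getsizeof, which reports the size of the list object including its reserved capacity. This is a sketch that relies on CPython implementation details; the exact numbers vary between versions and other interpreters may behave differently.

```python
import sys

items = []
sizes = set()
for i in range(1000):
    items.append(i)
    sizes.add(sys.getsizeof(items))

# The reported size changes only when the buffer is reallocated, so the
# number of distinct sizes roughly counts the reallocations.
print(len(sizes), "distinct sizes for", len(items), "appends")
```

The count of distinct sizes is far smaller than the number of appends, whereas numpy.append reallocates on every single call.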

You should use Python lists and list.append, and then convert the finished list to a NumPy array. Or, if performance is truly critical, write a C++ extension that uses std::vector. Either is preferable to calling numpy.append in a loop. Rewrite your code, or it will be glacial.

Edit

Also, as pointed out in the comments, if you know the size of a NumPy array beforehand, pre-allocating it with np.zeros(n) is efficient, as is using a custom wrapper around a NumPy array:

import numpy as np

class extendable_array:
    def __init__(self, size=0, dtype=int):
        # Backing buffer; self.size counts the elements currently in use.
        self.arr = np.zeros(size, dtype=dtype)
        self.size = size

    def grow(self):
        '''Double the capacity of the backing array (at least 1)'''

        arr = self.arr
        self.arr = np.zeros(max(arr.size * 2, 1), dtype=arr.dtype)
        self.arr[:arr.size] = arr

    def append(self, value):
        '''Append a value to the array'''

        # Grow only when the buffer is full.
        if self.arr.size == self.size:
            self.grow()

        self.arr[self.size] = value
        self.size += 1

    # add more methods here
