
Numpy array as a vector with a non-unique property

I have a set of data that I would like to process with numpy. The data can be viewed as a set of points in space, each with an additional property that I would like to handle as an object. Depending on the data set, the vectors may be of length 1, 2, or 3, but the length is the same for all points in a given data set. The property object is an instance of a custom class, and any two points may share the same object.

So consider this data as a random example (C and H represent objects that contain atomic properties for Carbon or Hydrogen ... or just some random object). These will not be read in through a file, but created by an algorithm. Here the C object may be the same or it may be different (isotope for example).

Example 3D data set (just abstract representation)
C 1 2 3
C 3 4 5
H 1 1 4

I would like to have a numpy array that contains all of the atomic positions, so that I can perform numpy operations such as vector manipulation, for example a translation function def translate(data,vec):return data + vec . I would also like to handle the property objects in parallel. One option would be to keep two separate arrays, but if I delete an element from one, I would have to explicitly delete the corresponding value from the property array as well. This could get difficult to manage.

I considered using numpy.recarray

x = np.array([(1.0,2,3, "C"), (3.0,2,3, "H")], dtype=[('x', "float64" ),('y',"float64"),('z',"float64"), ('type', object)])

But it seems the shape of this array is (2,) , which means that each record is handled independently. Also, I cannot seem to understand how to get vector manipulation to work with this type:

def translate(data,vec):return data + vec
translate(x,np.array([1,2,3]))
...
TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'numpy.ndarray'

Is numpy.recarray what I should be using? Is there a better way to handle this in a simpler way such that I have a separate numerical matrix of points with a parallel object array that are linked in case an element is removed ( np.delete )? I also briefly considered writing an array object that extends ndarray , but I feel like this may be unnecessary and potentially disastrous.

Any thoughts or suggestions would be very helpful.

A field of a structured array (or recarray) can itself be an ndarray if you pass the tuple (name, type, shape) as the dtype of the field:

In [9]:

import numpy as np

x = np.array([((1.0,2,3), "C"), ((3.0,2,3), "H")], dtype=[('xyz', "float64", (3,)), ('type', object)])

In [11]:

np.delete(x, 0)

Out[11]:

array([([3.0, 2.0, 3.0], 'H')], 
      dtype=[('xyz', '<f8', (3,)), ('type', 'O')])

In [12]:

x["xyz"]

Out[12]:

array([[ 1.,  2.,  3.],
       [ 3.,  2.,  3.]])

In [14]:

x["xyz"] + (10, 20, 30)

Out[14]:

array([[ 11.,  22.,  33.],
       [ 13.,  22.,  33.]])

For your translate function:

def translate(data,vec):
    tmp = data.copy()      # leave the input array untouched
    tmp["xyz"] += vec      # broadcast vec over every row of the position field
    return tmp
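Since the question mentions that the vectors may have length 1, 2, or 3, here is a minimal, self-contained sketch of the same idea with the dimension as a parameter. The helper name make_points and the sample values are assumptions made for illustration; they are not part of numpy or of the question's code.

```python
import numpy as np

def make_points(records, dim):
    """Build a structured array with a dim-component position field
    and an object field (make_points is a hypothetical helper)."""
    dtype = [("xyz", "float64", (dim,)), ("type", object)]
    return np.array(records, dtype=dtype)

def translate(data, vec):
    """Return a translated copy; the 'type' objects are carried along unchanged."""
    out = data.copy()
    out["xyz"] += np.asarray(vec, dtype="float64")
    return out

# Sample data taken from the abstract 3D example in the question.
points = make_points(
    [((1.0, 2.0, 3.0), "C"), ((3.0, 4.0, 5.0), "C"), ((1.0, 1.0, 4.0), "H")],
    dim=3,
)

moved = translate(points, [1, 2, 3])

# Deleting a record removes the position and its object together.
trimmed = np.delete(moved, 0)
```

Because each record holds both the position and the object, np.delete cannot leave the two out of sync, which was the main concern with two separate arrays.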

If you want more flexibility, you may consider using pandas.DataFrame .
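A rough sketch of what the DataFrame variant could look like, assuming pandas is installed. The column names x, y, z and type are my own choice for illustration, and plain strings stand in for the property objects.

```python
import numpy as np
import pandas as pd

# One row per point; the 'type' column can hold arbitrary Python objects.
df = pd.DataFrame({
    "x": [1.0, 3.0, 1.0],
    "y": [2.0, 4.0, 1.0],
    "z": [3.0, 5.0, 4.0],
    "type": ["C", "C", "H"],
})
coords = ["x", "y", "z"]

def translate(frame, vec):
    """Return a copy of frame with vec added to the coordinate columns."""
    out = frame.copy()
    out[coords] = out[coords].to_numpy() + np.asarray(vec, dtype="float64")
    return out

moved = translate(df, [1, 2, 3])

# Dropping a row removes the coordinates and the object together.
trimmed = moved.drop(index=0)
```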

If you are dealing with collections of atoms, consider using the Atoms class from the Atomic Simulation Environment (ASE). It stores atom types and positions, and has list-like methods to manipulate them.

One quick and dirty way would be to set the last (or indeed any) column to be a numerical lookup to a labels dictionary:

>>> import numpy
>>> labels = ['H', 'C', 'O']
>>> labels_refs = dict(zip(labels, numpy.arange(len(labels), dtype='float64')))
>>> reverse_labels_refs = dict(zip(numpy.arange(len(labels), dtype='float64'), labels))
>>> x = numpy.array([
...     [1.0,2,3, labels_refs['C']], 
...     [3.0,2,3, labels_refs['H']],
...     [2.0,2,3, labels_refs['C']]])
>>> x
array([[ 1.,  2.,  3.,  1.],
       [ 3.,  2.,  3.,  0.],
       [ 2.,  2.,  3.,  1.]])
>>> extract_refs = numpy.vectorize(
...         lambda label_ref: reverse_labels_refs[label_ref])
>>> labels = extract_refs(x[:, -1]) # Turn the last column back into labels
>>> labels
array(['C', 'H', 'C'], 
      dtype='|S8')

You can also look up rows by their labels (as an example):

>>> x[numpy.where(x[:,-1] == labels_refs['C']), :-1]
array([[[ 1.,  2.,  3.],
        [ 2.,  2.,  3.]]])
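Note that indexing with the tuple returned by numpy.where adds a leading axis, so the result above has shape (1, 2, 3). If a plain (n, 3) array is preferred, a boolean mask is one alternative. This sketch rebuilds the same data so that it runs on its own; the names is_carbon and carbon_xyz are mine.

```python
import numpy as np

labels = ["H", "C", "O"]
# Map each label to a float code so it can live in the numeric array.
labels_refs = {label: float(i) for i, label in enumerate(labels)}

x = np.array([
    [1.0, 2, 3, labels_refs["C"]],
    [3.0, 2, 3, labels_refs["H"]],
    [2.0, 2, 3, labels_refs["C"]],
])

# Boolean mask over the rows whose label column matches 'C'.
is_carbon = x[:, -1] == labels_refs["C"]

# Select the matching rows and drop the label column.
carbon_xyz = x[is_carbon, :-1]
```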
