Numpy: Tuple-Indexing of rows in ndarray (for later selection of subsets)

I'm fairly new to NumPy, and also not the most experienced Python programmer, so please excuse me if this seems trivial to you ;)

I am writing a script to extract specific data from several molecular-dynamics simulations. To do so, I read data from a number of files, modify and truncate them to a uniform length, and stack everything row-wise to form a 2D array for each simulation run.

These arrays are appended to each other, so that I ultimately get a 3D array in which each slice along the z-axis represents the dataset of one simulation run. The goal is to make later manipulation easy, e.g. averaging over all simulation runs.

This is just to give you the basic idea of what is done:

import numpy as np

A = np.zeros(2000, dtype=bool)
A = A.reshape((1, 2000))

# Appending different rows to form a '2D-Matrix',
# this is the actual data per simulation run
for i in range(1, 103):
    B = np.zeros(2000, dtype=bool)
    B = B.reshape((1, 2000))
    A = np.concatenate((A, B), axis=0)

print(A.shape)
# >>> (103, 2000)

C = np.expand_dims(A, axis=2)
A = np.expand_dims(A, axis=2)

print(A.shape)
# >>> (103, 2000, 1)

# Appending different '2D-Matrices' to form a 3D array,
# each slice along the z-Axis representing one simulation run
for i in range(1, 50):
    A = np.concatenate((A, C), axis=2)

print(A.shape)
# >>> (103, 2000, 50)
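
For comparison, here is a minimal sketch (Python 3) of building the same kind of 3D array in one step and then averaging over all runs. It assumes every run has already been brought to the same 2D shape; the names n_rows, n_cols, n_runs and runs are placeholders for illustration and are not part of the original script.

```python
import numpy as np

# Sizes taken from the example: 103 rows of length 2000, 50 runs.
n_rows, n_cols, n_runs = 103, 2000, 50

# One 2D boolean array per simulation run (all zeros as placeholder data).
runs = [np.zeros((n_rows, n_cols), dtype=bool) for _ in range(n_runs)]

# Stack all runs along a new last axis in a single call,
# instead of concatenating inside a loop.
A = np.stack(runs, axis=2)
print(A.shape)               # (103, 2000, 50)

# Averaging over all simulation runs collapses the last axis.
mean_over_runs = A.mean(axis=2)
print(mean_over_runs.shape)  # (103, 2000)
```

Collecting the 2D arrays in a list and stacking once avoids copying the growing array on every iteration, which is what repeated np.concatenate calls do.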

So far so good, now to the actual question:

In each 2D array, every row represents a different interacting atom pair. Later on I want to create subsets of the array based on different criteria, e.g. 'show me all pairs where the distance x satisfies 10 < x <= 20'.

So when I first add the rows together in the row-building loop above, I want to label each row with a set of ints. The atom-pair data is available anyway; at the moment I'm just not including it in the ndarray.

I was thinking of a tuple, so that my 2D-Array would look like

[ [('int' a,'int' b), [False,True,False,...]],
  [('int' a,'int' d), [True, False, True...]],
 ...
]

Or something like that

[ [['int' a], ['int' b], [False,True,False,...]],
  [['int' a], ['int' d], [True, False, True...]],
 ...
]

Can you think of another or easier approach for this kind of filtering? I'm not sure I'm on the right track here, and it doesn't seem very straightforward to have different data types in one array like that.

Also note that the rows are ordered in the same way in each 2D array, because I sort them (at the moment based on a string) and add np.zeros() rows for pairs that only occur in other simulation runs. Maybe a look-up table is the right approach?
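
To make the look-up-table idea concrete, here is a rough sketch (Python 3) of what I have in mind for the distance criterion. The pair labels, the distances and the small array sizes are made-up placeholder values, and the names pairs, distances and keep are only for illustration.

```python
import numpy as np

# Hypothetical look-up data: one (atom_a, atom_b) pair and one distance per row.
pairs = np.array([(1, 2), (3, 4), (3, 5), (6, 7)])
distances = np.array([8.0, 12.5, 20.0, 25.0])

# Placeholder data laid out as (rows, columns, runs): 4 pairs, 5 columns, 2 runs.
data = np.zeros((4, 5, 2), dtype=bool)

# Boolean mask for the criterion 10 < x <= 20, one flag per row.
keep = (distances > 10) & (distances <= 20)

subset = data[keep]          # keeps rows 1 and 2 in every run
subset_pairs = pairs[keep]   # labels that belong to the kept rows

print(subset.shape)          # (2, 5, 2)
print(subset_pairs.tolist()) # [[3, 4], [3, 5]]
```

The boolean data stays a plain bool array, and the labels live in separate arrays that share the same row order, so no mixed data types are needed inside one array.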

Thanks a lot!

Update/Answer:

Sorry, I know the question was a little bit too specific and bloated with code that wasn't relevant to the question.

I answered the question myself, and for the sake of documentation you can find the answer below. It is specific, but maybe it helps someone else handle indexing in NumPy.

Short, general answer:

I basically just created a look-up table as a Python list and selected rows from the NumPy array with a list of row indices (called mask below, although strictly speaking it is a list of integer indices rather than a boolean mask):

import numpy as np

A = np.asarray([[[1, 2],
                 [3, 4],
                 [5, 6]],

                [[7, 8],
                 [9, 10],
                 [11, 12]]])

# selects only rows 1 and 2 from each 2D array
mask = [1, 2]
B = A[:, mask, :]

Which gives for B:

[[[ 3  4]
  [ 5  6]]

 [[ 9 10]
  [11 12]]]

Complete answer, specific for my question above:

This is my 2D array:

A =[[True, False, False, False, False],
    [False, True, False, False, False],
    [False, False, True, False, False]]
A = np.asarray(A)

The rows are labelled with tuples (this is specific to my problem), e.g.:

lut = [(1,2),(3,4),(3,5)]

Append another 2D array to form a 3D array:

C = np.expand_dims(A, axis=0)
A = np.expand_dims(A, axis=0)

A = np.concatenate((A, C), axis=0) 

This is the 3D array A:

 >[[[ True False False False False]
    [False  True False False False]
    [False False  True False False]]

   [[ True False False False False]
    [False  True False False False]
    [False False  True False False]]]

Selecting the rows whose look-up-table entry contains 3:

mask = [i for i, v in enumerate(lut) if 3 in v] 

> [1, 2]
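
If the look-up table is converted to a NumPy array, the same row indices can be computed without a Python loop. This is only a sketch (Python 3) and assumes that every entry of lut has the same length; lut_arr and contains_3 are names made up for illustration.

```python
import numpy as np

lut = [(1, 2), (3, 4), (3, 5)]
lut_arr = np.asarray(lut)                # shape (3, 2), one pair per row

# True for every row whose pair contains atom 3.
contains_3 = (lut_arr == 3).any(axis=1)

# Convert the boolean flags to integer row indices.
mask = np.flatnonzero(contains_3)

print(contains_3.tolist())  # [False, True, True]
print(mask.tolist())        # [1, 2]
```

Either contains_3 or mask can be used to index the row axis of the 3D array.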

Applying mask to the 3D-array:

B = A[ : , mask, : ]

Now B is the 3D array A after selection:

[[[False  True False False False]
   [False False  True False False]]

  [[False  True False False False]
   [False False  True False False]]]

To keep track of which pairs the rows of B now refer to, create a new look-up table for further computation:

newLut = [v for i, v in enumerate(lut) if i in mask] 

>[(3, 4), (3, 5)]
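
The two steps (selecting the rows and reducing the look-up table) could also be bundled so that the array and its labels cannot get out of sync. The following is a minimal sketch in Python 3; select_rows is a hypothetical helper name, not a NumPy function, and it assumes the rows sit on axis 1 as in the example above.

```python
import numpy as np

def select_rows(data, lut, atom):
    """Return the rows of `data` (axis 1) whose look-up pair contains `atom`,
    together with the matching reduced look-up table."""
    idx = [i for i, pair in enumerate(lut) if atom in pair]
    return data[:, idx, :], [lut[i] for i in idx]

# Rebuild the 3D array from the example: two copies of a 3x5 boolean array
# with True on the diagonal.
A = np.stack([np.eye(3, 5, dtype=bool)] * 2, axis=0)
lut = [(1, 2), (3, 4), (3, 5)]

B, new_lut = select_rows(A, lut, 3)
print(B.shape)   # (2, 2, 5)
print(new_lut)   # [(3, 4), (3, 5)]
```

Because the helper returns a reduced look-up table, it can be applied again to B and new_lut for a further selection.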
