How to merge related array items into one along the rows?

After a series of distance computations to identify the neighbors of every atom, I end up with the following neighbor table (the first column is the atom itself, the second is one of its neighbors):

array([[ 0,  1],
       [ 1,  0],
       [ 1,  2],
       [ 2,  1],
       [ 2,  3],
       [ 3,  2],
       [ 3,  4],
       [ 4,  3],
       [ 4,  5],
          ...
       [48, 47],
       [48, 49],
       [49, 48]])

For instance, the 0th atom has only one neighbor, the atom with index 1 (that is what the 0th row means). The second atom, with index 1, has two neighbors, with indices 0 and 2, since it sits between them. The pattern continues, and because there is no atom with an index greater than 49, the last atom has only one neighbor, just like the 0th atom: the atom with index 48.

What I want is to transform this array so that each row refers to a single atom and all of its neighbors, like this:

array([[ 0,  1],
       [ 1,  0, 2],
       [ 2,  1, 3],
       [ 3,  2, 4],
       [ 4,  3, 5],
          ...
       [48, 47, 49],
       [49, 48]])

where the first column holds the atom itself and the remaining columns hold all of its neighbors.

Because the array will contain hundreds of thousands of items and this operation will be called thousands of times, I don't want to use a Python loop. I'm looking for a very efficient way of doing this. Moreover, the number of neighbors is not fixed at one for the first and last atoms and two for the rest; it can vary from atom to atom. Hence, indexing tricks that rely on this regular pattern may work for the example above but won't work in general.

I thought about array manipulation methods, but I didn't manage to solve the problem. I'd appreciate any guidance. Thank you.

This looks like a groupby-type operation. NumPy doesn't have much built-in functionality for group-by operations, but pandas does.

Here's an example of doing this efficiently using a pandas groupby:

import numpy as np
import pandas as pd

neighbors = np.array([[ 0,  1],
                      [ 1,  0],
                      [ 1,  2],
                      [ 2,  1],
                      [ 2,  3],
                      [ 3,  2],
                      [ 3,  4],
                      [ 4,  3],
                      [ 4,  5],
                      [48, 47],
                      [48, 49],
                      [49, 48]])

g = pd.Series(neighbors[:, 1]).groupby(neighbors[:, 0]).apply(list)
grouped = pd.DataFrame(g.to_list(), index=g.index).reset_index().to_numpy()

print(grouped)
# [[ 0.  1. nan]
#  [ 1.  0.  2.]
#  [ 2.  1.  3.]
#  [ 3.  2.  4.]
#  [ 4.  3.  5.]
#  [48. 47. 49.]
#  [49. 48. nan]]

Note that a NumPy array cannot have rows of different lengths; here pandas pads the shorter rows with np.nan as a fill value, which is also why the result is a floating-point array rather than an integer one.
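If you would rather stay in NumPy and keep an integer dtype, the same grouping can be sketched with a stable sort plus np.unique. This is only a minimal sketch, not a benchmarked solution: the function name group_neighbors and the fill value -1 are my own choices for illustration, and it assumes a non-empty (n, 2) integer array in which -1 is never a valid atom index.

```python
import numpy as np

def group_neighbors(pairs, fill=-1):
    """Group an (n, 2) array of [atom, neighbor] pairs by atom.

    Hypothetical helper for illustration. Assumes `pairs` is non-empty
    and that `fill` (default -1) is not a valid atom index.
    """
    pairs = np.asarray(pairs)
    # A stable sort keeps each atom's neighbors in their original order.
    order = np.argsort(pairs[:, 0], kind="stable")
    atom_col = pairs[order, 0]
    neigh_col = pairs[order, 1]

    # On sorted input, return_index gives the start of each group.
    atoms, starts, counts = np.unique(
        atom_col, return_index=True, return_counts=True
    )

    # Ragged result: one neighbor array per atom.
    ragged = np.split(neigh_col, starts[1:])

    # Padded result: atom index in column 0, neighbors after it.
    group_id = np.repeat(np.arange(atoms.size), counts)
    within = np.arange(atom_col.size) - np.repeat(starts, counts)
    padded = np.full((atoms.size, counts.max() + 1), fill, dtype=pairs.dtype)
    padded[:, 0] = atoms
    padded[group_id, within + 1] = neigh_col
    return atoms, ragged, padded

neighbors = np.array([[0, 1], [1, 0], [1, 2], [2, 1], [2, 3], [3, 2],
                      [3, 4], [4, 3], [4, 5], [48, 47], [48, 49], [49, 48]])

atoms, ragged, padded = group_neighbors(neighbors)
print(padded)
```

The padded array is built with array operations only. np.split still iterates over the groups internally, so if you only need the padded form you can drop the ragged list.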


 