Numpy/Pandas: Merge two numpy arrays based on one array efficiently

I have two numpy arrays like so (each made up of 2-tuples):

a = [(1, "alpha"), (2, 3), ...]
b = [(1, "zylo"), (1, "xen"), (2, "potato"), ...]

The first element of each tuple is an identifier shared between both arrays, so I want to create a new numpy array that looks like this:

[(1, "alpha", "zylo", "xen"), (2, 3, "potato"), etc...]

My current solution works, but it's way too inefficient for me. It looks like this:

aggregate_collection = []
for tuple_set in a:
  for tuple_set2 in b:
    if tuple_set[0] == tuple_set2[0] and other_condition:
      temp_tup = (tuple_set[0], other tuple values)
      aggregate_collection.append(temp_tup)

How can I do this efficiently?

I'd concatenate these into a data frame and just groupby + agg:

(pd.concat([pd.DataFrame(a), pd.DataFrame(b)])
   .groupby(0)
   .agg(lambda s: [s.name, *s])[1])

where 0 and 1 are the default column names you get when creating a dataframe via pd.DataFrame. Change them to your own column names.
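For reference, here is a self-contained sketch of the same concat + groupby idea. It is a variant, not the exact expression above: it aggregates column 1 into plain lists with agg(list) and then builds the tuples in a comprehension. The sample data is taken from the question; the names df, grouped and merged are only illustrative.

```python
import pandas as pd

# Sample data from the question.
a = [(1, "alpha"), (2, 3)]
b = [(1, "zylo"), (1, "xen"), (2, "potato")]

# Stack both lists into one frame; columns get the default names 0 and 1.
df = pd.concat([pd.DataFrame(a), pd.DataFrame(b)], ignore_index=True)

# Collect every value in column 1 that shares the same identifier in column 0.
grouped = df.groupby(0)[1].agg(list)

# Turn each (identifier, [values...]) pair into one flat tuple.
merged = [(key, *values) for key, values in grouped.items()]
print(merged)
```

Rows from a come before rows from b in the concatenated frame, and groupby keeps the original row order inside each group, so the values from a appear first in every tuple.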

In [278]: a = [(1, "alpha"), (2, 3)]
     ...: b = [(1, "zylo"), (1, "xen"), (2, "potato")]
In [279]: a
Out[279]: [(1, 'alpha'), (2, 3)]
In [280]: b
Out[280]: [(1, 'zylo'), (1, 'xen'), (2, 'potato')]

Note that if I try to make an array from a, I get something quite different: numpy coerces every element to a string.

In [281]: np.array(a)
Out[281]: 
array([['1', 'alpha'],
       ['2', '3']], dtype='<U21')
In [282]: _.shape
Out[282]: (2, 2)

defaultdict is a handy tool for collecting like-keyed values:

In [283]: from collections import defaultdict
In [284]: dd = defaultdict(list)
In [285]: for tup in a+b:
     ...:     k,v = tup
     ...:     dd[k].append(v)
     ...: 
In [286]: dd
Out[286]: defaultdict(list, {1: ['alpha', 'zylo', 'xen'], 2: [3, 'potato']})

which can be cast as a list of tuples with:

In [288]: [(k,*v) for k,v in dd.items()]
Out[288]: [(1, 'alpha', 'zylo', 'xen'), (2, 3, 'potato')]

I'm using a+b to join the lists, since it apparently doesn't matter where the tuples occur.
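The defaultdict steps above can be wrapped in a small function. This is a minimal sketch: the name merge_by_key is hypothetical, it assumes every input item is a (key, value) pair, and it does not model the other_condition check from the question. It makes a single pass over the inputs instead of comparing every tuple in a with every tuple in b.

```python
from collections import defaultdict


def merge_by_key(*sequences):
    """Group (key, value) pairs from all inputs into (key, v1, v2, ...) tuples."""
    grouped = defaultdict(list)
    for seq in sequences:
        for key, value in seq:
            # Values are appended in the order they are encountered.
            grouped[key].append(value)
    # Dicts preserve insertion order, so keys come out in first-seen order.
    return [(key, *values) for key, values in grouped.items()]


a = [(1, "alpha"), (2, 3)]
b = [(1, "zylo"), (1, "xen"), (2, "potato")]
merged = merge_by_key(a, b)
print(merged)  # [(1, 'alpha', 'zylo', 'xen'), (2, 3, 'potato')]
```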

Out[288] is a poor fit for numpy anyway, since the tuples differ in size and the items (other than the first) may be strings or numbers.
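If a numpy container is still wanted, one option is an object-dtype array that simply holds the tuples as Python objects. This is only a sketch of that idea: such an array gets none of numpy's vectorized speed, and the names merged, arr and a_obj are illustrative.

```python
import numpy as np

# Ragged result: tuples of different lengths with mixed int/str items.
merged = [(1, "alpha", "zylo", "xen"), (2, 3, "potato")]

# Pre-allocate a 1-D object array and store one tuple per slot.
arr = np.empty(len(merged), dtype=object)
for i, tup in enumerate(merged):
    arr[i] = tup

print(arr.shape)  # (2,)

# Passing dtype=object also prevents the int-to-string coercion for a.
a = [(1, "alpha"), (2, 3)]
a_obj = np.array(a, dtype=object)
print(a_obj.shape)  # (2, 2)
```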
