Merge lists with same first element in list of lists

I have a list of lists:

a = [[0, 1], [0, 2], [0, 26], [0, 74], [1, 77], [1, 80], [1, 81], [2, 117], [2, 118], [2, 119], [2, 120]]

How can I combine all the sublists that share the same first element?

Desired output:

a = [[0, 1, 2, 26, 74], [1, 77, 80, 81], [2, 117, 118, 119, 120]]

Try this:

d = {}
for key, value in a:
    if key not in d:
        d[key] = [key]
    d[key].append(value)
result = list(d.values())

from collections import defaultdict

tmp = defaultdict(list)
for key, val in a:
    tmp[key].append(val)
print([[key] + val for key, val in tmp.items()])

I'd do it this way.
Here I assume that the input is a list of sublists, each of length 2.

def merge_list(rows):
    res = []   # Final list
    keys = []  # Distinct first elements, in order of first appearance
    for row in rows:
        if row[0] not in keys:
            keys.append(row[0])
    for key in keys:
        merged = [key]
        for row in rows:
            if row[0] == key:
                # For longer sublists such as [[1, 2, 3], [1, 4, 6], ...],
                # use merged.extend(row[1:]) instead of appending only row[1]
                merged.append(row[1])
        res.append(merged)
    print(res)

I think the other answers here are specific to two item lists. Here's one that works with any number of items in your sublists (as long as there's at least one):

a = [[0, 1], [0, 2], [0, 26], [0, 74], [1, 77], [1, 80], [1, 81], [2, 117], [2, 118], [2, 119], [2, 120]]
output_dict = {}
for key, *values in a:
    if key not in output_dict:
        output_dict[key] = [key]
    output_dict[key].extend(values)

Now the results are in output_dict.values().
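As a quick check, the same loop can be wrapped in a function and run on sublists of different lengths. This is only a sketch: the name merge_by_first and the sample rows are made up for illustration, and setdefault is used in place of the explicit membership test.

```python
def merge_by_first(rows):
    """Group sublists by their first element, keeping first-seen key order."""
    merged = {}
    for key, *values in rows:
        # setdefault creates [key] the first time a key is seen
        merged.setdefault(key, [key]).extend(values)
    return list(merged.values())


# Hypothetical input: sublists of length 1, 2 and 3, not sorted by key
rows = [[0, 1], [0, 2, 26], [1, 77], [0, 74], [1], [2, 117, 118]]
result = merge_by_first(rows)
print(result)
# [[0, 1, 2, 26, 74], [1, 77], [2, 117, 118]]
```

A sublist holding only the key, such as [1], contributes no values but still registers the key.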

Since this question has a numpy tag, I'll expand on possible ways to solve it in numpy. In general, this is called a group-by problem. There are many ways to do it in numpy, and they fall into two categories: those that locate group boundaries with np.unique, and those that count group sizes with np.bincount.

The second type of solution won't work in general if the group IDs are large, but it is significantly faster than np.unique when the IDs are small.
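To see why large IDs are a problem for a counting approach, here is a small sketch using np.bincount, which the code further down relies on. The ID arrays are made up for illustration.

```python
import numpy as np

small_ids = np.array([0, 0, 1, 2, 2, 2])
large_ids = np.array([0, 0, 1_000_000])  # hypothetical sparse IDs

# bincount allocates one counter per integer from 0 up to max(id)
small_counts = np.bincount(small_ids)
large_counts = np.bincount(large_ids)
print(small_counts.tolist())  # [2, 1, 3]
print(large_counts.size)      # 1000001

# np.unique only tracks the IDs that actually occur
ids, counts = np.unique(large_ids, return_counts=True)
print(ids.tolist(), counts.tolist())  # [0, 1000000] [2, 1]
```

With only two distinct IDs, the bincount result still holds over a million counters, so its memory and time grow with the largest ID rather than with the number of rows.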

You need to sort your data by the first column before applying any of these methods:

import numpy as np

a = np.array(a)
arr = a[a[:, 0].argsort()]
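One caveat worth sketching: argsort defaults to quicksort, which numpy does not guarantee to be stable, so rows with the same ID may not keep their original relative order. If that order matters, passing kind="stable" is one option. The unsorted pairs below are made up for illustration.

```python
import numpy as np

# Hypothetical unsorted input: [group ID, value]
pairs = np.array([[1, 80], [0, 26], [1, 77], [0, 1], [2, 117], [0, 2]])

# kind="stable" keeps rows with equal IDs in their original relative order
order = pairs[:, 0].argsort(kind="stable")
sorted_pairs = pairs[order]
print(sorted_pairs.tolist())
# [[0, 26], [0, 1], [0, 2], [1, 80], [1, 77], [2, 117]]
```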

Then you can choose your method of grouping and a custom return:

def _custom_return(unique_id, a, split_idx, return_groups):
    '''Choose if you want to also return unique ids'''
    if return_groups:
        return unique_id, np.split(a[:,1], split_idx)
    else: 
        return np.split(a[:,1], split_idx)
    
def numpy_groupby_index(a, return_groups=True):
    '''Code refactor of method of Vincent J'''
    u, idx = np.unique(a[:,0], return_index=True) 
    return _custom_return(u, a, idx[1:], return_groups)

def numpy_groupby_bins(a, return_groups=True):  
    '''Significant boost of np.unique by np.bincount'''
    bins = np.bincount(a[:,0])
    nonzero_bins_idx = bins != 0
    nonzero_bins = bins[nonzero_bins_idx]
    idx = np.cumsum(nonzero_bins[:-1])
    return _custom_return(np.flatnonzero(nonzero_bins_idx), a, idx, return_groups)

>>> numpy_groupby_bins(arr, return_groups=True)
(array([0, 1, 2]),
 [array([ 1,  2, 26, 74]), array([77, 80, 81]), array([117, 118, 119, 120])])
>>> numpy_groupby_bins(arr, return_groups=False)
[array([ 1,  2, 26, 74]), array([77, 80, 81]), array([117, 118, 119, 120])]
>>> numpy_groupby_index(arr, return_groups=True)
(array([0, 1, 2]),
 [array([ 1,  2, 26, 74]), array([77, 80, 81]), array([117, 118, 119, 120])])
>>> numpy_groupby_index(arr, return_groups=False)
[array([ 1,  2, 26, 74]), array([77, 80, 81]), array([117, 118, 119, 120])]

Note that all of these methods rely on np.split, which builds its result with list.append under the hood, so it is inefficient when you have a large number of small groups. This happens because numpy is not designed to work with arrays of different lengths.

Also note that the output you expect requires one more iteration:

>>> groups = numpy_groupby_index(arr, return_groups=True)
>>> out = [np.r_[key, group] for key, group in zip(*groups)]
>>> out
[array([ 0,  1,  2, 26, 74]),
 array([ 1, 77, 80, 81]),
 array([  2, 117, 118, 119, 120])]
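If you need plain nested lists, as in the question's desired output, rather than a list of arrays, one more conversion step is needed. The following is a self-contained sketch of the np.unique route that uses tolist() at the end. The variable names are mine, and kind="stable" is an added assumption to keep values in their original order within each group.

```python
import numpy as np

a = [[0, 1], [0, 2], [0, 26], [0, 74], [1, 77], [1, 80], [1, 81],
     [2, 117], [2, 118], [2, 119], [2, 120]]

arr = np.array(a)
arr = arr[arr[:, 0].argsort(kind="stable")]  # sort rows by group ID

# The first index of each distinct ID marks where a new group starts
keys, starts = np.unique(arr[:, 0], return_index=True)
groups = np.split(arr[:, 1], starts[1:])

# tolist() converts NumPy integers back to plain Python ints
merged = [[key] + group.tolist() for key, group in zip(keys.tolist(), groups)]
print(merged)
# [[0, 1, 2, 26, 74], [1, 77, 80, 81], [2, 117, 118, 119, 120]]
```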

If you're interested in performant solutions to this problem, you could also read my further analysis on this kind of problem.
