
Combine two lists of strings

Given two lists of strings that share all their values except for one element in each list, how would you combine the two into a single list that contains one copy of every value, in list order?

For example, given the following two lists in Python:

a = ['Second', 'Third', 'Fourth']
b = ['First', 'Second', 'Third']

Or

a = ['First', 'Third', 'Fourth']
b = ['First', 'Second', 'Third']

How would you combine the two lists to get a single list like this:

result = ['First', 'Second', 'Third', 'Fourth']

Note that the exact values of the strings cannot necessarily be trusted to help with ordering the elements.

I am aware of the possibility that there will be some cases with no definitive way to lock the list down to a particular order, and will probably have to special-case those, but for the general cases I'd rather have a procedure to follow. For example:

a = ['First', 'Third', 'Fourth']
b = ['First', 'Second', 'Fourth']

This could have 'Third' and 'Second' in either order, as there's no item on both lists between them to provide a guideline.

Edit: I should explain the strings a bit further, as I see many of you are assuming that I can merely sort a raw merge of the two lists, and this just isn't going to work.

I'm working with story titles: each story lists only the other instalments in its series, not itself. So by taking two lists (or possibly more, I'm not sure), I can come up with a full list of the instalments and put them in their proper order.

Simple algorithm:

  1. Concat lists
  2. Remove dups
  3. Sort

Code:

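def order_list(lst, order_dict):
    # Sort by each value's rank; values missing from order_dict get -1 and sort first
    return sorted(lst, key=lambda x: order_dict.get(x, -1))

c = list(set(a + b))  # concatenate, then remove duplicates (the order is lost here)
ord_dict = {"First": 1, "Second": 2, "Third": 3, "Fourth": 4}
print(order_list(c, ord_dict))  # ['First', 'Second', 'Third', 'Fourth']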

You have 2 different concerns here:

  • Duplicate elimination
  • Ordering

I would do them separately. Duplicate elimination is simple enough. Use a set:

>>> a = ['Second', 'Third', 'Fourth']
>>> b = ['First', 'Second', 'Third']
>>> x = set(a)
>>> x
set(['Second', 'Fourth', 'Third'])
>>> x.update(b)
>>> x
set(['Second', 'Fourth', 'Third', 'First'])

Then you'll need to define the ordering somehow. The simplest way to do that might be to map each possible element to a value:

>>> order_dict = {'First': 1, 'Second': 2, 'Third': 3, 'Fourth': 4}
>>> result = sorted(list(x), key=lambda i: order_dict[i])
>>> result
['First', 'Second', 'Third', 'Fourth']

Alternatively, you could use a compare function, if you can define one for your values: pass it as sorted's cmp argument in Python 2, or wrap it with functools.cmp_to_key in Python 3, where the cmp argument was removed.
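For Python 3, here is a minimal sketch of the compare-function route using `functools.cmp_to_key`. The `ORDER` rank table and the `compare_titles` name are hypothetical stand-ins for whatever ordering knowledge you actually have; any function returning a negative number, zero, or a positive number works the same way:

```python
from functools import cmp_to_key

# Hypothetical rank table standing in for whatever ordering knowledge you have.
ORDER = {'First': 1, 'Second': 2, 'Third': 3, 'Fourth': 4}

def compare_titles(x, y):
    """Old-style compare: negative if x comes first, 0 if tied, positive otherwise."""
    return ORDER[x] - ORDER[y]

titles = {'Second', 'Fourth', 'Third', 'First'}
# cmp_to_key turns the two-argument compare function into a key for sorted()
result = sorted(titles, key=cmp_to_key(compare_titles))
print(result)  # ['First', 'Second', 'Third', 'Fourth']
```

A plain `key=ORDER.get` would do the same job here; a compare function only pays off when the ordering rule is pairwise and can't be expressed as a single rank.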

Hope this helps.

If we assume that your two lists are both ordered, and that they are each missing only some elements from the full set, then I can kind of see an algorithm that should work most of the time.

  1. Take the next element of A.
  2. Step through the remaining part of B looking for a match:
    1. If there was a match:
      • Remove everything from the start of the remaining B up to and including the match, and append it to C
    2. If there was no match:
      • Append the element of A to C
  3. Repeat until A is exhausted.
  4. If there's anything left in B, append it to C.

This is the Python code for the algorithm:

a1 = ['Second', 'Third', 'Fourth']
b1 = ['First', 'Second', 'Third']

a2 = ['First', 'Third', 'Fourth']
b2 = ['First', 'Second', 'Third']

a3 = ['First', 'Third', 'Fourth']
b3 = ['First', 'Second', 'Fourth']

def merge(a, b):
    c = []
    b_oldindex = 0
    for a_index in range(len(a)):
        match = False
        for b_index in range(b_oldindex, len(b)):
            if a[a_index] == b[b_index]:
                c.extend(b[b_oldindex:b_index+1])
                b_oldindex = b_index + 1
                match = True
                break
        if not match:
            c.append(a[a_index])
    if b_oldindex < len(b):
        c.extend(b[b_oldindex:])
    return c

print(merge(a1,b1))
print(merge(a2,b2))
print(merge(a3,b3))
print(merge(b1,a1))
print(merge(b2,a2))
print(merge(b3,a3))

Which produces the following output:

['First', 'Second', 'Third', 'Fourth']
['First', 'Second', 'Third', 'Fourth']
['First', 'Third', 'Second', 'Fourth']
['First', 'Second', 'Third', 'Fourth']
['First', 'Second', 'Third', 'Fourth']
['First', 'Second', 'Third', 'Fourth']

Of all the test cases, the only one that fails to produce the correct order is merge(a3, b3), which is the ambiguous case from the question.

Solving the problem completely may involve implementing a proper merge algorithm (as used in merge sort), which requires the ability to evaluate the order that elements should be in. You can see a Python implementation of merge sort at Rosetta Code.
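For reference, the merge step of merge sort, adapted to drop duplicates, looks roughly like this. It is only a sketch: it assumes both input lists are already in the correct order and that every title can be ranked, here via a hypothetical `rank` dictionary, which is exactly the information the question says may be missing:

```python
def merge_unique(a, b, rank):
    """Merge two lists that are each already sorted by rank, keeping one copy of each title.

    Assumes every title has a distinct rank, so equal ranks mean the same title.
    """
    i = j = 0
    merged = []
    while i < len(a) and j < len(b):
        ra, rb = rank[a[i]], rank[b[j]]
        if ra < rb:
            merged.append(a[i])
            i += 1
        elif rb < ra:
            merged.append(b[j])
            j += 1
        else:
            # Same title in both lists: keep a single copy and advance both.
            merged.append(a[i])
            i += 1
            j += 1
    # At most one of these tails is non-empty.
    merged.extend(a[i:])
    merged.extend(b[j:])
    return merged


rank = {'First': 1, 'Second': 2, 'Third': 3, 'Fourth': 4}  # hypothetical ranking
print(merge_unique(['Second', 'Third', 'Fourth'], ['First', 'Second', 'Third'], rank))
# ['First', 'Second', 'Third', 'Fourth']
```

With a full ranking the ambiguous a3/b3 case also comes out right, but only because the ranking supplies the missing information.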

UPDATE:

Given that this is actually to sort the installments in a set of books, you can avoid situations like the one in your third set of data by taking additional information into account. Namely, call the merge function on the lists in reverse order of copyright or publication date, so that the list from the most recent book is the first argument.

For example, in your case:

a3 = ['First', 'Third', 'Fourth']  # Second novel
b3 = ['First', 'Second', 'Fourth'] # Third novel

a3's book would have been published before b3's book, so you would call merge(b3, a3), which gives the correct order. If you can harvest that kind of metadata, then you can avoid this issue.

Copyright date won't differ between different editions of the same book, but publication date might. Therefore, I'd look at copyright date before publication date.
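A sketch of that idea, extended with `functools.reduce` so it can also take more than two books. The `merge` below is a condensed equivalent of the merge() above (it uses `list.index` with a start position instead of the inner loop). The copyright years are made up purely for illustration, and whether folding newest-first always gives the right order for longer series is an assumption, not something tested here:

```python
from functools import reduce

def merge(a, b):
    """Condensed version of merge() above: walk a, pulling in b up to each match."""
    c = []
    start = 0  # first element of b not yet copied into c
    for item in a:
        try:
            k = b.index(item, start)
        except ValueError:
            c.append(item)  # no match left in b
        else:
            c.extend(b[start:k + 1])  # copy b up to and including the match
            start = k + 1
    c.extend(b[start:])  # anything left over in b
    return c

# Hypothetical metadata: (copyright year, titles that book lists)
books = [
    (2003, ['First', 'Third', 'Fourth']),   # a3, listed in the second novel
    (2005, ['First', 'Second', 'Fourth']),  # b3, listed in the third novel
]

# Newest book first, i.e. reverse order of copyright date
newest_first = [titles for _, titles in sorted(books, key=lambda book: book[0], reverse=True)]
print(reduce(merge, newest_first))  # ['First', 'Second', 'Third', 'Fourth']
```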

A set, by definition, contains no duplicates. You can make a set from both lists combined and then convert it back to a list:

a = ['Second', 'Third', 'Fourth']
b = ['First', 'Second', 'Third']
c = list(set(a + b))
print(c)  # e.g. ['Second', 'Fourth', 'Third', 'First']
# Note that a set does not organize anything; it only removes the duplicates,
# so the order of c is arbitrary and can differ between runs
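If you want to drop duplicates without scrambling the order, `dict.fromkeys` keeps the first occurrence of each key in insertion order (guaranteed since Python 3.7). This only preserves the order in which titles were first seen; it still does not work out the story order for you:

```python
a = ['Second', 'Third', 'Fourth']
b = ['First', 'Second', 'Third']

# dict keys are unique and remember insertion order, so this de-duplicates stably
first_seen_ab = list(dict.fromkeys(a + b))
first_seen_ba = list(dict.fromkeys(b + a))
print(first_seen_ab)  # ['Second', 'Third', 'Fourth', 'First']
print(first_seen_ba)  # ['First', 'Second', 'Third', 'Fourth']
```

The second result only comes out right because b happens to start at the beginning of the series; in general the concatenation order decides the result.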

I had the same issue, and I have an answer. I found this post because I was searching for more pythonic ways of doing it.

First, a note about the special case:

a=['A','C','D','E']
b=['A','B','D','F']
c=joinListsOrdered(a,b)

In my case I do not have any problem with it: ['A','B','C','D','E','F'] is as good as ['A','C','B','D','F','E']. The only validation condition I want is that the order of elements in c respects the order in a and b separately, i.e. [el for el in c if el in a] is element-wise equal to a (and likewise for b). I also think this is the only reasonable stance on this problem without further information.
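That validation condition is easy to turn into a check. A small helper, named `respects_order` here purely for illustration:

```python
def respects_order(c, *lists):
    """True if, for every input list, filtering c down to that list's elements gives back the list."""
    return all([el for el in c if el in lst] == lst for lst in lists)

a = ['A', 'C', 'D', 'E']
b = ['A', 'B', 'D', 'F']
print(respects_order(['A', 'B', 'C', 'D', 'E', 'F'], a, b))  # True
print(respects_order(['A', 'C', 'B', 'D', 'F', 'E'], a, b))  # True
print(respects_order(['A', 'D', 'B', 'C', 'E', 'F'], a, b))  # False: C comes after D
```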

In other words, the focus is on the common elements (['A', 'D']). If those are in the proper order, everything else can easily be slotted in between them. Hence this algorithm:

def joinListsOrdered(a, b):
    # Find the common elements, ordered as they appear in a
    order = {e: i for i, e in enumerate(a)}
    commonElements = sorted(set(a) & set(b), key=lambda e: order[e])
    # Cycle on each common element
    i = 0  # index into a
    j = 0  # index into b
    c = []
    for comEl in commonElements:
        # Copy the elements of a that come before this common element
        while a[i] != comEl:
            c.append(a[i])
            i += 1
        # Copy the elements of b that come before this common element
        while b[j] != comEl:
            c.append(b[j])
            j += 1
        c.append(comEl)
        i += 1
        j += 1
    # Add any residual elements after the last common element
    c = c + a[i:] + b[j:]
    return c

Of course it fails to respect the validation condition if some common elements appear in a different order in a and b (it can then even run off the end of b and raise an IndexError), but in that case the problem has no solution anyway.
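If there are more than two lists, the same validation condition can also be met with a different technique, a topological sort, which the standard library ships as `graphlib.TopologicalSorter` (Python 3.9+). Each list contributes "comes before" constraints between neighbouring titles, and contradictory lists show up as a `graphlib.CycleError` instead of a crash or silently bad output. This is a swapped-in alternative, not the algorithm above; `join_lists_topo` is a hypothetical name, and the order it picks between unconstrained titles (like 'B' and 'C') is arbitrary:

```python
from graphlib import TopologicalSorter, CycleError  # Python 3.9+

def join_lists_topo(*lists):
    """Order all titles so that each input list's relative order is respected."""
    ts = TopologicalSorter()
    for lst in lists:
        for item in lst:
            ts.add(item)  # register every title, even in one-element lists
        for earlier, later in zip(lst, lst[1:]):
            ts.add(later, earlier)  # `earlier` must precede `later`
    return list(ts.static_order())  # raises CycleError on contradictory lists

print(join_lists_topo(['A', 'C', 'D', 'E'], ['A', 'B', 'D', 'F']))

try:
    join_lists_topo(['A', 'D'], ['D', 'A'])
except CycleError:
    print('No ordering satisfies both lists')
```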

In the simplest case, where only one element differs and it is at the same position in both lists, just iterate through both lists together:

newlist = []
for x, y in zip(a, b):  # assumes a and b have the same length
  if x == y:
    newlist.append(x)
  else:
    newlist.append(x)
    newlist.append(y)

If your lists are more complicated, turn one of them into a dictionary first and check against the other when merging.
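One possible reading of the dictionary suggestion, offered only as an illustration: index one list by position so that each element of the other list can be located in constant time, then copy the indexed list up to every match. The name `merge_with_lookup` is hypothetical, and the sketch assumes each list is already in the correct order with no repeated titles:

```python
def merge_with_lookup(a, b):
    """Merge two ordered lists, using a dict of b's positions for O(1) lookups."""
    pos_in_b = {title: i for i, title in enumerate(b)}
    merged = []
    next_b = 0  # index of the first element of b not yet copied into merged
    for title in a:
        k = pos_in_b.get(title)
        if k is None:
            merged.append(title)  # title only appears in a
        elif k >= next_b:
            merged.extend(b[next_b:k + 1])  # copy b up to and including the match
            next_b = k + 1
        # otherwise b's copy of this title was already added, so skip it
    merged.extend(b[next_b:])  # whatever remains in b
    return merged

print(merge_with_lookup(['Second', 'Third', 'Fourth'], ['First', 'Second', 'Third']))
# ['First', 'Second', 'Third', 'Fourth']
```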

Use Python's bisect module.

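from bisect import insort

# String comparison can't order these titles, so rank them with a dictionary
order_dict = {'First': 1, 'Second': 2, 'Third': 3, 'Fourth': 4}

a = ['First', 'Third', 'Fourth']  # must already be sorted by rank
b = ['First', 'Second', 'Fourth']
for entry in b:
    if entry not in a:  # skip duplicates instead of de-duplicating with a set afterwards
        insort(a, entry, key=order_dict.get)  # the key= argument needs Python 3.10+

print(a)  # ['First', 'Second', 'Third', 'Fourth']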

Note: plain string comparison obviously won't put these titles in the right order, so you'll probably want to use a dictionary for that!
