
Combine two lists of strings

Given two lists of strings that duplicate each other save for one element in each list, how would you combine the two into a single list that contains one copy of every value in list order?

For example, given the following two lists in Python:

a = ['Second', 'Third', 'Fourth']
b = ['First', 'Second', 'Third']

Or

a = ['First', 'Third', 'Fourth']
b = ['First', 'Second', 'Third']

How would you combine the two lists to get a single list like this:

result = ['First', 'Second', 'Third', 'Fourth']

Note that the exact values of the strings cannot necessarily be trusted to help with ordering the elements.

I am aware that some cases will have no definitive way to lock the list down to a particular order, and I will probably have to special-case those, but for the general case I'd rather have a procedure to follow. For example:

a = ['First', 'Third', 'Fourth']
b = ['First', 'Second', 'Fourth']

This could have 'Third' and 'Second' in either order, as there's no item on both lists between them to provide a guideline.

Edit: I should explain the strings a bit further, as I see many of you are assuming that I can merely sort a raw merge of the two lists, and this just isn't going to work.

I'm taking story titles, which, for each story, only list the other instalments and not the linked story itself. So by taking two lists (or possibly more, I'm not sure), I can come up with a full list of the instalments to put them in their proper order.

Simple algorithm:

  1. Concatenate the lists
  2. Remove duplicates
  3. Sort

Code:

def order_list(lst, order_dict):
     return sorted(list(lst), key = lambda x: order_dict.get(x, -1))

c = list(set(a + b))
ord_dict = {"First": 1, "Second": 2, "Third": 3, "Fourth": 4}
order_list(c, ord_dict)

You have 2 different concerns here:

  • Duplicate elimination
  • Ordering

I would do them separately. Duplicate elimination is simple enough. Use a set:

>>> a = ['Second', 'Third', 'Fourth']
>>> b = ['First', 'Second', 'Third']
>>> x = set(a)
>>> x
set(['Second', 'Fourth', 'Third'])
>>> x.update(b)
>>> x
set(['Second', 'Fourth', 'Third', 'First'])

Then you'll need to define the ordering somehow. The simplest way to do that might be to map each possible element to a value:

>>> order_dict = {'First': 1, 'Second': 2, 'Third': 3, 'Fourth': 4}
>>> result = sorted(list(x), key=lambda i: order_dict[i])
>>> result
['First', 'Second', 'Third', 'Fourth']

Alternatively, you could use some kind of compare function with sorted's cmp argument if you can define one for your values. (The cmp argument exists only in Python 2; in Python 3, wrap the compare function with functools.cmp_to_key and pass it as key.)
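For Python 3, a minimal sketch of the compare-function route might look like this. The RANK table and the compare_titles function are hypothetical names made up for illustration; functools.cmp_to_key is the standard-library adapter that turns an old-style compare function into a sort key.

```python
from functools import cmp_to_key

# Hypothetical ranking, used only to give the compare function something to go on.
RANK = {'First': 1, 'Second': 2, 'Third': 3, 'Fourth': 4}

def compare_titles(x, y):
    """Old-style compare: negative if x sorts first, positive if y does, 0 if equal."""
    return RANK[x] - RANK[y]

titles = {'Second', 'Fourth', 'Third', 'First'}
result = sorted(titles, key=cmp_to_key(compare_titles))
print(result)  # ['First', 'Second', 'Third', 'Fourth']
```

If a plain ranking like this is all you have, the key=lambda form shown earlier is simpler; a compare function only pays off when the ordering rule cannot be expressed as a single value per element.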

Hope this helps.

If we assume that your two lists are both ordered, and that each is missing only some elements from the full set, then I can kind of see an algorithm that should work most of the time.

  1. Take the next element in A.
  2. Step through B looking for a match:
    1. If there was a match:
      • Remove everything from the start of B up to and including the match in B, and add it to C
    2. If there was no match:
      • Add the element from A to C
  3. Repeat
  4. If there's anything left in B, add it to C.

This is the Python code for the algorithm:

a1 = ['Second', 'Third', 'Fourth']
b1 = ['First', 'Second', 'Third']

a2 = ['First', 'Third', 'Fourth']
b2 = ['First', 'Second', 'Third']

a3 = ['First', 'Third', 'Fourth']
b3 = ['First', 'Second', 'Fourth']

def merge(a, b):
    c = []
    b_oldindex = 0
    for a_index in range(len(a)):
        match = False
        for b_index in range(b_oldindex, len(b)):
            if a[a_index] == b[b_index]:
                c.extend(b[b_oldindex:b_index+1])
                b_oldindex = b_index + 1
                match = True
                break
        if not match:
            c.append(a[a_index])
    if b_oldindex < len(b):
        c.extend(b[b_oldindex:])
    return c

print(merge(a1,b1))
print(merge(a2,b2))
print(merge(a3,b3))
print(merge(b1,a1))
print(merge(b2,a2))
print(merge(b3,a3))

Which produces the following output:

['First', 'Second', 'Third', 'Fourth']
['First', 'Second', 'Third', 'Fourth']
['First', 'Third', 'Second', 'Fourth']
['First', 'Second', 'Third', 'Fourth']
['First', 'Second', 'Third', 'Fourth']
['First', 'Second', 'Third', 'Fourth']

Of all the test cases, the only one that fails to produce the correct order is merge(a3,b3).

Solving the problem completely may involve implementing a correct merge algorithm (as used in merge sort), which requires the ability to evaluate the order that elements should be in. You can see a Python implementation of merge sort at Rosetta Code.
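As a different technique from the merge above (a topological sort rather than a merge-sort-style merge), one could treat every list as a set of "this title comes before that one" constraints and let the standard library order them. The sketch below assumes Python 3.9+ (for graphlib) and that titles are unique within each list; merge_by_constraints is a hypothetical name. It also reports whether the lists leave the order ambiguous.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def merge_by_constraints(*lists):
    """Return (merged, ambiguous) for any number of individually ordered lists."""
    sorter = TopologicalSorter()
    for lst in lists:
        for item in lst:
            sorter.add(item)            # register every title, even a lone one
        for before, after in zip(lst, lst[1:]):
            sorter.add(after, before)   # 'before' must precede 'after'
    sorter.prepare()                    # raises graphlib.CycleError if the lists contradict each other
    merged, ambiguous = [], False
    while sorter.is_active():
        ready = sorter.get_ready()      # titles whose predecessors are all placed
        if len(ready) > 1:
            ambiguous = True            # nothing decides the order among these
        merged.extend(ready)
        sorter.done(*ready)
    return merged, ambiguous

print(merge_by_constraints(['Second', 'Third', 'Fourth'],
                           ['First', 'Second', 'Third']))
# (['First', 'Second', 'Third', 'Fourth'], False)
```

For the third data set (a3 and b3) the same call returns ambiguous as True, because 'Second' and 'Third' become ready at the same time; that flag marks the cases that need special handling.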

UPDATE:

Given that this is actually to sort the installments in a set of books, you can avoid the situation you described in your third set of data by taking additional information into account. Namely, use the merge function on lists in the reverse order of copyright or publication date.

For example, in your case:

a3 = ['First', 'Third', 'Fourth']  # Second novel
b3 = ['First', 'Second', 'Fourth'] # Third novel

a3's book would have been published before b3's book. If you can harvest that kind of metadata, then you could avoid this issue.

Copyright date won't differ between different editions of the same book, but publication date might. Therefore, I'd look at copyright date before publication date.
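To make that concrete, here is a rough sketch of feeding the lists to merge newest-first. The copyright years are invented purely for illustration, and merge is restated (same behaviour as the function above, written with for/else) so the snippet runs on its own.

```python
from functools import reduce

def merge(a, b):
    """Same merge as above, restated compactly so this sketch is self-contained."""
    c = []
    b_old = 0
    for item in a:
        for b_index in range(b_old, len(b)):
            if item == b[b_index]:
                c.extend(b[b_old:b_index + 1])
                b_old = b_index + 1
                break
        else:
            c.append(item)  # no match found in the rest of b
    c.extend(b[b_old:])
    return c

# (copyright year, instalment list); the years are made up
books = [
    (2001, ['First', 'Third', 'Fourth']),   # list taken from the second novel
    (2003, ['First', 'Second', 'Fourth']),  # list taken from the third novel
]

# Newest book first, then fold the lists together with merge.
newest_first = [titles for _, titles in sorted(books, key=lambda book: book[0], reverse=True)]
result = reduce(merge, newest_first)
print(result)  # ['First', 'Second', 'Third', 'Fourth']
```

With only two lists this is just merge(b3, a3); reduce is there so the same code handles more lists if you end up harvesting more than two.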

The set container is defined by having no duplicates in it. You can make a set from both lists and then convert it back to a list:

a = ['Second', 'Third', 'Fourth']
b = ['First', 'Second', 'Third']
c = list(set(a + b))
print(c)  # e.g. ['Second', 'Fourth', 'Third', 'First']; the order is arbitrary
# Note that set will not organize anything, it will just delete the duplicates
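If you at least want the duplicates removed without scrambling the order the titles were first seen in, a small variation is to go through a dict instead of a set. This is only a sketch and relies on dicts keeping insertion order, which is guaranteed from Python 3.7 on; it still does not put the titles in story order.

```python
a = ['Second', 'Third', 'Fourth']
b = ['First', 'Second', 'Third']

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
c = list(dict.fromkeys(a + b))
print(c)  # ['Second', 'Third', 'Fourth', 'First']
```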

I had the same issue, and I have an answer. I found this post because I was searching for more pythonic ways of doing it.

First, a note about the special case:

a=['A','C','D','E']
b=['A','B','D','F']
c=joinListsOrdered(a,b)

In my case I do not have any problem: ['A','B','C','D','E','F'] is as good as ['A','C','B','D','F','E']. The only validation condition I want is: the order of elements in c respects the order in a and b separately, i.e. [el for el in c if el in a] is element-wise equal to a (and equivalently for b). I also think this is the only reasonable stance on this problem without further information about it.
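That validation condition is easy to write down as a check. A minimal sketch, where respects_order is a hypothetical helper name and the candidate lists are the ones from the paragraph above:

```python
def respects_order(c, *sources):
    """True if, for every source list, the elements of c that come from it
    appear exactly once each and in the same order as in that source."""
    return all([el for el in c if el in src] == list(src) for src in sources)

a = ['A', 'C', 'D', 'E']
b = ['A', 'B', 'D', 'F']

print(respects_order(['A', 'B', 'C', 'D', 'E', 'F'], a, b))  # True
print(respects_order(['A', 'C', 'B', 'D', 'F', 'E'], a, b))  # True
print(respects_order(['A', 'D', 'C', 'B', 'E', 'F'], a, b))  # False
```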

This translates to saying: the focus is on the common elements (['A', 'D']). If those are in the proper order, everything else can easily be slotted in between them. Therefore, this algorithm:

def joinListsOrdered(a,b):
    # Find ORDERED common elements
    order={}
    for i, e in enumerate(a):
        order[e]=i
    commonElements=sorted(set(a) & set(b), key=lambda i: order[i])
    # Cycle on each common element.
    i=0 #index of a
    j=0 #index of b
    c=[]
    for comEl in commonElements:
       while not a[i]==comEl:
           c.append(a[i])
           i=i+1
       while not b[j]==comEl:
           c.append(b[j])
           j=j+1
       c.append(comEl)
       i=i+1;j=j+1
    # Add the eventual residuals after the last common element.
    c=c+a[i:]+b[j:]
    return c

Of course, it fails to respect the validation condition if the order in a and b for some common element is different, but in that case the problem does not have a solution.

In the simplest case, where only one element differs and it is in the same position in both lists, just iterate jointly through both lists:

newlist = []
for i in range(len(a)):
  if a[i] == b[i]:
    newlist.append(a[i])
  else:
    newlist.append(a[i])
    newlist.append(b[i])

If your lists are more complicated, turn one of them into a dictionary first and check against the other when merging.
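One possible reading of that advice, sketched under the assumption that each title appears at most once per list: map every element of b to its position, then walk a and use the dictionary to decide what to emit. merge_with_lookup and pos_in_b are hypothetical names.

```python
def merge_with_lookup(a, b):
    """Merge two individually ordered lists, using a dict of b's positions."""
    pos_in_b = {item: i for i, item in enumerate(b)}  # title -> index in b
    merged = []
    next_b = 0  # first position in b that has not been emitted yet
    for item in a:
        j = pos_in_b.get(item)
        if j is None:
            merged.append(item)              # only in a: emit it as-is
        elif j >= next_b:
            merged.extend(b[next_b:j + 1])   # emit b up to and including the match
            next_b = j + 1
        # else: the item was already emitted as part of an earlier slice of b
    merged.extend(b[next_b:])                # whatever is left of b
    return merged

print(merge_with_lookup(['First', 'Third', 'Fourth'],
                        ['First', 'Second', 'Third']))
# ['First', 'Second', 'Third', 'Fourth']
```

The dictionary only speeds up the "is this in the other list, and where?" question; it cannot settle cases where neither list says which of two titles comes first.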

Use Python's bisect library.

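from bisect import insort

a = ['First', 'Third', 'Fourth']
b = ['First', 'Second', 'Fourth']
# insort takes the list first, then the item, and assumes the list is
# already sorted under the comparison in use
for entry in b:
    insort(a, entry)

unique = set(a)  # note: a set drops the duplicates but also the ordering
print(unique)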

Note: obviously, the strings won't compare in the proper order; you'll probably want to use a dictionary for that!


 