Merging sequences of unique elements

Question

I'm trying to merge a number of sequences, as in the following example:

x = ['one', 'two', 'four']
y = ['two', 'three', 'five']
z = ['one', 'three', 'four']

merged = ['one', 'two', 'three', 'four', 'five']

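The given sequences are all subsequences of the same duplicate-free sequence (which is not given). If the order cannot be determined, as with 'four' and 'five' in the example, which could also be swapped, either solution is fine.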
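Whatever method produces the merge, a candidate can be checked directly against this definition: it must contain every input element exactly once and keep the relative order of each input sequence. A minimal sketch; `is_subsequence` and `is_valid_merge` are hypothetical helper names, not part of any library:

```python
def is_subsequence(sub, seq):
    """Return True if the items of `sub` appear in `seq` in the same relative order."""
    it = iter(seq)
    # `item in it` advances the shared iterator until it finds `item`,
    # so later items must be found after earlier ones.
    return all(item in it for item in sub)

def is_valid_merge(merged, *seqs):
    """Check a candidate: no duplicates, covers all input elements, keeps each input's order."""
    if len(set(merged)) != len(merged):
        return False
    if set(merged) != set().union(*seqs):
        return False
    return all(is_subsequence(s, merged) for s in seqs)

x = ['one', 'two', 'four']
y = ['two', 'three', 'five']
z = ['one', 'three', 'four']
print(is_valid_merge(['one', 'two', 'three', 'four', 'five'], x, y, z))  # True
print(is_valid_merge(['one', 'two', 'three', 'five', 'four'], x, y, z))  # True
print(is_valid_merge(['two', 'one', 'three', 'four', 'five'], x, y, z))  # False
```

Both orders of 'four' and 'five' pass, matching the statement that either solution is acceptable.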

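This problem resembles multiple sequence alignment, but I suspect there is an (algorithmically) easier solution, since it is more restricted (no duplicates, no crossing edges). E.g., when starting from the union of all elements, I would only need to order the elements, but I can't seem to find a decent way to deduce the underlying order from the input sequences.

The example is in Python, and so should be the desired solution, but the problem is of a general algorithmic nature.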

Answer 1

Here's a very inefficient method that should do what you want:

w = ['zero', 'one']
x = ['one', 'two', 'four']
y = ['two', 'three', 'five']
z = ['one', 'three', 'four']

def get_score(m, k):
    # Score of k: 1 plus the scores of every term seen after k (recomputed on every call).
    v = m[k]
    return sum(get_score(m, kk) for kk in v) + 1

# m maps each term to the list of terms that follow it in any input list.
m = {}
for lst in [w, x, y, z]:
    for i, src in enumerate(lst):
        if src not in m:
            m[src] = []
        m[src].extend(lst[i+1:])

scored_u = [(k, get_score(m, k)) for k in m]
# Sort by score, highest first.
scored_s = sorted(scored_u, key=lambda ks: ks[1], reverse=True)

for (k, s) in scored_s:
    print((k, s))

Output:

('zero', 13)
('one', 12)
('two', 6)
('three', 3)
('four', 1)
('five', 1)

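The method first builds a mapping m whose keys are the terms of the lists and whose values are lists of the terms found to follow each key.

So in this case, m looks like: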

{
  'three': ['five', 'four'], 
  'two':   ['four', 'three', 'five'], 
  'four':  [], 
  'zero':  ['one'], 
  'five':  [], 
  'one':   ['two', 'four', 'three', 'four']
}

From there, it computes a score for each key. The score is defined as the sum of the scores of the terms seen after the key, plus 1.

So

get_score(m, 'four') = 1
get_score(m, 'five') = 1
# and thus
get_score(m, 'three') = 3  # (1(four) + 1(five) + 1)
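Applying the same recursion further up the map reproduces the remaining scores in the output above. Note that four counts twice toward one, because it follows one in both x and z:

```latex
\begin{aligned}
\mathrm{score}(\text{two})  &= \mathrm{score}(\text{four}) + \mathrm{score}(\text{three}) + \mathrm{score}(\text{five}) + 1 = 1 + 3 + 1 + 1 = 6\\
\mathrm{score}(\text{one})  &= \mathrm{score}(\text{two}) + 2\,\mathrm{score}(\text{four}) + \mathrm{score}(\text{three}) + 1 = 6 + 2 + 3 + 1 = 12\\
\mathrm{score}(\text{zero}) &= \mathrm{score}(\text{one}) + 1 = 12 + 1 = 13
\end{aligned}
```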

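It does this for every element in the input lists (w, x, y, z in my case), computing its total score, and then sorts the elements by score in descending order.

I say this is inefficient because get_score could be memoized, so that you only have to determine a key's score once. You could probably do this by backtracking: compute the scores of the keys whose values are empty lists, then work backwards from there. The current implementation determines the scores of some keys multiple times.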
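A minimal sketch of the memoized variant, using functools.lru_cache as the cache. This is top-down memoization rather than the bottom-up backtracking described above, but each key's score is still computed only once. It assumes, like the original, that the lists are consistent: a cycle in m would make the recursion fail with RecursionError.

```python
from functools import lru_cache

w = ['zero', 'one']
x = ['one', 'two', 'four']
y = ['two', 'three', 'five']
z = ['one', 'three', 'four']

# Same map as above: each term -> list of terms seen after it in some input list.
m = {}
for lst in [w, x, y, z]:
    for i, src in enumerate(lst):
        m.setdefault(src, []).extend(lst[i + 1:])

@lru_cache(maxsize=None)
def get_score(k):
    # Computed once per key; later calls with the same key are cache lookups.
    return sum(get_score(kk) for kk in m[k]) + 1

scored = sorted(((k, get_score(k)) for k in m), key=lambda ks: ks[1], reverse=True)
for k, s in scored:
    print((k, s))
```

The printed scores and order are the same as in the original output.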

Note: all this guarantees is that an element's score won't be lower than the "expected" one. For example, adding

v = ['one-point-five', 'four']

to the mix will place one-point-five above four in the list, but since you only reference it once, in v, there isn't enough context to do better.
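The effect of adding v can be checked with a small self-contained sketch. `scores` is a hypothetical helper that rebuilds m as above and caches scores in a dict. With v included, one-point-five gets a score of 2, so it lands above four (1) but below three (3), even though nothing in the input relates it to three.

```python
def scores(lists):
    """Score every term as in the answer: 1 + sum of the scores of the terms that follow it."""
    m = {}
    for lst in lists:
        for i, src in enumerate(lst):
            m.setdefault(src, []).extend(lst[i + 1:])
    cache = {}
    def get_score(k):
        if k not in cache:
            cache[k] = sum(get_score(kk) for kk in m[k]) + 1
        return cache[k]
    return {k: get_score(k) for k in m}

w = ['zero', 'one']
x = ['one', 'two', 'four']
y = ['two', 'three', 'five']
z = ['one', 'three', 'four']
v = ['one-point-five', 'four']

s = scores([w, x, y, z, v])
print(sorted(s.items(), key=lambda ks: ks[1], reverse=True))
# [('zero', 13), ('one', 12), ('two', 6), ('three', 3), ('one-point-five', 2), ('four', 1), ('five', 1)]
```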

Answer 2

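For the sake of completeness, this is how I ended up solving the problem:

As pointed out by @DSM, this problem is related to topological sorting. There are third-party modules for it, e.g. toposort (pure Python, no dependencies).

The sequences need to be converted to a mapping format, similar to the one used or suggested in the other answers. toposort_flatten() then does the rest: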
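Since Python 3.9 the standard library also ships a topological sorter, graphlib.TopologicalSorter, which takes a mapping from each node to its predecessors. A sketch under that assumption (Python 3.9+); `merge_seqs_graphlib` is a hypothetical name, and the relative order of incomparable elements such as four and five is not specified:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def merge_seqs_graphlib(*seqs):
    """Merge sequences that share a hidden order, using the stdlib topological sorter."""
    graph = {}
    for s in seqs:
        for i, elem in enumerate(s):
            # Map each element to the set of elements that must come before it.
            graph.setdefault(elem, set()).update(s[:i])
    # static_order() yields every predecessor before its successors and
    # raises graphlib.CycleError if the inputs contradict each other.
    return list(TopologicalSorter(graph).static_order())

w = ['zero', 'one']
x = ['one', 'two', 'four']
y = ['two', 'three', 'five']
z = ['one', 'three', 'four']
print(merge_seqs_graphlib(w, x, y, z))
```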

from collections import defaultdict
from toposort import toposort_flatten

def merge_seqs(*seqs):
    '''Merge sequences that share a hidden order.'''
    order_map = defaultdict(set)
    for s in seqs:
        for i, elem in enumerate(s):
            order_map[elem].update(s[:i])
    return toposort_flatten(dict(order_map))

Using the example above:

>>> w = ['zero', 'one']
>>> x = ['one', 'two', 'four']
>>> y = ['two', 'three', 'five']
>>> z = ['one', 'three', 'four']
>>> merge_seqs(w, x, y, z)
['zero', 'one', 'two', 'three', 'five', 'four']
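For comparison, the same idea can be written without a third-party dependency. This sketch uses Kahn's algorithm processed level by level: every element whose predecessors have all been placed forms the next level, and each level is sorted for a deterministic result. `merge_seqs_levels` is a hypothetical name; it produces the same list as above for this example, but that is an observation for this input, not a claim about toposort's internals.

```python
def merge_seqs_levels(*seqs):
    """Level-by-level topological sort (Kahn's algorithm) of the order implied by the inputs."""
    # deps[e] = set of elements that must precede e.
    deps = {}
    for s in seqs:
        for i, elem in enumerate(s):
            deps.setdefault(elem, set()).update(s[:i])
    result = []
    while deps:
        # All elements with no unplaced predecessors form the next level.
        level = {e for e, d in deps.items() if not d}
        if not level:
            raise ValueError('input sequences contradict each other (cycle)')
        result.extend(sorted(level))  # sort within a level for a deterministic result
        deps = {e: d - level for e, d in deps.items() if e not in level}
    return result

w = ['zero', 'one']
x = ['one', 'two', 'four']
y = ['two', 'three', 'five']
z = ['one', 'three', 'four']
print(merge_seqs_levels(w, x, y, z))
# ['zero', 'one', 'two', 'three', 'five', 'four']
```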

Answer 3

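Your problem is all about relations in discrete mathematics: all the pairwise combinations in your arrays are transitively related, which means that if a>b and b>c then a>c. Because of that, you can build the following lists: in a set of length 5, the smallest element should appear in 4 of these pairs, if we have that many pairs. So first we need to create these pairs grouped by their first element; for that we can use the combinations, groupby and chain functions from the itertools module: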

>>> from itertools import combinations,chain,groupby
>>> from operator import itemgetter

>>> l1 = [list(g) for _, g in groupby(sorted(chain.from_iterable(combinations(i, 2) for i in [x, y, z])), key=itemgetter(0))]
>>> l1
[[('one', 'four'), ('one', 'four'), ('one', 'three'), ('one', 'two')], [('three', 'five'), ('three', 'four')], [('two', 'five'), ('two', 'four'), ('two', 'three')]]

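So if we have groups of length 4, 3, 2 and 1, we have already found the answer. If we don't find such a sequence, we can do the previous calculation in reverse and find our elements with the same logic: if we find a group of length 4, its element is the greatest one, and so on.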

>>> l2 = [list(g) for _, g in groupby(sorted(chain.from_iterable(combinations(i, 2) for i in [x, y, z]), key=itemgetter(1)), key=itemgetter(1))]
>>> l2
[[('two', 'five'), ('three', 'five')], [('one', 'four'), ('two', 'four'), ('one', 'four'), ('three', 'four')], [('two', 'three'), ('one', 'three')], [('one', 'two')]]

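So we can do the following.

Note that we use set(list(zip(*i))[1]) to get the set of elements related to our particular element, and then len to count them. (The list() call is needed on Python 3, where zip returns an iterator that cannot be indexed.)

>>> [(i[0][0], len(set(list(zip(*i))[1]))) for i in l1]
[('one', 3), ('three', 2), ('two', 3)]
>>> [(i[0][1], len(set(list(zip(*i))[0]))) for i in l2]
[('five', 2), ('four', 3), ('three', 2), ('two', 1)]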

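In the first part we found groups of length 4, 2 and 3, so now we only need to find the 1, which could be four or five. Now we go to the second part, where we need to find a group of length 4 or 3: four has 3, so the 4th element has been found, and therefore the 5th element should be five.

Edit: as a more elegant and faster way, you can do the job with collections.defaultdict: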

>>> from collections import defaultdict
>>> d=defaultdict(set)
>>> for i,j in chain.from_iterable(combinations(i,2) for i in [x,y,z]) :
...          d[i].add(j)
... 
>>> d
defaultdict(<type 'set'>, {'three': set(['four', 'five']), 'two': set(['four', 'five', 'three']), 'one': set(['four', 'two', 'three'])})
>>> l1=[(k,len(v)) for k,v in d.items()]
>>> l1
[('three', 2), ('two', 3), ('one', 3)]
>>> d=defaultdict(set)
>>> for i,j in chain.from_iterable(combinations(i,2) for i in [x,y,z]) :
...          d[j].add(i) #create dict reversely 
... 
>>> l2=[(k,len(v)) for k,v in d.items()]
>>> l2
[('four', 3), ('five', 2), ('two', 1), ('three', 2)]
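Counting only the pairs that appear directly can undercount: ('one', 'five') never occurs in any input, so one ends up with 3 successors instead of 4. A sketch that first takes the transitive closure of the pair relation (an added step, not part of the answer above) and then sorts by the number of successors. If a precedes b, a's successors include b and all of b's successors, so a always has strictly more and the sort respects every input order. `merge_by_successor_count` is a hypothetical name:

```python
from itertools import chain, combinations

def merge_by_successor_count(*seqs):
    """Order elements so that those with more (transitive) successors come first."""
    # succ[a] = set of elements known to come after a; every element gets an entry.
    succ = {e: set() for s in seqs for e in s}
    for a, b in chain.from_iterable(combinations(s, 2) for s in seqs):
        succ[a].add(b)
    # Transitive closure: keep adding successors of successors until nothing changes.
    changed = True
    while changed:
        changed = False
        for a in succ:
            extra = set().union(*(succ[b] for b in succ[a])) - succ[a]
            if extra:
                succ[a] |= extra
                changed = True
    # If a precedes b, succ[a] holds b plus all of succ[b], so len(succ[a]) > len(succ[b]).
    return sorted(succ, key=lambda e: len(succ[e]), reverse=True)

x = ['one', 'two', 'four']
y = ['two', 'three', 'five']
z = ['one', 'three', 'four']
print(merge_by_successor_count(x, y, z))  # ['one', 'two', 'three', 'four', 'five']
```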

3 answers

Answer 1: score 2, accepted, 2015-04-04 21:59:39

Answer 2: score 1, 2015-04-05 22:19:26

Answer 3: score 0, 2015-04-04 22:19:59
