Python: sorting a dependency list

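I'm trying to work out whether I can solve my problem using the built-in sorted() function or whether I need to do it myself; the old-school way using cmp would have been relatively easy.

My data set looks like this: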

x = [
('business', Set('fleet','address'))
('device', Set('business','model','status','pack'))
('txn', Set('device','business','operator'))
....

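The sorting rule should basically be: for all values of N and Y where Y > N, x[N][0] not in x[Y][1]

Although I'm using Python 2.6, where the cmp argument is still available, I'm trying to make this Python 3 safe.

So, can this be done using some lambda magic and the key argument?

-== UPDATE ==-

Thanks Eli & Winston! I didn't really think using key would work, and if it could, I suspected it would be a shoehorned solution, which isn't ideal.

Because my problem was about database table dependencies, I had to make a minor addition to Eli's code to remove a table from its own list of dependencies (in a well-designed database this wouldn't happen, but who lives in that magical perfect world?)

My solution: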

def topological_sort(source):
    """perform topo sort on elements.

    :arg source: list of ``(name, set(names of dependencies))`` pairs
    :returns: list of names, with dependencies listed first
    """
    pending = [(name, set(deps)) for name, deps in source]        
    emitted = []
    while pending:
        next_pending = []
        next_emitted = []
        for entry in pending:
            name, deps = entry
            deps.difference_update(set((name,)), emitted) # <-- pop self from dep, req Py2.6
            if deps:
                next_pending.append(entry)
            else:
                yield name
                emitted.append(name) # <-- not required, but preserves original order
                next_emitted.append(name)
        if not next_emitted:
            raise ValueError("cyclic dependency detected: %s %r" % (name, (next_pending,)))
        pending = next_pending
        emitted = next_emitted

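What you want is called a topological sort. While it's possible to implement one using the built-in sort(), it's rather awkward, and it's better to implement a topological sort directly in Python.

Why is it going to be awkward? If you study the two algorithms on the wiki page, they both rely on a running set of "marked nodes". That concept is hard to contort into a form sort() can use, since key=xxx (or even cmp=xxx) works best with stateless comparison functions, particularly because timsort doesn't guarantee the order in which the elements will be examined. I'm (pretty) sure that any solution which does use sort() is going to end up redundantly calculating some information on each call to the key/cmp function in order to get around the statelessness issue.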
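
To make the statelessness point concrete, here is a minimal sketch of one way sorted() could be made to work: each name is given a key equal to the length of its longest dependency chain, and that key is memoized so it is not recomputed on every call. The names sort_by_depth and depth are hypothetical, names that never appear as an item are assumed to have no dependencies, and there is no cycle detection (a cycle would end in a RecursionError).

```python
from functools import lru_cache


def sort_by_depth(source):
    """Order names so that dependencies come first, using sorted() and a key.

    source: list of (name, iterable of dependency names) pairs.
    """
    deps_of = {name: frozenset(deps) for name, deps in source}

    @lru_cache(maxsize=None)  # memoize, otherwise depths are recomputed per call
    def depth(name):
        # Assumption: a name that is not an item itself has no dependencies.
        # A self-reference is ignored so it does not recurse forever.
        deps = deps_of.get(name, frozenset()) - {name}
        if not deps:
            return 0
        return 1 + max(depth(dep) for dep in deps)

    # sorted() is stable, so items of equal depth keep their input order.
    return sorted(deps_of, key=depth)


x = [
    ('txn', {'device', 'business', 'operator'}),
    ('device', {'business', 'model', 'status', 'pack'}),
    ('business', {'fleet', 'address'}),
]
print(sort_by_depth(x))  # ['business', 'device', 'txn']
```

The key function needs the whole dependency table plus a cache to be usable, which is the kind of extra bookkeeping the paragraph above warns about.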

The following is the algorithm I've been using (to sort some JavaScript library dependencies):

Edit: reworked this greatly, based on Winston Ewert's solution.

def topological_sort(source):
    """perform topo sort on elements.

    :arg source: list of ``(name, [list of dependencies])`` pairs
    :returns: list of names, with dependencies listed first
    """
    pending = [(name, set(deps)) for name, deps in source] # copy deps so we can modify set in-place       
    emitted = []        
    while pending:
        next_pending = []
        next_emitted = []
        for entry in pending:
            name, deps = entry
            deps.difference_update(emitted) # remove deps we emitted last pass
            if deps: # still has deps? recheck during next pass
                next_pending.append(entry) 
            else: # no more deps? time to emit
                yield name 
                emitted.append(name) # <-- not required, but helps preserve original ordering
                next_emitted.append(name) # remember what we emitted for difference_update() in next pass
        if not next_emitted: # all entries have unmet deps, one of two things is wrong...
            raise ValueError("cyclic or missing dependency detected: %r" % (next_pending,))
        pending = next_pending
        emitted = next_emitted

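Side note: it is possible to shoehorn a cmp() function into key=xxx, as outlined in this Python bug tracker message.

I do a topological sort something like this: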
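
For reference, the standard library has shipped a ready-made adapter for this, functools.cmp_to_key, since Python 2.7 and 3.2 (so it is not available on the 2.6 mentioned in the question). The sketch below only shows the mechanics, using a hypothetical comparison function that defines a proper total order; a "depends on" comparison is not a total order, so this adapter alone does not give a topological sort.

```python
from functools import cmp_to_key


def compare_by_length_then_alpha(a, b):
    """Old-style cmp function: returns a negative, zero or positive number."""
    if len(a) != len(b):
        return len(a) - len(b)
    # Same length: fall back to alphabetical order.
    return (a > b) - (a < b)


names = ['device', 'txn', 'business', 'pack']
ordered = sorted(names, key=cmp_to_key(compare_by_length_then_alpha))
print(ordered)  # ['txn', 'pack', 'device', 'business']
```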

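class TopologicalSortFailure(Exception):
    """Raised when a full pass over the remaining items emits nothing."""


def topological_sort(items):
    provided = set()
    while items:
        remaining_items = []
        emitted = False

        for item, dependencies in items:
            # dependencies must be a set here, since issubset() is a set method
            if dependencies.issubset(provided):
                yield item
                provided.add(item)
                emitted = True
            else:
                remaining_items.append((item, dependencies))

        if not emitted:
            # nothing could be emitted: cyclic or missing dependency
            raise TopologicalSortFailure()

        items = remaining_items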

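I think it's more straightforward than Eli's version; I don't know about efficiency.

Overlooking the bad formatting and the strange Set type (I've kept the dependencies as tuples and delimited the list items correctly), you can use the networkx library to make things convenient: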

x = [
    ('business', ('fleet','address')),
    ('device', ('business','model','status','pack')),
    ('txn', ('device','business','operator'))
]

import networkx as nx

g = nx.DiGraph()
for key, vals in x:
    for val in vals:
        g.add_edge(key, val)

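# Edges point from each item to its dependencies, so dependents are listed
# before the things they depend on; reverse the result for dependencies first.
# list() is needed on networkx 2.x and later, where topological_sort() returns a generator.
print(list(nx.topological_sort(g)))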

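Here's Winston's suggestion, with a docstring and one tiny tweak: dependencies.issubset(provided) is replaced with provided.issuperset(dependencies). That change lets you pass the dependencies in each input pair as an arbitrary iterable rather than necessarily a set.

My use case involves a dict whose keys are the item strings, with the value for each key being a list of the item names that key depends on. Once I've established that the dict is non-empty, I can pass its iteritems() (items() on Python 3) to the modified algorithm.

Thanks again, Winston.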

class TopologicalSortFailure(Exception):
    """Raised when a full pass over the remaining items emits nothing."""


def topological_sort(items):
    """
    'items' is an iterable of (item, dependencies) pairs, where 'dependencies'
    is an iterable of the same type as 'items'.

    If 'items' is a generator rather than a data structure, it should not be
    empty. Passing an empty generator for 'items' (zero yields before return)
    will cause topological_sort() to raise TopologicalSortFailure.

    An empty iterable (e.g. list, tuple, set, ...) produces no items but
    raises no exception.
    """
    provided = set()
    while items:
        remaining_items = []
        emitted = False

        for item, dependencies in items:
            # issuperset() accepts any iterable, so dependencies need not be a set
            if provided.issuperset(dependencies):
                yield item
                provided.add(item)
                emitted = True
            else:
                remaining_items.append((item, dependencies))

        if not emitted:
            # nothing could be emitted: cyclic or missing dependency
            raise TopologicalSortFailure()

        items = remaining_items
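
For the dict-of-lists use case described above, Python 3.9 and later also ship graphlib.TopologicalSorter in the standard library, which accepts exactly that shape (each key mapped to an iterable of the names it depends on). This is a different implementation from the generator above, shown only as a sketch; sort_dependency_dict and the sample tables dict are hypothetical names.

```python
from graphlib import CycleError, TopologicalSorter


def sort_dependency_dict(dependencies):
    """Return all names with every dependency listed before its dependents.

    dependencies: dict mapping a name to an iterable of names it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    return list(sorter.static_order())


tables = {
    'business': ['fleet', 'address'],
    'device': ['business', 'model', 'status', 'pack'],
    'txn': ['device', 'business', 'operator'],
}
order = sort_dependency_dict(tables)
print(order)

# A cycle is reported through CycleError (a ValueError subclass).
try:
    sort_dependency_dict({'a': ['b'], 'b': ['a']})
except CycleError as exc:
    print("cycle detected:", exc.args[1])
```

One difference to be aware of: names that appear only as dependencies (such as 'fleet') are included in the output here, whereas the generator above never emits an item whose dependencies are not themselves supplied as items and raises TopologicalSortFailure instead.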

