Python: sorting a dependency list
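I'm trying to work out whether my problem can be solved with the built-in sorted() function, or whether I need to do it myself. In the old days, using cmp, this would have been relatively easy.
My data set looks like this: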
x = [
    ('business', Set('fleet','address'))
    ('device', Set('business','model','status','pack'))
    ('txn', Set('device','business','operator'))
    ....
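The sort rule should basically hold for all values of N and Y where Y > N: x[N][0] not in x[Y][1]
Although I'm using Python 2.6, where the cmp argument is still available, I'm trying to make this Python 3 safe.
So, can this be done using some lambda magic and the key argument?
-== UPDATE ==-
Thanks Eli & Winston! I didn't really think using key would work, and if it did, I suspected it would be a less-than-ideal shoehorned solution.
Because my problem is about database table dependencies, I had to make a small addition to Eli's code to remove an item from its own list of dependencies (in a well-designed database this wouldn't happen, but who lives in that magical perfect world?)
My solution: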
def topological_sort(source):
    """perform topo sort on elements.

    :arg source: list of ``(name, set(names of dependencies))`` pairs
    :returns: list of names, with dependencies listed first
    """
    pending = [(name, set(deps)) for name, deps in source]
    emitted = []
    while pending:
        next_pending = []
        next_emitted = []
        for entry in pending:
            name, deps = entry
            deps.difference_update(set((name,)), emitted)  # <-- pop self from dep, req Py2.6
            if deps:
                next_pending.append(entry)
            else:
                yield name
                emitted.append(name)  # <-- not required, but preserves original order
                next_emitted.append(name)
        if not next_emitted:
            raise ValueError("cyclic dependency detected: %s %r" % (name, (next_pending,)))
        pending = next_pending
        emitted = next_emitted
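What you want is called a topological sort. While it's possible to implement one using the built-in sort(), it's rather awkward, and it's better to implement a topological sort directly in Python.
Why is it going to be awkward? If you study the two algorithms on the wiki page, they both rely on a running set of "marked nodes". That concept is hard to contort into a form sort() can use, since key=xxx (or even cmp=xxx) works best with stateless comparison functions, particularly because timsort doesn't guarantee the order in which the elements will be examined. I'm (pretty) sure any solution that does use sort() is going to end up redundantly recalculating some information on each call to the key/cmp function, in order to get around the statelessness issue.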
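To make the statelessness point concrete, here is a small sketch of one tempting stateless key ("fewer direct dependencies first") going wrong. The data and the helper is_dependency_first are made up for illustration; they are not from the question.

```python
# Hypothetical data: 'a' depends on 'b', and 'b' depends on three leaf items.
deps = {
    "a": {"b"},
    "b": {"c", "d", "e"},
    "c": set(),
    "d": set(),
    "e": set(),
}

def is_dependency_first(order, deps):
    """Return True if every item appears after all of its dependencies."""
    position = {name: index for index, name in enumerate(order)}
    return all(position[dep] < position[name]
               for name in order for dep in deps[name])

# A stateless key: sort by number of direct dependencies (name breaks ties).
naive_order = sorted(deps, key=lambda name: (len(deps[name]), name))
print(naive_order)                             # ['c', 'd', 'e', 'a', 'b']
print(is_dependency_first(naive_order, deps))  # False: 'a' comes before 'b'

# One valid order, for comparison.
valid_order = ["c", "d", "e", "b", "a"]
print(is_dependency_first(valid_order, deps))  # True
```

The key only sees one item at a time, so it has no way of knowing that 'b' must be emitted before 'a' can be.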
The following is the algorithm I've been using (to sort some JavaScript library dependencies):
edit: reworked this greatly, based on Winston Ewert's solution
def topological_sort(source):
    """perform topo sort on elements.

    :arg source: list of ``(name, [list of dependencies])`` pairs
    :returns: list of names, with dependencies listed first
    """
    pending = [(name, set(deps)) for name, deps in source]  # copy deps so we can modify set in-place
    emitted = []
    while pending:
        next_pending = []
        next_emitted = []
        for entry in pending:
            name, deps = entry
            deps.difference_update(emitted)  # remove deps we emitted last pass
            if deps:  # still has deps? recheck during next pass
                next_pending.append(entry)
            else:  # no more deps? time to emit
                yield name
                emitted.append(name)  # <-- not required, but helps preserve original ordering
                next_emitted.append(name)  # remember what we emitted for difference_update() in next pass
        if not next_emitted:  # all entries have unmet deps, one of two things is wrong...
            raise ValueError("cyclic or missing dependency detected: %r" % (next_pending,))
        pending = next_pending
        emitted = next_emitted
Side note: it is possible to shoehorn a cmp() function into key=xxx, as outlined in this Python bug tracker message.
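For what it's worth, the standard library has a helper for this kind of shoehorning: functools.cmp_to_key (Python 2.7 and 3.2+, so not the 2.6 from the question). I can't say whether it matches the approach in the linked message. The comparator and word list below are made up, and this is only a sketch of the mechanism, not a way to sort dependencies: a "depends on" comparison is not a consistent total order, so the caveats above still apply.

```python
from functools import cmp_to_key

def compare_by_length_then_alpha(a, b):
    """Old-style cmp function: negative, zero or positive, like cmp()."""
    if len(a) != len(b):
        return len(a) - len(b)
    return (a > b) - (a < b)  # Python 3 replacement for cmp(a, b)

# Hypothetical input, just to show the call shape.
words = ["txn", "business", "device", "pack"]
ordered = sorted(words, key=cmp_to_key(compare_by_length_then_alpha))
print(ordered)  # ['txn', 'pack', 'device', 'business']
```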
I do a topological sort something like this:
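class TopologicalSortFailure(Exception):
    """Not defined in the original snippet; a minimal definition so the code runs."""

def topological_sort(items):
    provided = set()
    while items:
        remaining_items = []
        emitted = False
        for item, dependencies in items:
            if dependencies.issubset(provided):  # all dependencies already emitted
                yield item
                provided.add(item)
                emitted = True
            else:
                remaining_items.append((item, dependencies))
        if not emitted:  # a full pass made no progress: cyclic or missing dependency
            raise TopologicalSortFailure()
        items = remaining_items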
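I think it's a little more straightforward than Eli's version; I don't know about efficiency.
Overlooking the bad formatting and this strange Set type... (I've kept them as tuples and delimited the list items correctly...) ...and using the networkx library to keep things convenient...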
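import networkx as nx

x = [
    ('business', ('fleet', 'address')),
    ('device', ('business', 'model', 'status', 'pack')),
    ('txn', ('device', 'business', 'operator')),
]

g = nx.DiGraph()
for key, vals in x:
    for val in vals:
        g.add_edge(key, val)  # edge points from an item to something it depends on

# With edges in this direction each item is listed before its dependencies;
# reverse the result if you want dependencies first.
# list() is needed where topological_sort() returns a generator (networkx 2.x and later).
print(list(nx.topological_sort(g)))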
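If a third-party library isn't an option, newer Pythons ship something similar. The sketch below assumes Python 3.9 or later, where the standard library's graphlib module provides TopologicalSorter. The dict form of the question's data is my own restatement of it.

```python
from graphlib import TopologicalSorter, CycleError  # standard library, Python 3.9+

# The question's data as a dict: each key maps to the set of names it depends on.
graph = {
    "business": {"fleet", "address"},
    "device": {"business", "model", "status", "pack"},
    "txn": {"device", "business", "operator"},
}

# static_order() yields every node, dependencies before the items that need them.
order = list(TopologicalSorter(graph).static_order())
print(order)

# A cycle is reported with CycleError once the order is actually computed.
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
    cycle_detected = False
except CycleError:
    cycle_detected = True
print(cycle_detected)
```

Names that only ever appear as dependencies (fleet, address and so on) are included in the result too, and the relative order of unrelated names is not something I'd rely on.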
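Here's Winston's suggestion, with a docstring and a tiny tweak: replacing dependencies.issubset(provided) with provided.issuperset(dependencies). That change lets you pass the dependencies in each input pair as an arbitrary iterable, not necessarily a set.
My use case involves a dict whose keys are the item strings, with the value of each key being a list of the item names on which that key depends. Once I've established that the dict is non-empty, I can pass its iteritems() to the modified algorithm.
Thanks again to Winston.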
class TopologicalSortFailure(Exception):
    """Not defined in the original snippet; a minimal definition so the code runs."""

def topological_sort(items):
    """
    'items' is an iterable of (item, dependencies) pairs, where 'dependencies'
    is an iterable of the same type as 'items'.

    If 'items' is a generator rather than a data structure, it should not be
    empty. Passing an empty generator for 'items' (zero yields before return)
    will cause topological_sort() to raise TopologicalSortFailure.

    An empty iterable (e.g. list, tuple, set, ...) produces no items but
    raises no exception.
    """
    provided = set()
    while items:
        remaining_items = []
        emitted = False
        for item, dependencies in items:
            if provided.issuperset(dependencies):
                yield item
                provided.add(item)
                emitted = True
            else:
                remaining_items.append((item, dependencies))
        if not emitted:
            raise TopologicalSortFailure()
        items = remaining_items
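A quick sketch of the dict use case on Python 3, where iteritems() no longer exists and items() takes its place. The function is restated in condensed form only so the block runs on its own. The table_deps data is hypothetical, as is the TopologicalSortFailure definition, which the answers never show.

```python
class TopologicalSortFailure(Exception):
    """Hypothetical exception type; the answers above do not define it."""

def topological_sort(items):
    # Condensed restatement of the function above.
    provided = set()
    while items:
        remaining_items = []
        emitted = False
        for item, dependencies in items:
            if provided.issuperset(dependencies):
                yield item
                provided.add(item)
                emitted = True
            else:
                remaining_items.append((item, dependencies))
        if not emitted:
            raise TopologicalSortFailure()
        items = remaining_items

# Hypothetical table dependencies, with plain lists as values.
table_deps = {
    "txn": ["device", "business"],
    "device": ["business"],
    "business": [],
}

if table_deps:  # guard against an empty dict, as described above
    order = list(topological_sort(table_deps.items()))  # iteritems() on Python 2
    print(order)  # ['business', 'device', 'txn']

# A dependency that never appears as an item can never be satisfied.
try:
    list(topological_sort({"a": ["missing"]}.items()))
    failed = False
except TopologicalSortFailure:
    failed = True
print(failed)  # True
```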