Python: sort a list of lists for unique list items

Question

I can't seem to find a question on SO about my particular problem, so forgive me if this has been asked before!

Anyway, I'm writing a script to loop through a set of URLs and give me a list of unique URLs with unique parameters.

The trouble I'm having is actually comparing the parameters to eliminate multiple duplicates. It's a bit hard to explain, so some examples are probably in order:

Say I have a list of URLs like this:

hxxp://www.somesite.com/page.php?id=3&title=derp
hxxp://www.somesite.com/page.php?id=4&title=blah
hxxp://www.somesite.com/page.php?id=3&c=32&title=thing
hxxp://www.somesite.com/page.php?b=33&id=3

I have it parsing each URL into a list of lists, so eventually I have a list like this:

sort = [['id', 'title'], ['id', 'c', 'title'], ['b', 'id']]
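The question doesn't show how the URLs are parsed, so the following is only a minimal sketch of one way to get from the URLs above to lists of parameter names, assuming Python 3 and the standard `urllib.parse` module. The helper name `param_names` and the variable `parsed` are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qsl

def param_names(url):
    """Return the query parameter names of url, in order of appearance.

    Hypothetical helper, not part of the original script.
    """
    query = urlsplit(url).query
    # keep_blank_values=True so that parameters like "?id=" are not dropped.
    return [name for name, _value in parse_qsl(query, keep_blank_values=True)]

urls = [
    "hxxp://www.somesite.com/page.php?id=3&title=derp",
    "hxxp://www.somesite.com/page.php?id=4&title=blah",
    "hxxp://www.somesite.com/page.php?id=3&c=32&title=thing",
    "hxxp://www.somesite.com/page.php?b=33&id=3",
]

parsed = [param_names(u) for u in urls]
print(parsed)
```

Note that this yields one list per URL, so `['id', 'title']` appears twice here; the list in the question seems to have had that exact duplicate removed already.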

I need to figure out a way to give me just 2 lists in my list at that point:

new = [['id', 'c', 'title'], ['b', 'id']]

Right now I've got a bit of code that sorts it out a little. I know I'm close, and I've been banging my head against this for a couple of days now :( Any ideas?

Thanks in advance! :)

EDIT: Sorry for not being clear! This script is aimed at finding unique entry points for web applications post-spidering. Basically, if a URL has 3 unique entry points:

['id', 'c', 'title']

I'd prefer that to the same link with 2 unique entry points, such as:

['id', 'title']

So I need my new list of lists to eliminate the one with 2 and prefer the one with 3, but ONLY if the smaller list's variables are all in the larger set. If it's still unclear let me know, and thank you for the quick responses! :)

Answer 1

I'll assume that subsets are considered "duplicates" (non-commutatively, of course)...

Start by converting each query into a set and ordering them all from largest to smallest. Then add each query to a new list if it isn't a subset of an already-added query. Since any set is a subset of itself, this logic covers exact duplicates:

a = []
for q in sorted((set(q) for q in sort), key=len, reverse=True):
    if not any(q.issubset(Q) for Q in a):
        a.append(q)
a = [list(q) for q in a] # Back to lists, if you want
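Converting a set back to a list does not guarantee the original parameter order. If the order matters, a variant along the same lines can keep the original lists and build sets only for the comparison. This is just a sketch of that idea; the function name `drop_subsets` is hypothetical, and `sort` is the list from the question.

```python
def drop_subsets(param_lists):
    """Keep only the lists whose names are not all contained in a kept list."""
    kept = []
    # Longest first, so a list can only be covered by one that is already kept.
    for names in sorted(param_lists, key=len, reverse=True):
        if not any(set(names) <= set(other) for other in kept):
            kept.append(names)
    return kept

sort = [['id', 'title'], ['id', 'c', 'title'], ['b', 'id']]
new = drop_subsets(sort)
print(new)  # [['id', 'c', 'title'], ['b', 'id']]
```

Because `sorted` is stable, lists of equal length keep their relative order from the input, and an exact duplicate is dropped because every set is a subset of itself.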

Answer 1: score 5, accepted, 2011-09-23 23:58:29