How to remove unhashable duplicates from a list in Python?

My data is this:

[{u'webpath': u'/etc/html', u'server_port': u'80'}, {u'webpath': [u'/www/web'], u'server_port': u'80'}, {u'webpath': [u'/www/web'], u'server_port': u'80'}, {u'webpath': [u'/www/shanghu'], u'server_port': u'80'}, {u'webpath': [u'/www/shanghu'], u'server_port': u'80'}, {u'webpath': [u'/www/www/html/falv'], u'server_port': u'80'}, {u'webpath': [u'/www/www/html/falv'], u'server_port': u'80'}, {u'webpath': [u'/www/www/html/falv'], u'server_port': u'80'}, {u'webpath': [u'/www/falvhezi'], u'server_port': u'80'}, {u'webpath': [u'/www/test10'], u'server_port': u'80'}, {u'webpath': u'/etc/html', u'server_port': u'80'}, {u'webpath': u'/etc/html', u'server_port': u'80'}, {u'webpath': u'/etc/html', u'server_port': u'80'}, {u'webpath': u'/etc/html', u'server_port': u'80'}, {u'webpath': u'/etc/html', u'server_port': u'80'}, {u'webpath': u'/etc/html', u'server_port': u'80'}, {u'webpath': [u'/www/400.ask.com'], u'server_port': u'80'}, {u'webpath': [u'/www/www'], u'server_port': u'80'}, {u'webpath': [u'/www/www'], u'server_port': u'80'}, {u'webpath': [u'/www/www'], u'server_port': u'80'}, {u'webpath': [u'/www/zhuanti'], u'server_port': u'80'}, {u'webpath': [u'/www/zhuanti'], u'server_port': u'80'}, {u'webpath': [u'/www/shanghu'], u'server_port': u'80'}]

My code is this:

    seen = set()
    new_webpath_list = []
    for webpath in nginxConfs:
        t = tuple(webpath.items())
        if t not in seen:
            seen.add(t)
            new_webpath_list.append(webpath)

But the script returns:

TypeError: "unhashable type: 'list'"

You are creating tuples from the dictionaries to make them hashable, but there can still be unhashable lists inside those tuples! So you also have to "tuplefy" the values.

    t = tuple((k, tuple(v)) for k, v in webpath.items())

Note that this is a bit glitchy, as the `webpath` value in the first dict is just a string, while in the others it is a list of strings, and calling `tuple()` on a string splits it into single characters. You could mend this with an if/else, but it should not really be necessary.

    t = tuple((k, tuple(v) if isinstance(v, list) else v) for k, v in webpath.items())
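Putting the pieces together, a minimal sketch of the whole loop could look like the following. The names `freeze`, `dedupe`, and `nginx_confs`, as well as the sample data, are illustrative assumptions and not part of the original code. The helper converts lists to tuples recursively and sorts the dict items, which assumes the keys can be compared with each other (here they are all strings).

```python
def freeze(value):
    """Return a hashable stand-in for value (hypothetical helper)."""
    if isinstance(value, dict):
        # Sort by key so two equal dicts always produce the same tuple,
        # regardless of the order in which their keys were inserted.
        return tuple(sorted((k, freeze(v)) for k, v in value.items()))
    if isinstance(value, list):
        return tuple(freeze(item) for item in value)
    return value  # strings, numbers, etc. are already hashable


def dedupe(items):
    """Drop duplicates while keeping the first occurrence of each item."""
    seen = set()
    result = []
    for item in items:
        key = freeze(item)
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result


# Made-up sample in the same shape as the data in the question.
nginx_confs = [
    {"webpath": "/etc/html", "server_port": "80"},
    {"webpath": ["/www/web"], "server_port": "80"},
    {"webpath": ["/www/web"], "server_port": "80"},
    {"webpath": "/etc/html", "server_port": "80"},
]

unique_confs = dedupe(nginx_confs)
print(unique_confs)
```

The original dicts are appended to the result unchanged; the frozen tuples are only used as lookup keys in `seen`.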

Alternatively, you could also just store the string representations of the dictionaries...

    t = repr(webpath)
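One caveat with string keys: on Python 3.7+ `repr` follows the insertion order of the dict, so two equal dicts whose keys were added in a different order produce different strings. The sketch below swaps in `json.dumps(..., sort_keys=True)` as an order-independent key. This is a different technique from the `repr` one above, and it assumes the data is JSON-serializable (strings, numbers, lists, and dicts with string keys). The function name `dedupe_by_json` and the sample dicts are made up for illustration.

```python
import json


def dedupe_by_json(items):
    """Drop duplicates, using a canonical JSON string as the lookup key."""
    seen = set()
    result = []
    for item in items:
        # sort_keys=True makes the string independent of key order.
        key = json.dumps(item, sort_keys=True)
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result


a = {"webpath": ["/www/web"], "server_port": "80"}
b = {"server_port": "80", "webpath": ["/www/web"]}  # same content, other key order

print(a == b)                        # the dicts compare equal
print(repr(a) == repr(b))            # but their repr strings differ on Python 3.7+
print(len(dedupe_by_json([a, b])))   # so only the JSON key treats them as one item
```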

The most straightforward way to do this is to just test membership directly using the new list you are building.

    new_webpath_list = []
    for webpath in nginxConfs:
        if webpath not in new_webpath_list:
            new_webpath_list.append(webpath)

This handles the cases where there is an arbitrary (unknown beforehand) level of nesting of unhashable types. It also makes your code simpler, easier to understand, and possibly more efficient for short lists, because you are not creating extra data that you don't need (no `seen` set, no conversion of elements to tuples). Keep in mind that every `not in` test scans the list built so far, so the running time grows quadratically with the number of items.
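The reason this works is that `in` on a list compares elements with `==` and never hashes them. A small sketch with deliberately nested sample data (the function name `dedupe_by_equality` and the data are made up for illustration):

```python
def dedupe_by_equality(items):
    """Drop duplicates using only equality comparison, no hashing."""
    result = []
    for item in items:
        # `in` walks the list and compares with ==, so lists and dicts
        # nested at any depth are fine.
        if item not in result:
            result.append(item)
    return result


nested = [
    {"paths": [["/www/a"], {"alt": ["/www/b"]}]},
    {"paths": [["/www/a"], {"alt": ["/www/b"]}]},  # duplicate of the first
    {"paths": [["/www/a"], {"alt": ["/www/c"]}]},  # differs deep inside
]

unique = dedupe_by_equality(nested)
print(len(unique))  # 2
```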

Late answer, but I was able to remove duplicated dicts from a list using:

    old_list = [{"x": 1}, {"x": 1}, {"x": 2}]
    new_list = []
    # The comprehension is used only for its side effect of filling new_list.
    [new_list.append(x) for x in old_list if x not in new_list]
    # new_list is now [{'x': 1}, {'x': 2}]
