Case insensitive for sets in Python
I have a list that is generated from multiple lists. This combined list contains names that are generated by end users, and therefore contains similar names that differ only in upper/lower case characters. I want to filter out the names that contain the same characters and keep just the first one found in the original list.
As an example I have the following list:
L0 = ['A_B Cdef', 'A_B Cdef', 'A_B Cdef', 'A_B CdEF', 'A_B CDEF','a_B CdEF', 'A_b CDEF', 'GG_ooo', 'a1-23456']
If I run:
L1 = list(set(L0))
I get:
['a1-23456', 'A_B Cdef', 'A_B CdEF', 'A_B CDEF', 'a_B CdEF', 'A_b CDEF', 'GG_ooo']
I would like to keep just the first of the names that have the same characters.
So my desired result is:
['a1-23456', 'A_B Cdef', 'GG_ooo']
If I use .lower() or .upper() I get the list, but the names are lower/upper cased.
I just want to eliminate "duplicates" without regard to case.
Help greatly appreciated.
Thanks!
You can track the .lower() version of the values using a set and then append the original values to a new list if their .lower() version isn't already in the set:
s = set()   # lower-cased names seen so far
L = []      # original spellings, first occurrence only
for x in L0:
    if x.lower() not in s:
        s.add(x.lower())
        L.append(x)
print(L)
# ['A_B Cdef', 'GG_ooo', 'a1-23456']
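The same loop can be wrapped in a reusable helper. The sketch below is only one possible packaging: the name unique_casefold is made up for illustration, and it swaps .lower() for str.casefold(), which is intended for caseless matching and also folds characters such as the German "ß".

```python
def unique_casefold(names):
    """Yield each name the first time its case-folded form is seen."""
    seen = set()  # case-folded keys encountered so far
    for name in names:
        key = name.casefold()  # more aggressive than lower(), meant for caseless comparison
        if key not in seen:
            seen.add(key)
            yield name  # keep the original spelling of the first occurrence


L0 = ['A_B Cdef', 'A_B Cdef', 'A_B Cdef', 'A_B CdEF', 'A_B CDEF',
      'a_B CdEF', 'A_b CDEF', 'GG_ooo', 'a1-23456']
result = list(unique_casefold(L0))
print(result)
# ['A_B Cdef', 'GG_ooo', 'a1-23456']
```

Because it is a generator, it works on any iterable of strings and preserves the order of first appearance.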
Use a hash (a dict) instead; I don't think you can easily achieve this with sets.
L0 = {value.lower(): value for value in L0[::-1]}.values()
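Two details of that one-liner may matter in practice: it returns a dict view rather than a list, and because it walks the list backwards, the surviving names come out in reverse order. The sketch below shows both behaviours and an order-preserving variant using dict.setdefault. It assumes Python 3.7+, where dicts keep insertion order, and the variable names are illustrative only.

```python
L0 = ['A_B Cdef', 'A_B Cdef', 'A_B Cdef', 'A_B CdEF', 'A_B CDEF',
      'a_B CdEF', 'A_b CDEF', 'GG_ooo', 'a1-23456']

# The reversed-list comprehension: the first spelling wins because it is
# written last, but the keys are inserted in reverse order.
reversed_order = list({value.lower(): value for value in L0[::-1]}.values())
print(reversed_order)
# ['a1-23456', 'GG_ooo', 'A_B Cdef']

# Order-preserving variant: setdefault only stores a value the first time
# a key is seen, so no reversal is needed.
first_seen = {}
for value in L0:
    first_seen.setdefault(value.lower(), value)
in_order = list(first_seen.values())
print(in_order)
# ['A_B Cdef', 'GG_ooo', 'a1-23456']
```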
You already have several good answers, and the code below is probably overkill for your use-case, but just for fun I created a simple case-insensitive mutable set class. Note that it keeps the first string that it finds rather than letting it get clobbered by later entries.
import collections.abc

class CasefoldSet(collections.abc.MutableSet):
    def __init__(self, iterable=None):
        # Maps the case-folded key to the first original string seen
        self.elements = {}
        if iterable is not None:
            for v in iterable:
                self.add(v)

    def __contains__(self, value):
        return value.casefold() in self.elements

    def add(self, value):
        key = value.casefold()
        if key not in self.elements:
            self.elements[key] = value

    def discard(self, value):
        key = value.casefold()
        if key in self.elements:
            del self.elements[key]

    def __len__(self):
        return len(self.elements)

    def __iter__(self):
        return iter(self.elements.values())

    def __repr__(self):
        return '{' + ', '.join(map(repr, self)) + '}'

# test
l0 = [
    'GG_ooo', 'A_B Cdef', 'A_B Cdef', 'A_B Cdef',
    'A_B CdEF', 'A_B CDEF', 'a_B CdEF', 'A_b CDEF', 'a1-23456',
]
l1 = CasefoldSet(l0[:4])
print(l1)
l1 |= l0[4:]
print(l1)
l2 = {'a', 'b', 'A_B Cdef'} | l1
print(l2)
l3 = l2 & {'a', 'GG_ooo', 'a_B CdEF'}
print(l3)
Output:
{'GG_ooo', 'A_B Cdef'}
{'GG_ooo', 'A_B Cdef', 'a1-23456'}
{'GG_ooo', 'A_B Cdef', 'a1-23456', 'b', 'a'}
{'a_B CdEF', 'a', 'GG_ooo'}
This class inherits various useful methods from collections.abc.MutableSet, but to make it a full replacement for set it does need a few more methods. Note that it will raise AttributeError if you try to pass it non-string items.
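If mixed content is a concern, one way to avoid the AttributeError is to route key computation through a small helper that only folds strings. The following is a sketch under that assumption, not part of the class above: the names _key and TolerantCasefoldSet are hypothetical, and unhashable items would still raise TypeError.

```python
import collections.abc


def _key(value):
    # Hypothetical helper: fold case for strings, use any other value as its own key.
    return value.casefold() if isinstance(value, str) else value


class TolerantCasefoldSet(collections.abc.MutableSet):
    def __init__(self, iterable=()):
        self.elements = {}  # key -> first original value seen
        for v in iterable:
            self.add(v)

    def __contains__(self, value):
        return _key(value) in self.elements

    def add(self, value):
        # setdefault keeps the existing entry, so the first spelling wins
        self.elements.setdefault(_key(value), value)

    def discard(self, value):
        self.elements.pop(_key(value), None)

    def __len__(self):
        return len(self.elements)

    def __iter__(self):
        return iter(self.elements.values())

    def __repr__(self):
        return '{' + ', '.join(map(repr, self)) + '}'


s = TolerantCasefoldSet(['GG_ooo', 'gg_OOO', 42, 42, None])
print(s)
# {'GG_ooo', 42, None}
print('gg_ooo' in s, 42 in s, 'x' in s)
# True True False
```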
If you want to play by the rules, the best solution I can think of is a bit messy, using a set to track which words have appeared:
seen_words = set()
L1 = []
for word in L0:
    if word.lower() not in seen_words:
        L1.append(word)
        seen_words.add(word.lower())
If you want to get a little hackier there is a more elegant solution: you can use a dictionary to track which words have already been seen, and it's almost a one-liner:
seen_words = {}
L1 = [seen_words.setdefault(word.lower(), word)
      for word in L0 if word.lower() not in seen_words]
print(L1)
Both solutions output the same result:
['A_B Cdef', 'GG_ooo', 'a1-23456']