Python returning unique words from a list (case insensitive)

I need help returning the unique words (case insensitive) from a list, in order.

For example:

case_insensitive_unique_list(["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"])

Will return: ["We", "are", "one", "the", "world", "UNIVERSE"]

So far this is what I have:

def case_insensitive_unique_list(list_string):

    uppercase = ["A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"]
    lowercase = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]

    temp_unique_list = []

    for i in list_string:
        if i not in list_string:
            temp_unique_list.append(i)

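I'm having trouble comparing each word against temp_unique_list to tell whether it is a repeat, regardless of case. For example: "to" and "To" (I am assuming the range function will be useful).

I also need it to return the word as it first appears in the original list that the function takes in.

How would I do that using a for loop?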

You can do this with a for loop and the set data structure, like this

def case_insensitive_unique_list(data):
    seen, result = set(), []
    for item in data:
        if item.lower() not in seen:
            seen.add(item.lower())
            result.append(item)
    return result

Output

['We', 'are', 'one', 'the', 'world', 'UNIVERSE']
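
For text beyond plain ASCII, str.casefold() is a stricter alternative to lower(). The sketch below is the same loop with the comparison key passed in as a parameter. The name unique_by_key and its key parameter are made up for illustration and are not part of the answer above.

```python
def unique_by_key(data, key=str.casefold):
    """Return the first occurrence of each item, comparing items by key(item)."""
    seen, result = set(), []
    for item in data:
        k = key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result

words = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
print(unique_by_key(words))
# ['We', 'are', 'one', 'the', 'world', 'UNIVERSE']

# casefold() also matches pairs that lower() misses, such as the German sharp s.
print(unique_by_key(["Straße", "STRASSE"]))             # ['Straße']
print(unique_by_key(["Straße", "STRASSE"], str.lower))  # ['Straße', 'STRASSE']
```

For the word list in the question both keys give the same result, so this only matters if the input may contain non-ASCII text.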

You can use set() and a list comprehension:

>>> seen = set()
>>> lst = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
>>> [x for x in lst if x.lower() not in seen and not seen.add(x.lower())]
['We', 'are', 'one', 'the', 'world', 'UNIVERSE']
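
The comprehension relies on two things: set.add() returns None, so `not seen.add(...)` is always True, and `and` short-circuits, so the add only runs for words that passed the first test. A small sketch of both (the sample list and the name firsts are illustrative):

```python
seen = set()

# set.add() mutates the set in place and returns None.
print(seen.add("we"))        # None
print(not seen.add("are"))   # True

# `and` short-circuits: for a word already seen, the right side never runs.
lst = ["We", "we", "WE", "are"]
seen = set()
firsts = [x for x in lst if x.lower() not in seen and not seen.add(x.lower())]
print(firsts)                # ['We', 'are']
print(sorted(seen))          # ['are', 'we']
```

Because the filter has a side effect, seen must be reset before the comprehension is run again on another list.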

You can do it like this:

l = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]

a = []

for i in l:
    if i.lower() not in [j.lower() for j in a]:
        a.append(i)

>>> print(a)
['We', 'are', 'one', 'the', 'world', 'UNIVERSE']

l = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
so = []
for w in l:
    # Stores the lowercased word, so the original casing is not kept.
    if w.lower() not in so:
        so.append(w.lower())

In [14]: so
Out[14]: ['we', 'are', 'one', 'the', 'world', 'universe']

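You can use a set to ensure uniqueness. When you try to add a duplicate to a set, it is simply discarded if it is already there.

You should also use the built-in lower() method to handle case insensitivity.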

uniques = set()
for word in words:
    uniques.add(word.lower())  # lower it first and then add it
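
As a quick check of the discard behaviour, here is a minimal sketch with a made-up words list. Note that a set on its own makes no promise about order, so this covers the uniqueness part of the question but not the "in order" part.

```python
words = ["We", "are", "one", "we", "are", "THE", "the"]  # sample input, assumed

uniques = set()
for word in words:
    uniques.add(word.lower())  # adding an element that is already present is a no-op

print(len(uniques))                             # 4
print(uniques == {"we", "are", "one", "the"})   # True
```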

If this is for a homework assignment and using set is not allowed, you can easily adapt it to use only a list; just loop and add a condition:

uniques = list()
for word in words:
    if word.lower() not in uniques:
        uniques.append(word.lower())

You can use collections.OrderedDict like this

from collections import OrderedDict
def case_insensitive_unique_list(data):
    d = OrderedDict()
    for word in data:
        d.setdefault(word.lower(), word)
    return list(d.values())  # list() so Python 3 returns a list, not a view

Output:

['We', 'are', 'one', 'the', 'world', 'UNIVERSE']
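
On Python 3.7 and later the built-in dict also preserves insertion order, so the same idea can be sketched without the import. This is a variant for comparison, not what the answer above wrote; first_seen is an illustrative name.

```python
def case_insensitive_unique_list(data):
    first_seen = {}  # lowercased word -> first spelling encountered
    for word in data:
        # setdefault() only stores the value if the key is not present yet.
        first_seen.setdefault(word.lower(), word)
    return list(first_seen.values())

words = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
print(case_insensitive_unique_list(words))
# ['We', 'are', 'one', 'the', 'world', 'UNIVERSE']
```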

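OK, I deleted my previous answer because I misread the OP's post. My apologies.

To make up for it, and for the fun of doing it a different way, here is another solution, though it is neither the most efficient nor the best: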

>>> from functools import reduce
>>> lst = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
>>> for it in reduce(lambda l, it: l if it.lower() in {i.lower() for i in l} else l + [it], lst, []):
...     print(it, end=", ")

