Python returning unique words from a list (case insensitive)
I need help returning unique words (case insensitive) from a list, in order.
For example:
case_insensitive_unique_list(["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"])
should return: ["We", "are", "one", "the", "world", "UNIVERSE"]
So far this is what I've got:
def case_insensitive_unique_list(list_string):
    uppercase = ["A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"]
    lowercase = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
    temp_unique_list = []
    for i in list_string:
        # Problem: i always comes from list_string, so this test is never True
        # and nothing is ever appended.
        if i not in list_string:
            temp_unique_list.append(i)
I am having trouble checking whether each word in temp_unique_list repeats, ignoring case. For example, "to" and "To" should count as the same word. (I am assuming the range function will be useful.)
I also want it to return the spelling that appears first in the original list the function takes in.
How would I do this using a for loop?
You can do this with the help of a for loop and a set, like this:
def case_insensitive_unique_list(data):
    seen, result = set(), []  # lowercased words seen so far, and the output
    for item in data:
        if item.lower() not in seen:
            seen.add(item.lower())
            result.append(item)  # keep the first spelling encountered
    return result
Output:
['We', 'are', 'one', 'the', 'world', 'UNIVERSE']
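The same seen-set pattern can be written once and reused with any normalization by passing a key function, much like the `unique_everseen` recipe in the `itertools` documentation. A minimal sketch; the name `unique_by` and its `key` parameter are illustrative, not part of any library:

```python
from typing import Callable, Hashable, Iterable, List, TypeVar

T = TypeVar("T")

def unique_by(items: Iterable[T], key: Callable[[T], Hashable]) -> List[T]:
    """Return items in original order, keeping only the first item for each key."""
    seen = set()
    result = []
    for item in items:
        k = key(item)  # normalized form used only for the uniqueness check
        if k not in seen:
            seen.add(k)
            result.append(item)  # the original item (spelling) is what gets kept
    return result

words = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
print(unique_by(words, key=str.lower))
# ['We', 'are', 'one', 'the', 'world', 'UNIVERSE']
```

Passing `key=abs` or `key=len` works the same way for non-string data.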
You can use set() and a list comprehension:
>>> seen = set()
>>> lst = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
>>> [x for x in lst if x.lower() not in seen and not seen.add(x.lower())]
['We', 'are', 'one', 'the', 'world', 'UNIVERSE']
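The list comprehension above relies on a side effect: `set.add()` returns `None`, so `not seen.add(...)` is always true, and because `and` short-circuits it only runs for words not seen yet, recording them. Since `seen` lives outside the comprehension, a second run in the same session would start with a full set and return an empty list. A sketch that wraps the same expression in a function so each call gets a fresh set (the function name reuses the question's):

```python
def case_insensitive_unique_list(data):
    # A fresh set per call; a shared, module-level `seen` would remember
    # words from earlier calls.
    seen = set()
    # `seen.add(...)` returns None, so `not seen.add(...)` is always True.
    # It is only evaluated when the first test passes (short-circuit `and`),
    # and it records the lowercased word as a side effect.
    return [x for x in data if x.lower() not in seen and not seen.add(x.lower())]

lst = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
print(case_insensitive_unique_list(lst))
print(case_insensitive_unique_list(lst))  # same result on a second call
```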
You can do it like this:
l = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
a = []
for i in l:
    # Compare lowercased forms, but keep the original spelling in a
    if i.lower() not in [j.lower() for j in a]:
        a.append(i)
>>> print(a)
['We', 'are', 'one', 'the', 'world', 'UNIVERSE']
l = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
so = []
for w in l:
    # Stores the lowercased form, so the original capitalization is not kept
    if w.lower() not in so:
        so.append(w.lower())
In [14]: so
Out[14]: ['we', 'are', 'one', 'the', 'world', 'universe']
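Answers here normalize words with `str.lower()`. Python 3 also provides `str.casefold()`, which its documentation describes as intended for caseless matching; it folds more characters than `lower()` (for example, German "ß" becomes "ss"). A minimal sketch of the seen-set approach with `casefold()` swapped in; the function name simply reuses the question's:

```python
# lower() and casefold() agree on plain ASCII words...
print("UNIVERSE".lower() == "UNIVERSE".casefold())  # True

# ...but casefold() also folds some non-ASCII letters that lower() leaves alone.
print("ß".lower())     # ß
print("ß".casefold())  # ss

def case_insensitive_unique_list(data):
    seen, result = set(), []
    for item in data:
        key = item.casefold()  # more aggressive than lower() for caseless matching
        if key not in seen:
            seen.add(key)
            result.append(item)  # keep the first spelling encountered
    return result

print(case_insensitive_unique_list(["Straße", "STRASSE", "strasse"]))
# ['Straße']
```

With `lower()` instead, the same input would keep both "Straße" and "STRASSE", since "straße" and "strasse" differ.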
You can use a set to ensure uniqueness. When you add an item that is already in a set, the set simply ignores it.
You should also use the built-in lower() string method to handle the case insensitivity.
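words = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
uniques = set()  # note: a set does not keep the original order
for word in words:
    uniques.add(word.lower())  # lower it first and then add it; repeats are discarded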
If this is for a homework task and using a set is off limits, you can easily adapt it to use lists only: just loop through and add the condition:
uniques = list()
for word in words:
    if word.lower() not in uniques:
        uniques.append(word.lower())
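The list-only version stores lowercased words, so the original spellings ("We", "UNIVERSE") are lost. A sketch that still uses only lists and a for loop, as the question asks, but keeps two lists: one of lowercased forms for the check and one of the original words for the result:

```python
def case_insensitive_unique_list(words):
    seen = []    # lowercased forms seen so far (a list instead of a set)
    result = []  # original spellings, in first-seen order
    for word in words:
        if word.lower() not in seen:
            seen.append(word.lower())
            result.append(word)
    return result

print(case_insensitive_unique_list(
    ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]))
```

Membership tests on a list scan it from the start, so this is quadratic in the number of words; that is fine for short inputs, and swapping `seen` for a set removes the cost.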
You can use collections.OrderedDict like this:
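from collections import OrderedDict

def case_insensitive_unique_list(data):
    d = OrderedDict()
    for word in data:
        d.setdefault(word.lower(), word)  # only the first spelling of each word is stored
    return list(d.values())  # list() so Python 3 returns a list, not a values view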
Output:
['We', 'are', 'one', 'the', 'world', 'UNIVERSE']
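Since Python 3.7, plain dicts are guaranteed to preserve insertion order, so the same `setdefault` idea works without importing `OrderedDict`. A minimal sketch, assuming Python 3.7 or later:

```python
def case_insensitive_unique_list(data):
    first_spelling = {}  # lowercased word -> first spelling seen (insertion-ordered)
    for word in data:
        # setdefault only stores `word` if the lowercased key is not present yet,
        # so the first spelling wins.
        first_spelling.setdefault(word.lower(), word)
    return list(first_spelling.values())

print(case_insensitive_unique_list(
    ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]))
# ['We', 'are', 'one', 'the', 'world', 'UNIVERSE']
```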
OK, I removed my previous answer, as I misread the OP's post. My apologies.
To make up for it, and for the fun of doing it a different way, here's another solution, though it's neither the most efficient nor the best one:
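>>> from functools import reduce
>>> lst = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
>>> for it in reduce(lambda l, it: l if it.lower() in {i.lower() for i in l} else l + [it], lst, []):
...     print(it, end=", ")
...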
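The reduce one-liner above rebuilds a lowercase set from the whole accumulated list and copies the list at every step, which makes it quadratic. A sketch of a fold that instead carries the seen set along in the accumulator; `step` is an illustrative helper name, not a library function:

```python
from functools import reduce

def step(acc, word):
    """Fold step: acc is a (seen, result) pair built up left to right."""
    seen, result = acc
    key = word.lower()
    if key not in seen:
        seen.add(key)
        result.append(word)  # keep the first spelling encountered
    return seen, result

def case_insensitive_unique_list(data):
    # The initial (set(), []) is created fresh on every call.
    _, result = reduce(step, data, (set(), []))
    return result

print(case_insensitive_unique_list(
    ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]))
```

This is really the plain for-loop version expressed as a fold, so the loop remains the more readable choice.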