Python returning unique words from a list (case insensitive)

I need help with returning unique words (case insensitive) from a list in order.

For example:

case_insensitive_unique_list(["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"])

Will return: ["We", "are", "one", "the", "world", "UNIVERSE"]

So far this is what I've got:

def case_insensitive_unique_list(list_string):

    uppercase = ["A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"]
    lowercase = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]

    temp_unique_list = []

    for i in list_string:
        if i not in list_string:
            temp_unique_list.append(i)

I am having trouble comparing each word against temp_unique_list to check whether it repeats when case is ignored. For example, "to" and "To" should count as the same word. (I am assuming the range function will be useful.)

I also need the function to return the spelling that comes first in the original list it takes in.

How would I do this using a for loop?

You can do this with a for loop and a set, like this:

def case_insensitive_unique_list(data):
    seen, result = set(), []
    for item in data:
        if item.lower() not in seen:
            seen.add(item.lower())
            result.append(item)
    return result

Output

['We', 'are', 'one', 'the', 'world', 'UNIVERSE']
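
For reference, here is a small self-contained run of the same idea against the list from the question. The `key` parameter is an assumption added for illustration and is not part of the answer above. It defaults to `str.lower`, and passing `str.casefold` gives more aggressive Unicode matching.

```python
def case_insensitive_unique_list(data, key=str.lower):
    """Return the items of data in order, keeping the first spelling of each word."""
    seen, result = set(), []
    for item in data:
        k = key(item)            # normalised form, used only for comparison
        if k not in seen:
            seen.add(k)
            result.append(item)  # keep the original casing
    return result


words = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
print(case_insensitive_unique_list(words))
# ['We', 'are', 'one', 'the', 'world', 'UNIVERSE']

# str.lower() leaves "ß" alone, while str.casefold() maps it to "ss"
print(case_insensitive_unique_list(["Straße", "STRASSE"], key=str.casefold))
# ['Straße']
```

The set makes each membership test roughly constant time, so the whole pass is linear in the length of the list.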

You can use set() and a list comprehension:

>>> seen = set()
>>> lst = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
>>> [x for x in lst if x.lower() not in seen and not seen.add(x.lower())]
['We', 'are', 'one', 'the', 'world', 'UNIVERSE']
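
The `not seen.add(...)` part can look odd at first. As a sketch of what is going on: `set.add()` always returns `None`, so `not seen.add(...)` is always true and is there only for its side effect of recording the word. Unrolled into an ordinary loop (the variable names `result`, `seen2` and `one_liner` are just for illustration), the comprehension is equivalent to:

```python
lst = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]

seen = set()
result = []
for x in lst:
    key = x.lower()
    if key not in seen:
        # set.add() returns None, which is why `not seen.add(key)` is always True
        returned = seen.add(key)
        assert returned is None
        result.append(x)

# The one-liner from above, using its own set so the two runs are independent
seen2 = set()
one_liner = [x for x in lst if x.lower() not in seen2 and not seen2.add(x.lower())]

print(result)               # ['We', 'are', 'one', 'the', 'world', 'UNIVERSE']
print(one_liner == result)  # True
```

The explicit loop is usually easier to read. The comprehension relies on a side effect inside a condition, which some people consider poor style.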

You can do that as:

l = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]

a = []

for i in l:
    if i.lower() not in [j.lower() for j in a]:
        a.append(i)

>>> print(a)
['We', 'are', 'one', 'the', 'world', 'UNIVERSE']

l = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
so = []
for w in l:
    if w.lower() not in so:
        so.append(w.lower())

In [14]: so
Out[14]: ['we', 'are', 'one', 'the', 'world', 'universe']

You can use a set to ensure uniqueness. Adding an item that is already in the set has no effect.

You should also use the built-in str.lower() method to handle the case-insensitivity.

uniques = set()
for word in words:
    uniques.add(word.lower())  # lower it first and then add it
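
One caveat, sketched below with the list from the question bound to `words` (an assumed name, since the snippet above does not define it): a set on its own records neither the order in which words were first seen nor their original spelling, so it answers "which words occur" but not "which spelling came first".

```python
words = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]

uniques = set()
for word in words:
    uniques.add(word.lower())  # duplicates are silently ignored

print(len(uniques))     # 6
# Iteration order of a set is arbitrary, so sort it to get a stable display
print(sorted(uniques))  # ['are', 'one', 'the', 'universe', 'we', 'world']
```

To get the result the question asks for, the set needs to be paired with a list that stores the original words in first-seen order.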

If this is for a homework task and using a set is off limits, you can easily adapt it to use lists only. Just loop through and add the condition:

uniques = []
for word in words:
    if word.lower() not in uniques:
        uniques.append(word.lower())

You can use collections.OrderedDict like this:

from collections import OrderedDict
def case_insensitive_unique_list(data):
    d = OrderedDict()
    for word in data:
        d.setdefault(word.lower(), word)
    return list(d.values())  # in Python 3, values() is a view, not a list

Output:

['We', 'are', 'one', 'the', 'world', 'UNIVERSE']
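
As a side note, on Python 3.7 and later a plain dict also preserves insertion order, so the same `setdefault` approach should work without the import. This is a minimal sketch under that version assumption, and the name `first_seen` is only illustrative:

```python
def case_insensitive_unique_list(data):
    first_seen = {}  # lowercased word -> first spelling encountered
    for word in data:
        # setdefault stores the word only if the key is not present yet
        first_seen.setdefault(word.lower(), word)
    return list(first_seen.values())


words = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
print(case_insensitive_unique_list(words))
# ['We', 'are', 'one', 'the', 'world', 'UNIVERSE']
```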

OK, I removed my previous answer, as I misread the OP's post. My apologies.

To make up for it, for the fun of it and for the sake of doing it a different way, here's another solution, though it's neither the most efficient nor the best:

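>>> from functools import reduce
>>> lst = ["We", "are", "one", "we", "are", "the", "world", "we", "are", "THE", "UNIVERSE"]
>>> for it in reduce(lambda l, it: l if it.lower() in {i.lower() for i in l} else l + [it], lst, []):
...     print(it, end=", ")
...
We, are, one, the, world, UNIVERSE, 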
