
How can I write these nested if statements more elegantly?

I'm writing a Python program that removes duplicate words from a file. A word is any sequence of characters without spaces, and duplicates are matched regardless of case, so duplicate, Duplicate, DUPLICATE, and dUplIcaTe are all duplicates. I read in the original file and store it as a list of strings. I then create a new empty list and populate it one element at a time, checking whether the current string already exists in the new list. I run into problems when I try to implement the case conversion, which checks for each case variant of the word (upper, title, and lower). I've tried writing the if statement as:

 if elem and capital and title and lower not in uniqueList:
     uniqueList.append(elem)

I've also tried writing it with or statements as well:

 if elem or capital or title or lower not in uniqueList:
     uniqueList.append(elem)

However, I still get duplicates. The only way the program works properly is if I write the code like so:

def remove_duplicates(self):

    """
    self.words is a class variable, which stores the original text as a list of strings    
    """

    uniqueList = []

    for elem in self.words: 

        capital = elem.upper()
        lower = elem.lower()
        title = elem.title()

        if elem == '\n':
            uniqueList.append(elem)

        else:

            if elem not in uniqueList:
                if capital not in uniqueList:
                    if title not in uniqueList:
                        if lower not in uniqueList:
                            uniqueList.append(elem)

    self.words = uniqueList

Is there any way I can write these nested if statements more elegantly?

Combine the tests with and, repeating not in uniqueList for each form:

if elem not in uniqueList and capital not in uniqueList and title not in uniqueList and lower not in uniqueList:
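The two attempts in the question fail because `not in` binds more tightly than `and` and `or`, so only `lower` is actually tested against the list; the other names are just checked for truthiness. A minimal sketch of the difference, using a made-up one-word list for illustration:

```python
unique_list = ["Hello"]          # hypothetical state: "Hello" is already stored
elem = "HELLO"                   # a case-variant duplicate arrives
capital, title, lower = elem.upper(), elem.title(), elem.lower()

# Parsed as: elem and capital and title and (lower not in unique_list)
chained_and = bool(elem and capital and title and lower not in unique_list)

# Parsed as: elem or capital or title or (lower not in unique_list)
chained_or = bool(elem or capital or title or lower not in unique_list)

# Each form is tested against the list explicitly
explicit = all(form not in unique_list for form in (elem, capital, title, lower))

print(chained_and, chained_or, explicit)
```

Both chained versions evaluate to true here, so the duplicate would be appended; only the explicit version notices that the title-case form is already present.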

You can also use set operations:

if set((elem, capital, title, lower)).isdisjoint(uniqueList):
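As a quick check of the set approach: `isdisjoint` returns True when the two collections share no element, which is the case in which the word is new and should be appended. The lists below are invented for illustration:

```python
forms = {"HELLO", "Hello", "hello"}   # the case variants of one word

# One variant is already stored, so the word is a duplicate
shares_a_form = forms.isdisjoint(["Hello", "world"])

# No variant is stored, so the word is new
shares_nothing = forms.isdisjoint(["world"])

print(shares_a_form, shares_nothing)
```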

But instead of testing all the different forms of elem, it would be simpler to put only lowercase words in self.words in the first place.

If you also make self.words a set instead of a list, duplicates are removed automatically. Keep in mind that a set does not preserve the original word order.
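If the original order and casing of the first occurrence matter, one possible middle ground is to keep the output as a list and use a separate set of lowercase forms only for the membership test. This is a sketch, not the asker's code: the standalone function signature and the `seen` set are assumptions, and newlines are always kept to mirror the question's behavior.

```python
def remove_duplicates(words):
    """Drop case-insensitive duplicates, keeping the first occurrence of each word."""
    seen = set()    # lowercase forms that have already been emitted
    unique = []
    for word in words:
        if word == "\n":
            # Newlines are always kept, as in the question's code
            unique.append(word)
            continue
        key = word.lower()
        if key not in seen:
            seen.add(key)
            unique.append(word)
    return unique


result = remove_duplicates(["duplicate", "Duplicate", "\n", "DUPLICATE", "other", "\n"])
print(result)
# ['duplicate', '\n', 'other', '\n']
```

Comparing a single lowercase key also catches mixed-case variants such as dUplIcaTe, which the upper/title/lower checks would miss if that form were the one stored first.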

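If you want to preserve the original upper/lower case of the input, you can group the words by their lowercase form. Note that this example keeps only the words that occur exactly once: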

content = "Hello john hello  hELLo my naMe Is JoHN"
words = content.split()
dictionary = {}
for word in words:
    if word.lower() not in dictionary:
        dictionary[word.lower()] = [word]
    else:
        dictionary[word.lower()].append(word)
print(dictionary)

# dictionary is now: {'hello': ['Hello', 'hello', 'hELLo'], 'john': ['john', 'JoHN'], 'my': ['my'], 'name': ['naMe'], 'is': ['Is']}
# keep the values whose list contains a single element, i.e. the words that occur only once

uniqs = []
for key, value in dictionary.items():
    if len(value) == 1:
        uniqs.extend(value)
print(uniqs)
# will print ['my', 'naMe', 'Is']

