How can I write these nested if statements more elegantly?

Question

I'm writing a Python program that removes duplicate words from a file. A word is any sequence of characters without spaces, and duplicates are case-insensitive, so duplicate, Duplicate, DUPLICATE and dUplIcaTe are all duplicates of one another. I read in the original file and store it as a list of strings. I then create a new, empty list and populate it one word at a time, checking whether the current string already exists in the new list. I run into problems when I try to implement the case handling, which checks for each case variant of a word. I've tried rewriting the if statement as:

if elem and capital and title and lower not in uniqueList:
    uniqueList.append(elem)

I've also tried writing it with or:

if elem or capital or title or lower not in uniqueList:
    uniqueList.append(elem)

However, I still get duplicates. The only way the program works properly is if I write the code like so:

def remove_duplicates(self):
    """
    self.words is an instance attribute that stores the original text as a list of strings
    """
    uniqueList = []

    for elem in self.words:
        capital = elem.upper()
        lower = elem.lower()
        title = elem.title()

        if elem == '\n':
            uniqueList.append(elem)
        else:
            if elem not in uniqueList:
                if capital not in uniqueList:
                    if title not in uniqueList:
                        if lower not in uniqueList:
                            uniqueList.append(elem)

    self.words = uniqueList

Is there any way I can write these nested if statements more elegantly?

Answer 1

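In your attempts, each operand of and/or is evaluated as its own condition, and not in binds only to lower. A non-empty string such as elem is simply truthy, so the or version is always true, and the and version only checks lower against uniqueList. Combine complete tests with and instead: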

if elem not in uniqueList and capital not in uniqueList and title not in uniqueList and lower not in uniqueList:

You can also use set operations:

if set((elem, capital, title, lower)).isdisjoint(uniqueList):
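A small sketch of the set version, assuming the same four case variants the question checks (forms_of is a hypothetical helper name, not from the question). set.isdisjoint() returns True when the set shares no element with the other collection, so the append belongs under the True branch; the assert checks that this agrees with chaining not in tests via all():

```python
def forms_of(elem):
    # Hypothetical helper: the four case variants the question checks
    return (elem, elem.upper(), elem.title(), elem.lower())


unique_list = ['Hello', 'world']

for elem in ['HELLO', 'new']:
    forms = forms_of(elem)
    # Same condition as "form1 not in unique_list and form2 not in unique_list and ..."
    chained = all(form not in unique_list for form in forms)
    # isdisjoint() is True only when no variant is already in unique_list
    assert set(forms).isdisjoint(unique_list) == chained
    if set(forms).isdisjoint(unique_list):
        unique_list.append(elem)

print(unique_list)  # ['Hello', 'world', 'new']
```

'HELLO' is skipped because its title-case variant 'Hello' is already in the list, while 'new' has no variant present and is appended.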

But instead of testing all the different forms of elem, it would be simpler to put only lowercase words in self.words in the first place.

And if you make self.words a set instead of a list, duplicates will be removed automatically, although a set does not keep the words in their original order.
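A minimal sketch of that idea that also keeps the original order and casing, assuming the first occurrence of each word should be kept and newline tokens should pass through, as in the question's code. remove_duplicates is written here as a hypothetical standalone function rather than the question's method. Tracking lowercase keys in a set also catches an order the four-variant check misses: if dUplIcaTe comes first, a later Duplicate matches none of its four variants (Duplicate, DUPLICATE, Duplicate, duplicate) and gets appended.

```python
def remove_duplicates(words):
    """Return words with case-insensitive duplicates removed.

    Keeps the first occurrence of each word with its original casing,
    preserves order, and passes newline tokens through unchanged.
    """
    seen = set()   # lowercase forms of the words kept so far
    unique = []
    for word in words:
        if word == '\n':
            unique.append(word)
            continue
        key = word.lower()
        if key not in seen:   # set membership is a constant-time lookup on average
            seen.add(key)
            unique.append(word)
    return unique


words = ['dUplIcaTe', 'Duplicate', '\n', 'DUPLICATE', 'other', 'Other']
print(remove_duplicates(words))  # ['dUplIcaTe', '\n', 'other']
```

Inside the question's class, the method body could then be reduced to self.words = remove_duplicates(self.words). For non-ASCII text, str.casefold() is a stricter alternative to lower().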

Answer 2

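If you want to preserve the original upper/lower case of the input, try this one. Note that it keeps only the words that occur exactly once (ignoring case); a word that has duplicates is dropped entirely rather than kept once: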

content = "Hello john hello  hELLo my naMe Is JoHN"
words = content.split()
dictionary = {}
for word in words:
    if word.lower() not in dictionary:
        dictionary[word.lower()] = [word]
    else:
        dictionary[word.lower()].append(word)
print(dictionary)

# here we have dictionary: {'hello': ['Hello', 'hello', 'hELLo'], 'john': ['john', 'JoHN'], 'my': ['my'], 'name': ['naMe'], 'is': ['Is']}
# keep the words from every list that contains a single element, i.e. words with no case-insensitive duplicates

uniqs = []
for key, value in dictionary.items():
    if len(value) == 1:
        uniqs.extend(value)
print(uniqs)
# will print ['my', 'naMe', 'Is']
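The same result can be sketched more compactly with collections.Counter, a swapped-in technique rather than the answer's own code: count each word by its lowercase form, then keep the words whose count is 1, in input order and with their original casing.

```python
from collections import Counter

content = "Hello john hello  hELLo my naMe Is JoHN"
words = content.split()

# Tally case variants together by counting lowercase forms
counts = Counter(word.lower() for word in words)

# Keep only words whose lowercase form occurs exactly once
uniqs = [word for word in words if counts[word.lower()] == 1]
print(uniqs)  # ['my', 'naMe', 'Is']
```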

2 answers

solution1
1 ACCEPTED 2020-01-23 02:44:33

solution2
0 2020-01-23 02:49:38
