Python: remove() doesn't seem to work

Question

I searched for a while but I can't find a solution to my problem. I'm still new to Python, so I sometimes struggle with obvious things. Thanks in advance for your advice!

I have a list containing objects and duplicates of these objects. Both follow specific naming patterns: objects_ext and duplicatedObject_SREF_ext . If there is a duplicated object in my list, I want to check whether the original object is also in the list and, if it is, remove the duplicated object from the list.

I tried to use the remove() method, since each name can occur only once in the list, but it doesn't work. Here is my code:

rawSelection = [u'crapacruk_high', u'doubidou_high', u'blahbli_high', u'crapacruk_SREF_high', u'doubidou_SREF_high', u'blahbli_SREF_high']
# objects with '_SREF_' in their names are the duplicated ones
for obj in rawSelection:
    if '_SREF_' in str(obj):
        rawName = str(obj).split('_')
        rootName = rawName[0]
        defName = rootName + '_' + '_'.join(rawName[2:])
        if defName in rawSelection:
            rawSelection.remove(obj)
# Always returns:
# [u'crapacruk_high', u'doubidou_high', u'blahbli_high', u'doubidou_SREF_high']
# Instead of:
# [u'crapacruk_high', u'doubidou_high', u'blahbli_high']

Edit: I forgot to say that the duplicated object must be removed from the list only if the original one is in it too.

Answer 1

The problem is that you're mutating the same list you're iterating over.

When you remove u'crapacruk_SREF_high' from the list, everything after it shifts one position to the left (this happens at the C level in CPython), so the slot the loop has just visited now holds u'doubidou_SREF_high' . The for loop then advances to the next index, where u'blahbli_SREF_high' now sits, so u'doubidou_SREF_high' is never visited and never removed.

To fix this, iterate over a copy of the list:

for obj in rawSelection[:]:
  ...
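To make the skipping visible, here is a minimal sketch that records every value the loop variable takes. The list `visited` is a helper I added for illustration, and the membership check from the question is left out to keep it short.

```python
rawSelection = [u'crapacruk_high', u'doubidou_high', u'blahbli_high',
                u'crapacruk_SREF_high', u'doubidou_SREF_high', u'blahbli_SREF_high']

visited = []  # hypothetical helper: every value that obj takes during the loop
for obj in rawSelection:
    visited.append(obj)
    if '_SREF_' in obj:
        rawSelection.remove(obj)  # mutates the list being iterated over

# u'doubidou_SREF_high' never shows up in visited, so it is never removed
print(visited)
print(rawSelection)
```

Swapping the loop header for `for obj in rawSelection[:]:` makes the loop walk a snapshot, so every element is visited exactly once.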

Answer 2

You can change the for loop from for obj in rawSelection: to for obj in list(rawSelection): . This should fix your issue because the loop then iterates over a copy of the list. As written, your code modifies the list while iterating over it, which causes elements to be skipped.

rawSelection = [u'crapacruk_high', u'doubidou_high', u'blahbli_high', u'crapacruk_SREF_high', u'doubidou_SREF_high', u'blahbli_SREF_high']

for obj in list(rawSelection):
    if '_SREF_' in str(obj):
        rawName = str(obj).split('_')
        rootName = rawName[0]
        defName = rootName + '_' + '_'.join(rawName[2:])
        if defName in rawSelection:
            rawSelection.remove(obj)

print(rawSelection)

Answer 3

This will do what you want (note that it doesn't matter what order the items appear in):

rawSelection = list({i.replace('_SREF', '') for i in rawSelection})

This works by iterating through the original list and removing the '_SREF' substring from each item. Each edited string is added to the set built by the set comprehension (that's what the {} brackets mean: a new set object is being created). The set is then turned back into a list.

This works because a set can't contain duplicate items, so adding an item that is already present is silently ignored. Note that the order of the original items is not preserved.

EDIT: as @PeterDeGlopper pointed out in the comments, this does not satisfy the constraint that the _SREF_ item gets removed only if the original appears. For that, we'll do the following:

no_SREF_Set = {i for i in rawSelection if '_SREF_' not in i}
rawSelection = list({i.replace('_SREF', '') if i.replace('_SREF', '') in no_SREF_Set else i for i in rawSelection})

You can combine this into a one-liner, but it's a little long for my taste:

rawSelection = list({i.replace('_SREF', '') if i.replace('_SREF', '') in {i for i in rawSelection if '_SREF_' not in i} else i for i in rawSelection})

This works by creating a set of the items that don't contain '_SREF_' , and then creating a new list (similar to the above) that strips '_SREF' from an item only if the stripped version of that item appears in no_SREF_Set .
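As a quick check of the two-step version, the sketch below adds a duplicate whose original is missing. The entry `u'orphan_SREF_high'` and the name `result` are made up for illustration, and `sorted()` is used only because a set gives no ordering guarantee.

```python
rawSelection = [u'crapacruk_high', u'doubidou_high',
                u'crapacruk_SREF_high', u'doubidou_SREF_high',
                u'orphan_SREF_high']  # hypothetical entry: its original is absent

# Names that are originals (no '_SREF_' marker)
no_SREF_Set = {i for i in rawSelection if '_SREF_' not in i}

# Collapse a duplicate onto its original only when that original exists
result = list({i.replace('_SREF', '') if i.replace('_SREF', '') in no_SREF_Set else i
               for i in rawSelection})

print(sorted(result))
# [u'crapacruk_high', u'doubidou_high', u'orphan_SREF_high'] (without the u prefix on Python 3)
```

The orphan survives because its stripped name is not in `no_SREF_Set`, which is the behaviour the question's edit asks for.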

Answer 4

Break the problem up into subtasks

def get_orig_name(name):
    if '_SREF_' in name:
        return '_'.join(name.split('_SREF_'))
    else:
        return name

Then just construct a new list with no dups

rawSelection = [u'crapacruk_high', 
                u'doubidou_high', 
                u'blahbli_high', 
                u'crapacruk_SREF_high', 
                u'doubidou_SREF_high', 
                u'blahbli_SREF_high']

# Keep a name if it is an original, or if it is a duplicate whose original is absent
uniqueList = [n for n in rawSelection
              if '_SREF_' not in n or get_orig_name(n) not in rawSelection]

print(uniqueList)
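For long lists, each `in rawSelection` test scans the whole list. A possible variant, assuming the names are plain hashable strings, precomputes a set for the lookups while the list comprehension still preserves the original order. The name `present` and the entry `u'zorglub_SREF_high'` are mine, added only to show that an orphaned duplicate is kept.

```python
def get_orig_name(name):
    # Same helper as above: 'a_SREF_high' -> 'a_high'
    if '_SREF_' in name:
        return '_'.join(name.split('_SREF_'))
    else:
        return name

rawSelection = [u'crapacruk_high', u'doubidou_high', u'blahbli_high',
                u'crapacruk_SREF_high', u'doubidou_SREF_high', u'blahbli_SREF_high',
                u'zorglub_SREF_high']  # hypothetical duplicate with no original

present = set(rawSelection)  # hypothetical name: set lookups instead of list scans

uniqueList = [n for n in rawSelection
              if '_SREF_' not in n or get_orig_name(n) not in present]

print(uniqueList)
```

Because the result is a new list, nothing is mutated during iteration and the skipped-element problem from the question cannot occur.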

Answer 5

You could use filter to get quite a clean solution.

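# raw_selection is the list from the question (named rawSelection there)
def non_duplicate(s):
    return not ('_SREF_' in s and s.replace('_SREF', '') in raw_selection)

# list() is needed on Python 3, where filter() returns a lazy iterator
filtered_selection = list(filter(non_duplicate, raw_selection))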
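Run end to end against the question's data, the filter approach might look like the sketch below. It assumes Python 3, where `filter()` returns a lazy iterator that has to be materialised with `list()`; `raw_selection` is the question's list under this answer's snake_case name, and `filtered` is a name I added.

```python
raw_selection = [u'crapacruk_high', u'doubidou_high', u'blahbli_high',
                 u'crapacruk_SREF_high', u'doubidou_SREF_high', u'blahbli_SREF_high']

def non_duplicate(s):
    # True unless s is a duplicate whose original is also in raw_selection
    return not ('_SREF_' in s and s.replace('_SREF', '') in raw_selection)

filtered = filter(non_duplicate, raw_selection)  # lazy on Python 3
filtered_selection = list(filtered)              # materialise into a list

print(filtered_selection)
```

The original list is left untouched, so the result can safely be assigned back to the same name afterwards if desired.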

5 answers

solution1
3 2016-01-12 17:23:30

solution2
1 ACCEPTED 2016-01-12 17:22:59

solution3
0 2016-01-12 17:23:00

solution4
0 2016-01-12 17:37:02

solution5
0 2016-01-12 18:58:33
