
Remove duplicates in a list of lists of lists

I have a list of the type:

biglist = [[
    [77.56913757324219, 12.975883483886719], [77.5671615600586, 12.976168632507324], 
    [77.5680160522461, 12.980805397033691], [77.56996154785156, 12.980448722839355], 
    [77.56913757324219, 12.975883483886719]
]]

In order to remove duplicates I wrote the following in python:

templist = set(map(tuple,biglist[0]))
newlist = map(list,templist)

Although this removes the duplicate elements, I lose the initial structure of lists inside a list of lists. Can someone help me remove the duplicates while keeping that structure?

Thank you

EDIT: A little background on what I am trying to do:
The numbers you see are part of a polygon shape used in a geographic information system. I am trying to store and index it in MongoDB. However, I'm getting an error while indexing, and one of the possible solutions is to remove duplicate values and try indexing again. Basically, the list you see is in GeoJSON format, and I need to preserve that order when inserting into MongoDB.

The expected output would be:

[[
        [77.56913757324219, 12.975883483886719], [77.5671615600586, 12.976168632507324], 
        [77.5680160522461, 12.980805397033691], [77.56996154785156, 12.980448722839355]
    ]]

Right, this should get what you want:

>>> newlist = [[]]
>>> for point in biglist[0]:
...     if point not in newlist[0]:
...         newlist[0].append(point)
...
>>> newlist
[[[77.56913757324219, 12.975883483886719], [77.5671615600586, 12.976168632507324], [77.5680160522461, 12.980805397033691], [77.56996154785156, 12.980448722839355]]]

However, you really should consider a few things:

  • As I pointed out in the comments, comparing floats for equality is going to cause you a lot of trouble. Values that differ by as little as 1x10^-8 (or less) will cause an equality comparison to fail, and such differences can be caused by limited precision (floating-point error). You should always compare floats with a tolerance to avoid this.

  • I'm not sure why you have a double nested list, but from what you've given here, it seems pretty silly, and makes everything more complicated
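
The tolerance point above can be sketched as follows. This is only an illustration, not the original answer's code: the helper name dedupe_close and the tolerance values are assumptions, and math.isclose requires Python 3.5 or later. It keeps the first occurrence of each point, preserves order, and leaves the outer nesting intact.

```python
import math

def dedupe_close(points, rel_tol=1e-9, abs_tol=1e-9):
    """Return points in original order, skipping any point whose
    coordinates are both within tolerance of a point already kept.
    (Hypothetical helper; the tolerances are illustrative defaults.)"""
    kept = []
    for p in points:
        is_dup = any(
            math.isclose(p[0], q[0], rel_tol=rel_tol, abs_tol=abs_tol)
            and math.isclose(p[1], q[1], rel_tol=rel_tol, abs_tol=abs_tol)
            for q in kept
        )
        if not is_dup:
            kept.append(p)
    return kept

biglist = [[
    [77.56913757324219, 12.975883483886719], [77.5671615600586, 12.976168632507324],
    [77.5680160522461, 12.980805397033691], [77.56996154785156, 12.980448722839355],
    [77.56913757324219, 12.975883483886719]
]]

# Apply the helper to each inner list so the outer nesting is kept.
newlist = [dedupe_close(ring) for ring in biglist]
print(newlist)
```

Each point is compared against every point already kept, so this is quadratic in the number of points. That is fine for a small polygon but may be slow for very large inputs.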

Also, your solution didn't work because you converted your list to a set. Since sets are inherently unordered, the order changed when you converted back to a list. In the future, if you care about the order your elements are in (which, from your edit, it sounds like you do), avoid using sets (or, before Python 3.7, plain dictionaries) for this reason.

This solution is not the best but it might help you:

#!/usr/bin/python

biglist = [[
    [77.56913757324219, 12.975883483886719], [77.5671615600586, 12.976168632507324],
    [77.5680160522461, 12.980805397033691], [77.56996154785156, 12.980448722839355],
    [77.56913757324219, 12.975883483886719]
]]

blist = map(tuple, biglist[0])

seen = set()
result = list()
for tup in blist:
    if tup not in seen:
        seen.add(tup)
        result.append(tup)

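# Convert the tuples back to lists; the list comprehension prints
# the same way under Python 2 and Python 3.
print([list(tup) for tup in result])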
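
The seen-set loop above handles only biglist[0] and returns a flat list. A possible way to keep the original nesting is to wrap the loop in a function and apply it to every inner list. This is a sketch: the function name dedupe_rings and the variable names are assumptions made for illustration.

```python
def dedupe_rings(nested):
    """Remove exact duplicate points from each inner list, keeping the
    first occurrence of each point, the original order, and the outer
    nesting. (Hypothetical helper name.)"""
    result = []
    for ring in nested:
        seen = set()
        unique = []
        for point in ring:
            key = tuple(point)  # lists are unhashable, tuples are hashable
            if key not in seen:
                seen.add(key)
                unique.append(list(point))
        result.append(unique)
    return result

biglist = [[
    [77.56913757324219, 12.975883483886719], [77.5671615600586, 12.976168632507324],
    [77.5680160522461, 12.980805397033691], [77.56996154785156, 12.980448722839355],
    [77.56913757324219, 12.975883483886719]
]]

newlist = dedupe_rings(biglist)
print(newlist)
```

The set is used only for membership checks, while the separate list records the order, so the unordered nature of sets does not affect the result.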

Or you can try using an OrderedDict:

>>> import collections
>>> a = collections.OrderedDict()
>>> for big in biglist[0]:
...     a.setdefault(tuple(big), None)
...
>>> list(a.keys())
[(77.56913757324219, 12.975883483886719), (77.5671615600586, 12.976168632507324), (77.5680160522461, 12.980805397033691), (77.56996154785156, 12.980448722839355)]
>>>

If you want to stick with your original solution, just use an OrderedSet instead of a set.
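
Note that OrderedSet is not a built-in Python type. As a stand-in, and assuming Python 3.7 or later (where plain dicts preserve insertion order), dict.fromkeys can play the role of an ordered set in the original one-liner. The variable name ring is an assumption for illustration.

```python
biglist = [[
    [77.56913757324219, 12.975883483886719], [77.5671615600586, 12.976168632507324],
    [77.5680160522461, 12.980805397033691], [77.56996154785156, 12.980448722839355],
    [77.56913757324219, 12.975883483886719]
]]

# dict.fromkeys keeps the first occurrence of each key in insertion order
# (guaranteed from Python 3.7), so it behaves like an ordered set here.
newlist = [
    [list(point) for point in dict.fromkeys(map(tuple, ring))]
    for ring in biglist
]
print(newlist)
```

This keeps the shape of the original set/map approach while preserving both the point order and the outer list.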
