I have a huge Python list, about 100 MB in size, containing strings and integers. Some strings appear as duplicates and triplicates. I have tried to remove the duplicates with this code:
from collections import OrderedDict

duplicates = [.......large size list of 100 MB....]
# fromkeys() keeps only the first occurrence of each item, preserving order
remove = list(OrderedDict.fromkeys(duplicates))
print(remove)
This works well on small lists, but on this large list it has been running for a whole day and is still not done. Any suggestions on how this can be done in minutes, or at least fewer hours? I have also tried installing CUDA on Ubuntu to speed it up, but I keep getting errors: see here
Not sure if this is efficient enough, but one simple way to solve it is to cast your list into a set.
def unique(objects):
    # A set removes duplicates in O(n) average time.
    # Note: sorting a mix of strings and integers raises TypeError on
    # Python 3, so the result is returned unsorted (in arbitrary order).
    return list(set(objects))
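If the original order of the list matters, one hedged alternative is `dict.fromkeys`: on Python 3.7+ plain dicts preserve insertion order, so this deduplicates in linear time while keeping the first occurrence of each item (the function name `unique_ordered` here is just for illustration):

```python
def unique_ordered(objects):
    # Dict keys are unique and, on Python 3.7+, preserve insertion order,
    # so this removes duplicates in O(n) while keeping first occurrences.
    return list(dict.fromkeys(objects))

print(unique_ordered(["b", 2, "a", "b", 2, "a", "a"]))
```

Like the set approach, this is a single pass with O(1) average-time lookups, so it should finish a 100 MB list in seconds rather than hours.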