
Unique list of objects in Python not working

I am trying to create a unique list of objects in Python, and I am failing. It doesn't matter whether I use a list or a set; neither seems to work. When I printed the list/set I noticed a couple of non-unique objects in it. I realised this was because some of those objects had a space at the start of the word. I looked around and thought using lstrip(' ') would help my cause, but sadly it doesn't.

The weirdest thing is that the number of unique objects is correct, but the list of unique objects created at the end is wrong. Can anyone please point out where I'm going wrong?

The column I'm interested in is 'Object' and the unique list should contain Owl, Cat, Fox, Cow, Goat, Dog, Ant, Buffalo, Lion and tiger.

Sample data:

Key    ID    Name    Code    State    Object
01     NULL  NULL   NULL    NULL      Athletics, Light,Netball
02     NULL  NULL   NULL    NULL      BMX Track, Gridiron, Oval
05     NULL  NULL   NULL    NULL      Dog park, Cricket, Soccer
10     NULL  NULL   NULL    NULL      Netball, Oval, Softball
21     NULL  NULL   NULL    NULL      Seat, Playground, Ping Pong Table
13     NULL  NULL   NULL    NULL      Bench, Bike Rack, Seat

My working code is attached below:

import csv

fOpen1=open('C:\Data.csv')
uniqueList=csv.writer(open('C:\UniqueList.csv', 'wb'))

Master=csv.reader(fOpen1)
Master.next()

unique=[]

for row in Master:
    for item in row[5].split(','):
        item.strip(' ')
        if item not in unique:
            unique.append(item)
uniqueList.writerow(unique)

What I'm getting at the end of this is duplicates including 2 foxes and missing a few unique entries as well. Of course this is just dummy data but I hope I am clear in explaining what's going on.

UPDATE1: I have updated the script and it works okay however another issue has cropped up. I have updated the column with real data that I'm working with. The unique items that are NOT being added to the final list include:

Gridiron
Cricket
Ping Pong Table
Softball

UPDATE2:

I have reverted to the original 'wrong' script because it works okay now. There was something wrong with the csv file I was working off.

Thanks

str.lstrip(' ') does not work in place. Strings are immutable, so it returns a new, stripped string and leaves the original unchanged. You need to assign the result back to object -

object = object.lstrip(' ')
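To make the point concrete, here is a minimal sketch of the difference between discarding and keeping the return value. The variable names and the ' Fox' value are illustrative only and are not taken from the asker's file.

```python
# Strings are immutable: lstrip() returns a new string and
# leaves the original untouched.
item = " Fox"

item.lstrip(" ")             # return value is discarded, item is unchanged
unchanged = item

stripped = item.lstrip(" ")  # keep the returned value instead

print(repr(unchanged))
print(repr(stripped))
```

The first call has no visible effect, which is why the original loop kept both ' Fox' and 'Fox' as separate entries.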

Assuming Python 2.7+ (or 3.1+), a faster way would be to use a set, ideally built with a set comprehension. Example -

unique = {obj.lstrip() for row in Master for obj in row[5].split(',')}
uniqueList.writerow(list(unique))

Please note that this does not preserve order, since sets are unordered. If order is important, keep the list and use a set to record the values that have already been seen. Example -

unique=[]
seen_set = set()
for row in Master:
    for obj in row[5].split(','):
        obj = obj.lstrip(' ')
        if obj not in seen_set:
            unique.append(obj)
            seen_set.add(obj)
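The same idea can be wrapped in a small function and tried on in-memory data. This is only a sketch: the helper name unique_in_order is hypothetical, the two cells are borrowed from the question's sample, and it uses strip() rather than lstrip(' ') and skips empty items, which are assumptions beyond the loop above.

```python
def unique_in_order(cells):
    """Return stripped items from comma-separated cells, first occurrence first."""
    unique = []
    seen = set()  # set membership tests are O(1) on average
    for cell in cells:
        for raw in cell.split(','):
            item = raw.strip()
            # Skipping empty items is an assumption, not part of the original loop.
            if item and item not in seen:
                seen.add(item)
                unique.append(item)
    return unique


cells = ["Athletics, Light,Netball", "Netball, Oval, Softball"]
result = unique_in_order(cells)
print(result)  # ['Athletics', 'Light', 'Netball', 'Oval', 'Softball']
```

The list keeps the order of first appearance, while the set only answers the "have I seen this?" question quickly.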

Also, I would advise against using object as a variable name, since it shadows the built-in class of that name (the base class of all other classes).


Also, it seems some strings have whitespace at the end as well, so it would be better to use .strip() or .strip(' ') instead of .lstrip(' '). Example of strip with a set comprehension -

unique = {obj.strip() for row in Master for obj in row[5].split(',')}
uniqueList.writerow(list(unique))
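A quick way to see why the choice matters is to compare the two methods on values that differ only in surrounding spaces. The sample strings here are made up for illustration.

```python
# Three spellings of the same word, differing only in surrounding spaces.
items = ["Oval", " Oval", "Oval "]

left_only = {s.lstrip() for s in items}  # trailing space survives
both_ends = {s.strip() for s in items}   # both ends cleaned

print(sorted(left_only))  # ['Oval', 'Oval ']
print(sorted(both_ends))  # ['Oval']
```

With lstrip() alone, 'Oval ' still counts as a separate entry, so the set ends up with an apparent duplicate.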

Edit your code to something like this:

for object in row[5].split(','):
    object = object.strip()
    if object not in unique:
        unique.append(object)

strip() removes all whitespace from both the left and the right. Because it returns a new string, assign the result back to the variable:

object = object.strip()

A set comprehension would serve you well.

First off, let's get rid of that open file by using a context manager:

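import csv

# Raw string so the backslash in the Windows path is never read as an escape
with open(r'C:\Data.csv') as raw:
    master = csv.reader(raw)
    next(master)  # Ignore the header (works in Python 2.6+ and 3)
    # Strip every comma-separated item in the last column and deduplicate
    unique = {y.strip() for row in master for y in row[-1].split(',')}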

OK, let's go over what we did there. We opened the file using a context manager so the file will be closed automatically. Then we read in the csv using csv.reader and skipped the first (header) row.

Here's where it gets tricky - we created a set by iterating over the lists in the csv, then iterating over the contents of those lists. A more verbose way:

unique = set()
for row in master:
    for item in row[-1].split(','):
        unique.add(item.strip())

This accomplishes much the same thing, possibly in an easier-to-understand format. Also, note that I used the index -1 to select the last column in the csv.
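For anyone who wants to try the whole pipeline without a file on disk, here is a rough end-to-end sketch for Python 3. It uses io.StringIO as a stand-in for both the input and the output file. It assumes the Object column is quoted in the real CSV, since it contains commas, and it sorts the result only to make the output order predictable. The two data rows are borrowed from the question's sample.

```python
import csv
import io

# Stand-in for the CSV file (assumption: the Object cell is quoted).
raw = io.StringIO(
    "Key,ID,Name,Code,State,Object\n"
    '01,NULL,NULL,NULL,NULL,"Athletics, Light,Netball"\n'
    '10,NULL,NULL,NULL,NULL,"Netball, Oval, Softball"\n'
)

reader = csv.reader(raw)
next(reader)  # skip the header row
unique = {item.strip() for row in reader for item in row[-1].split(',')}

# Stand-in for the output file; sorted() gives a stable order.
out = io.StringIO()
csv.writer(out).writerow(sorted(unique))
print(out.getvalue())
```

When writing to a real file in Python 3, open it with newline='' as the csv module documentation recommends, rather than the 'wb' mode used in the Python 2 code above.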
