I am trying to create a unique list of objects using Python and I am failing. It doesn't matter whether I use a list or a set; it does not seem to work. When I printed the list/set I noticed a couple of non-unique objects in it. I realised that was because some of those objects had a space at the start of the word. I looked around and thought using lstrip(' ') would help my cause, but sadly it doesn't.
The weirdest thing is that the number of unique objects is correct, but the list of unique objects created at the end is wrong. Can anyone please point out where I'm going wrong?
The column I'm interested in is 'Object' and the unique list should contain Owl, Cat, Fox, Cow, Goat, Dog, Ant, Buffalo, Lion and tiger.
Sample data:
Key  ID    Name  Code  State  Object
01   NULL  NULL  NULL  NULL   Athletics, Light,Netball
02   NULL  NULL  NULL  NULL   BMX Track, Gridiron, Oval
05   NULL  NULL  NULL  NULL   Dog park, Cricket, Soccer
10   NULL  NULL  NULL  NULL   Netball, Oval, Softball
21   NULL  NULL  NULL  NULL   Seat, Playground, Ping Pong Table
13   NULL  NULL  NULL  NULL   Bench, Bike Rack, Seat
My working code is attached below:
import csv
fOpen1=open('C:\Data.csv')
uniqueList=csv.writer(open('C:\UniqueList.csv', 'wb'))
Master=csv.reader(fOpen1)
Master.next()
unique=[]
for row in Master:
    for item in row[5].split(','):
        item.strip(' ')
        if item not in unique:
            unique.append(item)
uniqueList.writerow(unique)
What I'm getting at the end of this is duplicates including 2 foxes and missing a few unique entries as well. Of course this is just dummy data but I hope I am clear in explaining what's going on.
UPDATE1: I have updated the script and it works okay however another issue has cropped up. I have updated the column with real data that I'm working with. The unique items that are NOT being added to the final list include:
Gridiron
Cricket
Ping Pong Table
Softball
UPDATE2:
I have reverted to the original 'wrong' script because it works okay now. There was something wrong with the csv file I was working off.
Thanks
str.lstrip(' ') is not an in-place method; it returns the stripped string. You need to assign the result back to object:
object = object.lstrip(' ')
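To make the point concrete, here is a minimal sketch (written for Python 3, with a made-up value " Fox") showing that calling the method without assigning its result leaves the original string untouched:

```python
# Strings are immutable, so the strip methods return a new string
# instead of modifying the one they are called on.
item = " Fox"  # hypothetical value with a leading space

item.lstrip(" ")         # result is discarded; item is unchanged
unchanged = item

item = item.lstrip(" ")  # rebinding the name keeps the stripped value
stripped = item

print(repr(unchanged))
print(repr(stripped))
```

The same applies to strip() and rstrip().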
Assuming Python 2.7+ (or 3.1+), a faster way would be to use a set, and maybe a set comprehension. Example:
unique = {obj.lstrip() for row in Master for obj in row[5].split(',')}
uniqueList.writerow(list(unique))
Please note, this would not preserve any order, since sets are not ordered. If order is important, you can use a set to store the values that have already been seen. Example:
unique=[]
seen_set = set()
for row in Master:
    for obj in row[5].split(','):
        obj = obj.lstrip(' ')
        if obj not in seen_set:
            unique.append(obj)
            seen_set.add(obj)
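The same idea can be wrapped in a small helper and tried without a file. This is only a sketch: the function name unique_in_order, the use of strip() rather than lstrip(' '), and the skipping of empty items are my own assumptions, and the rows are copied from the sample data in the question.

```python
def unique_in_order(rows, column=-1):
    """Return distinct comma-separated items from one column, in first-seen order."""
    unique = []
    seen = set()  # set membership tests are O(1) on average
    for row in rows:
        for item in row[column].split(','):
            item = item.strip()
            # Skipping empty items is an assumption, not part of the original loop.
            if item and item not in seen:
                seen.add(item)
                unique.append(item)
    return unique


# Rows taken from the question's sample data, already split into fields.
rows = [
    ['01', 'NULL', 'NULL', 'NULL', 'NULL', 'Athletics, Light,Netball'],
    ['02', 'NULL', 'NULL', 'NULL', 'NULL', 'BMX Track, Gridiron, Oval'],
    ['10', 'NULL', 'NULL', 'NULL', 'NULL', 'Netball, Oval, Softball'],
]
result = unique_in_order(rows)
print(result)
```

The list keeps the output order, while the set only answers "have I seen this already?", which avoids scanning the whole list for every item.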
Also, I would advise against using object as a variable name, as that is the name of the built-in class (which all other classes extend).
Also, it seems that some strings have whitespace at the end, so it would be better to use .strip() or .strip(' ') instead of .lstrip(' '). Example of strip with a set comprehension:
unique = {obj.strip() for row in Master for obj in row[5].split(',')}
uniqueList.writerow(list(unique))
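A quick way to see why trailing whitespace matters is to compare the two methods on one cell. The cell value below is hypothetical, chosen to mimic the duplicate "Fox" entries mentioned in the question:

```python
# Hypothetical cell: the first "Fox" has a trailing space.
cell = "Fox , Owl,Fox"

left_only = {item.lstrip() for item in cell.split(",")}
both_sides = {item.strip() for item in cell.split(",")}

print(sorted(left_only))   # "Fox " and "Fox" are still distinct strings
print(sorted(both_sides))  # duplicates collapse once both ends are stripped
```

With lstrip() the set still holds three entries, because "Fox " and "Fox" compare unequal; with strip() it holds two.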
Edit your code to something like this:
for object in row[5].split(','):
    object = object.strip()
    if object not in unique:
        unique.append(object)
strip() removes whitespace from both the left and the right. It returns a new string, so assign the result back to the name:
object = object.strip()
A set comprehension would serve you well.
First off, let's get rid of that open file by using a context manager:
import csv
with open('C:\Data.csv') as raw:
    master = csv.reader(raw)
    master.next()  # Ignore the header
    unique = {y.strip() for row in master for y in row[-1].split(',')}
OK, let's go over what we did there. We opened the file using a context manager so the file will be closed automatically. Then we read in the csv using csv.reader and skipped the first (header) row.
Here's where it gets tricky - we created a set by iterating over the lists in the csv, then iterating over the contents of those lists. A more verbose way:
unique = set()
for row in master:
    for item in row[-1].split(','):
        unique.add(item.strip())
This accomplishes much the same thing, possibly in an easier to understand format. Also, note that I used -1 to index the last column in the csv.
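For anyone on Python 3, a rough equivalent might look like the sketch below. It uses in-memory io.StringIO objects in place of the real files so it can be run as-is. The CSV text is an assumption about how the file is laid out, with the Object field quoted because it contains commas. The values are sorted before writing purely to get a repeatable order.

```python
import csv
import io

# Hypothetical stand-in for the input file, based on the question's sample data.
raw = io.StringIO(
    'Key,ID,Name,Code,State,Object\n'
    '01,NULL,NULL,NULL,NULL,"Athletics, Light,Netball"\n'
    '10,NULL,NULL,NULL,NULL,"Netball, Oval, Softball"\n'
)

reader = csv.reader(raw)
next(reader)  # skip the header; Python 3 replaces reader.next() with next(reader)
unique = {item.strip() for row in reader for item in row[-1].split(',')}

# Stand-in for the output file; a set has no order, so sort for stable output.
out = io.StringIO()
csv.writer(out).writerow(sorted(unique))
print(out.getvalue())
```

With real files in Python 3, the csv module expects text mode, so the output would be opened with open(path, 'w', newline='') rather than 'wb'.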