Unique list of objects in Python not working

I am trying to create a unique list of objects using Python, and I am failing. It doesn't matter whether I use a list or a set; it does not seem to work. When I printed the list/set, I noticed a couple of non-unique objects in it. I realised this was because some of those objects had a space at the start of the word. I looked around and thought using lstrip(' ') would help, but sadly it doesn't.

The weirdest thing is that the 'number of unique objects' is correct, but the list of unique objects created at the end is wrong. Can anyone point out where I'm going wrong?

The column I'm interested in is 'Object', and the unique list should contain Owl, Cat, Fox, Cow, Goat, Dog, Ant, Buffalo, Lion and tiger.

Sample data:

Key    ID    Name    Code    State    Object
01     NULL  NULL   NULL    NULL      Athletics, Light,Netball
02     NULL  NULL   NULL    NULL      BMX Track, Gridiron, Oval
05     NULL  NULL   NULL    NULL      Dog park, Cricket, Soccer
10     NULL  NULL   NULL    NULL      Netball, Oval, Softball
21     NULL  NULL   NULL    NULL      Seat, Playground, Ping Pong Table
13     NULL  NULL   NULL    NULL      Bench, Bike Rack, Seat
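The stray leading spaces come from how the Object values are split. Here is a minimal Python 3 sketch that uses two cells copied from the sample data above. It shows that split(',') cuts only at the commas, so the space after each ", " stays attached to the next item:

```python
# Two Object cells taken from the sample data.
cells = ["Athletics, Light,Netball", "Netball, Oval, Softball"]

# split(',') breaks only at commas; a space that follows a comma
# remains at the start of the next piece.
pieces = [cell.split(',') for cell in cells]

for p in pieces:
    print(p)
# ['Athletics', ' Light', 'Netball']
# ['Netball', ' Oval', ' Softball']
```

Because ' Oval' and 'Oval' are different strings, a list or set treats them as two distinct items.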

The code I'm working with is below:

import csv

fOpen1=open('C:\Data.csv')
uniqueList=csv.writer(open('C:\UniqueList.csv', 'wb'))

Master=csv.reader(fOpen1)
Master.next()

unique=[]

for row in Master:
    for item in row[5].split(','):
        item.strip(' ')
        if item not in unique:
            unique.append(item)
uniqueList.writerow(unique)

What I get at the end is duplicates, including 2 foxes, and a few unique entries are missing as well. Of course this is just dummy data, but I hope I have explained clearly what's going on.

UPDATE1: I have updated the script and it works okay; however, another issue has cropped up. I have updated the column with the real data I'm working with. The unique items that are NOT being added to the final list are:

Gridiron
Cricket
Ping Pong Table
Softball

UPDATE2:

I have reverted to the original 'wrong' script because it works okay now. There was something wrong with the CSV file I was working from.
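The post does not say what was wrong with the file. When values that print identically still fail to de-duplicate, one general way to investigate is to print their repr(). The sketch below is a Python 3 illustration. The cell values are hypothetical, chosen only to show characters that are invisible in normal output. It also shows that str.strip() removes a non-breaking space (which counts as whitespace in Python 3) but leaves a zero-width space alone:

```python
# Hypothetical values: all of them look like "Fox" when printed.
cells = ["Fox", " Fox", "Fox\u00a0", "Fox\u200b"]

for cell in cells:
    # repr() makes the leading space and the escaped \xa0 / \u200b visible.
    print(repr(cell), "->", repr(cell.strip()))

# strip() removes whitespace (including the non-breaking space U+00A0),
# but U+200B (zero-width space) is not whitespace, so it survives.
stripped = {cell.strip() for cell in cells}
print(stripped)
```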

Thanks

str.lstrip(' ') is not an in-place method (strings are immutable); it returns a new, stripped string. You need to assign the result back to object:

object = object.lstrip(' ')
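A short Python 3 sketch of the difference. It uses a hypothetical variable named item rather than object, for the naming reason given further down:

```python
item = " Fox"

# str is immutable: lstrip() builds and returns a new string, and the
# original is untouched. Here the returned value is simply discarded.
item.lstrip(' ')
print(repr(item))  # ' Fox' (unchanged)

# Rebinding the name to the returned value is what actually strips it.
item = item.lstrip(' ')
print(repr(item))  # 'Fox'
```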

Assuming Python 2.7+ (or 3.1+), a faster way is to use a set, perhaps built with a set comprehension. Checking membership in a set takes constant time on average, whereas checking 'item not in unique' on a list scans the whole list. Example:

unique = {obj.lstrip() for row in Master for obj in row[5].split(',')}
uniqueList.writerow(list(unique))
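The same set comprehension can be run self-contained in Python 3, where next(reader) replaces the Python 2 Master.next(). The in-memory CSV is a hypothetical stand-in for C:\Data.csv. It assumes the Object field is quoted in the real file, which the row[5].split(',') approach implies, because an unquoted comma would split the field into separate CSV columns:

```python
import csv
import io

# Hypothetical stand-in for the real file: a header plus two rows,
# with the comma-containing Object field quoted.
data = io.StringIO(
    "Key,ID,Name,Code,State,Object\n"
    '01,NULL,NULL,NULL,NULL,"Athletics, Light,Netball"\n'
    '10,NULL,NULL,NULL,NULL,"Netball, Oval, Softball"\n'
)

reader = csv.reader(data)
next(reader)  # skip the header row

# Every row, every comma-separated item, leading whitespace removed.
unique = {obj.lstrip() for row in reader for obj in row[5].split(',')}
print(sorted(unique))  # ['Athletics', 'Light', 'Netball', 'Oval', 'Softball']
```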

Please note that this does not preserve any order, since sets are unordered. If order is important, you can use a set to store the values that have already been seen and append each new value to a list. Example:

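unique = []         # results, in first-seen order
seen_set = set()    # values already added, for fast membership checks
for row in Master:
    for obj in row[5].split(','):
        obj = obj.lstrip(' ')
        if obj not in seen_set:
            unique.append(obj)
            seen_set.add(obj)
uniqueList.writerow(unique)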
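On Python 3.7+, dicts are guaranteed to keep insertion order. That makes dict.fromkeys a shorter order-preserving alternative. This is a swapped-in technique, not the seen-set loop above. The rows below are hypothetical, hard-coded stand-ins for what csv.reader would yield:

```python
# Hypothetical rows shaped like csv.reader output (Object is index 5).
rows = [
    ["01", "NULL", "NULL", "NULL", "NULL", "Athletics, Light,Netball"],
    ["10", "NULL", "NULL", "NULL", "NULL", "Netball, Oval, Softball"],
]

# dict keys are unique and (3.7+) remember insertion order, so this
# drops duplicates while keeping each item's first-seen position.
unique = list(dict.fromkeys(
    obj.strip() for row in rows for obj in row[5].split(',')
))
print(unique)  # ['Athletics', 'Light', 'Netball', 'Oval', 'Softball']
```

The seen-set loop still works on older versions. dict.fromkeys just folds the bookkeeping into one expression.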

Also, I would advise against using object as a variable name, since that is the name of the built-in base class (the class that every new-style class, and every class in Python 3, ultimately inherits from).
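A minimal Python 3 sketch of what goes wrong when the name is shadowed at module level. Here isinstance stands in for any code that expects object to be the built-in class:

```python
before = isinstance("Fox", object)  # True: every value is an object

object = "Fox"  # shadows the built-in name in this module

shadow_broke_it = False
try:
    isinstance("Fox", object)  # object is now a str, not a type
except TypeError:
    shadow_broke_it = True

del object  # remove the module-level name; the built-in is found again
after = isinstance("Fox", object)

print(before, shadow_broke_it, after)  # True True True
```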


Also, it seems some strings have whitespace at the end, so it would be better to use .strip() or .strip(' ') instead of .lstrip(' '). Note that .strip() with no argument removes all leading and trailing whitespace (tabs and newlines included), while .strip(' ') removes only spaces. Example of strip with a set comprehension:

unique = {obj.strip() for row in Master for obj in row[5].split(',')}
uniqueList.writerow(list(unique))
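To compare the three stripping options side by side, here is a small Python 3 sketch. The values are hypothetical, and the tab is included only to show how .strip(' ') differs from .strip():

```python
raw = [" Fox", "Fox ", "Fox\t", "Fox"]

left_only = {s.lstrip(' ') for s in raw}   # trailing space and tab survive
spaces_only = {s.strip(' ') for s in raw}  # both ends, but only ' ' chars
all_ws = {s.strip() for s in raw}          # both ends, any whitespace

print(sorted(left_only))    # ['Fox', 'Fox\t', 'Fox ']
print(sorted(spaces_only))  # ['Fox', 'Fox\t']
print(all_ws)               # {'Fox'}
```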

Edit your code to something like this:

for object in row[5].split(','):
    object = object.strip()
    if object not in unique:
        unique.append(object)

strip() removes whitespace from both the left and the right end of the string (spaces inside it, as in Ping Pong Table, are kept). Assign the result back to object, as in:

object = object.strip()
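A Python 3 sketch of this loop on one cell from the sample data. It shows that strip() trims only the ends of each piece, so multi-word items survive intact. The names cell and items are hypothetical, and item is used instead of object:

```python
cell = "Seat, Playground, Ping Pong Table"

items = []
for item in cell.split(','):
    item = item.strip()  # trims both ends; inner spaces are kept
    if item not in items:
        items.append(item)

print(items)  # ['Seat', 'Playground', 'Ping Pong Table']
```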

A set comprehension would serve you well.

First off, let's make sure the file gets closed by using a context manager:

import csv

with open(r'C:\Data.csv') as raw:
    master = csv.reader(raw)
    next(master)  # Ignore the header
    unique = {y.strip() for row in master for y in row[-1].split(',')}
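The same idea extends to a full Python 3 version that also closes the output file with a context manager. This is a sketch under some assumptions. The function name write_unique_objects is hypothetical. Both files are opened with newline='', as the csv module documentation recommends. The demo writes throwaway temp files instead of C:\Data.csv and C:\UniqueList.csv:

```python
import csv
import os
import tempfile


def write_unique_objects(src_path, dst_path, column=-1):
    """Collect unique, stripped items from `column` of src_path and
    write them (sorted) as one row to dst_path. Hypothetical helper."""
    with open(src_path, newline='') as raw:
        master = csv.reader(raw)
        next(master)  # skip the header row
        unique = {y.strip() for row in master for y in row[column].split(',')}

    with open(dst_path, 'w', newline='') as out:
        # Sets are unordered; sorting just makes the output stable.
        csv.writer(out).writerow(sorted(unique))
    return unique


# Demo: temp files stand in for the real input and output paths.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'Data.csv')
dst = os.path.join(tmpdir, 'UniqueList.csv')
with open(src, 'w', newline='') as f:
    writer = csv.writer(f)  # quotes the Object field because it has commas
    writer.writerow(['Key', 'ID', 'Name', 'Code', 'State', 'Object'])
    writer.writerow(['01', 'NULL', 'NULL', 'NULL', 'NULL', 'Athletics, Light,Netball'])
    writer.writerow(['10', 'NULL', 'NULL', 'NULL', 'NULL', 'Netball, Oval, Softball'])

result = write_unique_objects(src, dst)
print(sorted(result))  # ['Athletics', 'Light', 'Netball', 'Oval', 'Softball']
```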

OK, let's go over what we did there. We opened the file using a context manager, so the file is closed automatically. Then we read the CSV using csv.reader and skipped past the first (header) row.

Here's where it gets tricky: we created a set by iterating over the rows (each one a list) in the CSV, then iterating over the comma-separated items in each row's last column. A more verbose way:

unique = set()
for row in master:
    for item in row[-1].split(','):
        unique.add(item.strip())

This accomplishes the same thing, possibly in an easier-to-understand format. Also, note that I used the index -1 to select the last column in the CSV.
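A quick Python 3 check that the two forms build the same set, and that row[-1] picks the same cell as row[5] when Object is the sixth and last column. The rows are hypothetical stand-ins for csv.reader output, with values from the sample data:

```python
rows = [
    ["01", "NULL", "NULL", "NULL", "NULL", "Athletics, Light,Netball"],
    ["13", "NULL", "NULL", "NULL", "NULL", "Bench, Bike Rack, Seat"],
]

# Set-comprehension form.
unique_comp = {y.strip() for row in rows for y in row[-1].split(',')}

# Verbose nested-loop form: same loops in the same order.
unique_loop = set()
for row in rows:
    for item in row[-1].split(','):
        unique_loop.add(item.strip())

print(unique_comp == unique_loop)  # True
print(rows[0][-1] == rows[0][5])   # True: -1 is the last element
```

Using -1 keeps working if columns are added before Object, but it picks the wrong column if one is ever added after it.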
