
Appending only unique items to a Python list?

I am running a Python scraping script and am getting a list like this:

[u'UI/UX Designer\xa0\u2013 Creative Head ', u'UX Designer ', u'UI/UX Designer\xa0\u2013 Creative Head', u'UX Designer']

I want to add only the unique items from that list, so I used this:

profile_list = []
k = soup.body.findAll(text=re.compile("UX Designer"))
for i in k:
    if i not in profile_list:
        profile_list.append(i)
print profile_list

But it is not working; duplicate items still remain. I also tried set(), but that does not work here either. What should I do to add only the unique items?

Update: Thank you for the answers. I made a silly mistake: two of the repeated strings in the list have an extra trailing space, which needs to be removed. All the answers are correct, so I accepted the oldest one.

The first two strings in the list contain trailing spaces.

A string with a trailing space and a string without one are different, even if all the other characters are the same:

>>> 'a' == 'a '
False

You need to strip them:

for i in k:
    i = i.strip()  # <----
    if i not in profile_list:
        profile_list.append(i)

UPDATE: If the order of the list items is not important, you can use a set:

profile_list = list(set(s.strip() for s in k))  # Using `set` with generator expression

profile_list = list({s.strip() for s in k})  # Using set comprehension
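
If the order does matter, a set on its own will not do, because sets are unordered. Here is a minimal sketch of an order-preserving variant, assuming the scraped strings are already available as a list. The helper name unique_stripped and the sample data are made up for illustration.

```python
def unique_stripped(items):
    """Return stripped items with duplicates removed, keeping first-seen order."""
    seen = set()   # fast membership checks
    result = []    # keeps the order in which items first appear
    for item in items:
        cleaned = item.strip()
        if cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


# Sample data modeled on the list in the question.
k = [
    u'UI/UX Designer\xa0\u2013 Creative Head ',
    u'UX Designer ',
    u'UI/UX Designer\xa0\u2013 Creative Head',
    u'UX Designer',
]
profile_list = unique_stripped(k)
```

The extra set only speeds up the membership test; the list alone would give the same result, just more slowly on long inputs.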

Some of your strings have a trailing space at the end; you should strip the excess whitespace. Use set or list comprehensions to make your code more Pythonic. If you want the elements to be unique, I also suggest using a set:

>>> st = [u'UI/UX Designer\xa0\u2013 Creative Head ', u'UX Designer ', u'UI/UX Designer\xa0\u2013 Creative Head', u'UX Designer']
>>> uniques = {elem.strip() for elem in st}
>>> uniques
set([u'UX Designer', u'UI/UX Designer\xa0\u2013 Creative Head'])

Looking at the output, the code you are using actually works. The problem is that there is an extra space in some of the strings:

[u'UI/UX Designer\xa0\u2013 Creative Head ', # Note the space here
u'UX Designer ', # and here
u'UI/UX Designer\xa0\u2013 Creative Head',
u'UX Designer'
]
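
A quick way to confirm this kind of problem is to list the items that change when stripped. This is a small sketch that uses the list from the question as sample data; the variable names scraped and padded are only for illustration.

```python
scraped = [
    u'UI/UX Designer\xa0\u2013 Creative Head ',
    u'UX Designer ',
    u'UI/UX Designer\xa0\u2013 Creative Head',
    u'UX Designer',
]

# Items with leading or trailing whitespace differ from their stripped form.
padded = [s for s in scraped if s != s.strip()]

for s in padded:
    # repr() makes the otherwise invisible trailing space visible.
    print(repr(s))
```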

All you need to do is strip() them:

profile_list = []
k = soup.body.findAll(text=re.compile("UX Designer"))
for i in k:
    if i.strip() not in profile_list:
        profile_list.append(i.strip())
print profile_list

Another way, as mentioned by @edwinskl, is to make it a set() from the beginning:

profile_list = set()
k = soup.body.findAll(text=re.compile("UX Designer"))
for i in k:
    # A set ignores duplicates, so no membership check is needed.
    profile_list.add(i.strip())
print profile_list

Or another way (which I thought of when I first looked at your question) is to convert it to a set afterwards:

profile_list = []
k = soup.body.findAll(text=re.compile("UX Designer"))
for i in k:
    profile_list.append(i.strip())
# Assign the result back; list(set(...)) does not modify the list in place.
profile_list = list(set(profile_list))
print profile_list

though it isn't as good as the two above.

