

I have a problem removing duplicates from a list

Having a brain-fart here, probably. I'm getting a list using generators, and I'm having trouble removing duplicates from it the usual way using set:

import spacy
import textacy

nlp = spacy.load("en_core_web_lg")
text = ('''The Western compulsion to hedonism has made us lust for money just to show that we have it. Possessions do not make a man—man makes possessions. The Western compulsion to hedonism has made us lust for money just to show that we have it. Possessions do not make a man—man makes possessions.''')
doc = nlp(text)

# collect unigram and bigram keyword candidates (spaCy Span objects)
keywords = (
    list(textacy.extract.ngrams(doc, 1, filter_stops=True, filter_punct=True, filter_nums=False))
    + list(textacy.extract.ngrams(doc, 2, filter_stops=True, filter_punct=True, filter_nums=False))
)

print(list(set(keywords)))

The result still contains duplicates:

[man, lust, makes possessions, man, Possessions, makes possessions, man makes, hedonism, man, money, compulsion, Western compulsion, man, possessions, man makes, compulsion, Possessions, Western compulsion, possessions, Western, makes, makes, lust, hedonism, Western, money]

That's because the items in your list aren't strings, so they aren't actually duplicates.

>>> type(keywords[0])
spacy.tokens.span.Span
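
You can see this directly: if I'm reading spaCy's behaviour right, Spans compare (and hash) by their position in the doc, not by their surface text, so two spans that both read "man" at different offsets stay distinct inside a set. A quick check, reusing the keywords list from above:

>>> men = [kw for kw in keywords if kw.text == "man"]
>>> men[0].text == men[1].text
True
>>> men[0] == men[1]
False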

To keep only the unique words, you can use a dictionary comprehension with their string representations as the unique keys, and then pull the spaCy objects back out with .values():

uniques = list({repr(keyword): keyword for keyword in keywords}.values())
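
Equivalently, since a Span's repr is just its text (which is why the output above prints bare words without quotes), keying on .text is a bit more explicit — a minimal variant:

uniques = list({keyword.text: keyword for keyword in keywords}.values())
print(uniques)

The dict keeps one entry per distinct string, so you get back a single Span for each unique word or bigram.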
