Trouble finding an item in a set using the 'in' keyword in Python

Question

I have this code here:

import spacy

# 'en' was a spaCy 2.x shortcut; spaCy 3 requires the full package name
nlp = spacy.load('en_core_web_sm')
a = set(nlp('This is a test'))
b = nlp('is')
if b in a:
    print("Success")
else:
    print("Failed")

For some reason this prints "Failed", but I expected it to succeed. I'm new to the spaCy framework, so I'm not quite sure how to do this correctly. How do I do this right?

Answer 1

type(b) is <class 'spacy.tokens.doc.Doc'>, but you are checking it against a, which is a set (<class 'set'>). Moreover, each item in that set is a <class 'spacy.tokens.token.Token'> object rather than a string. So you have to convert both sides to a compatible type, such as str, before using the in operator.

import spacy

nlp = spacy.load('en_core_web_sm')  # spaCy 2.x also accepted the 'en' shortcut

a = set(nlp('This is a test'))
a = {str(token) for token in a}  # convert every Token to str

b = nlp('is')
b = str(set(b).pop())  # convert the single Token to str, effectively b = 'is'
if b in a:
    print("Success")
else:
    print("Failed")
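The failure is ordinary Python behavior: a set lookup succeeds only if the item has a matching hash and compares equal with ==. The sketch below uses a hypothetical stand-in class, FakeToken (not spaCy's real Token), that keeps Python's default identity-based equality, to show why the original check fails and why converting both sides to str fixes it.

```python
class FakeToken:
    """Hypothetical stand-in for a token: stores its text but keeps the
    default object equality and hash, which are based on identity."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


def tokenize(sentence):
    # Each call creates brand-new FakeToken objects
    return [FakeToken(word) for word in sentence.split()]


a = set(tokenize("This is a test"))
b = tokenize("is")[0]

# Identity-based equality: a new object never equals the ones already in the set
found_as_objects = b in a

# Converting both sides to str compares by value instead
a_text = {str(token) for token in a}
found_as_text = str(b) in a_text

print(found_as_objects)  # False
print(found_as_text)     # True
```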

Answer 2

I don't think you can rely on the hash of the Token objects for the set operation. Instead, you can dig in and use the .text attribute:

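import spacy
nlp = spacy.load('en_core_web_sm')  # spaCy 2.x also accepted the 'en' shortcut
a = set(x.text for x in nlp('This is a test'))  # a set of plain strings
b = nlp('is').text                              # the Doc's text, a plain string
if b in a:
    print("Success")
else:
    print("Failed")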

Proof:

>>> import spacy
>>> nlp = spacy.load('en')
>>> a = set(x.text for x in nlp('This is a test'))
>>> b = nlp('is').text
>>> if b in a:
...   print("Success")
... else:
...   print("Failed")
... 
Success

Answer 3

@bboyjacks: Thanks for highlighting this interesting question.

I just want to let you know that this isn't specific to the spaCy framework; it's more about Python concepts.

The answer provided by @John La Rooy above is correct, but I'm posting my version because you asked the same question in the spaCy community as well (this may add some clarity to the solution).

Please check my answer below:

print(a)  # prints -> {This, is, a, test} (set order may vary)
print(b)  # prints -> is

So it seems like the in operator should work, but here is the catch:

print(type(a))              # prints -> <class 'set'>
print(type(next(iter(a))))  # prints -> <class 'spacy.tokens.token.Token'>
                            # (next(iter(a)) peeks at an item; a.pop() would remove it)
print(type(b))              # prints -> <class 'spacy.tokens.doc.Doc'>

Comparing an object of type spacy.tokens.doc.Doc with an object of type spacy.tokens.token.Token using == will always return False.
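Under the hood, `x in some_set` first looks up the hash of x and then checks == against the matching candidates, so membership depends on how a class defines __hash__ and __eq__. A minimal sketch with a hypothetical TextToken class (not spaCy's real Token) whose equality and hash are both based on its text:

```python
class TextToken:
    """Hypothetical token whose equality and hash depend only on its text."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Compare by text; for other types, let Python fall back to its default
        if isinstance(other, TextToken):
            return self.text == other.text
        return NotImplemented

    def __hash__(self):
        # Must agree with __eq__: objects that compare equal need equal hashes
        return hash(self.text)

    def __repr__(self):
        return f"TextToken({self.text!r})"


tokens = {TextToken(w) for w in "This is a test".split()}

print(TextToken("is") in tokens)   # True: same hash, and __eq__ returns True
print(TextToken("was") in tokens)  # False
print("is" in tokens)              # False: a str is never equal to a TextToken here
```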

We need to convert them into the same type somehow, and since we can't be sure how the equality method is defined in spaCy's Token or Doc classes, the simplest option is to convert both into str.

This conversion can be done as @John La Rooy showed above, or you can try the complete, runnable code below:

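import spacy
nlp = spacy.load('en_core_web_sm')  # spaCy 2.x also accepted the 'en' shortcut
a = set(nlp('This is a test'))
b = nlp('is')

# Compare plain strings: the Doc's text against each Token's text
if b.text in map(lambda token: token.text, a):
    print("Success")
else:
    print("Failed")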
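One detail of the map-based check: map() returns a lazy, one-shot iterator, so in scans it linearly and consumes the items it passes. That is fine for a single check, as in the code above, but reusing the same map object gives wrong results. A sketch using hypothetical SimpleNamespace stand-ins for tokens (only the .text attribute is modeled):

```python
from types import SimpleNamespace

# Hypothetical stand-ins for spaCy tokens: only a .text attribute is modeled
tokens = [SimpleNamespace(text=w) for w in "This is a test".split()]

texts = map(lambda token: token.text, tokens)
first = "is" in texts   # True; the scan stops right after "is"
second = "is" in texts  # False; only "a" and "test" remain in the iterator

text_set = {token.text for token in tokens}
third = "is" in text_set   # True
fourth = "is" in text_set  # True; a set can be queried any number of times

print(first, second, third, fourth)  # True False True True
```

If you need several lookups, building the set of texts once is both reusable and faster on average than rescanning.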

Feel free to comment if you need any further clarification. My responses may be delayed, but I will try my best to reply.

3 answers

Answer 1: score 2, 2018-10-31 04:16:10

Answer 2: score 0, 2018-10-31 04:30:18

Answer 3: score 0, 2018-11-02 04:45:35
