How can I solve an attribute error when using spacy?
I am using spaCy for natural language processing in German, but I am running into this error:
AttributeError: 'str' object has no attribute 'text'
This is the text data I am working with:
tex = ['Wir waren z.B. früher auf\'m Fahrrad unterwegs in München (immer nach 11 Uhr).',
'Nun fahren wir öfter mit der S-Bahn in München herum. Tja. So ist das eben.',
'So bleibt mir nichts anderes übrig als zu sagen, vielen Dank für alles.',
'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.']
My code:
data = [re.sub(r"\"", "", i) for i in tex]
data1 = [re.sub(r"\“", "", i) for i in data]
data2 = [re.sub(r"\„", "", i) for i in data1]
nlp = spacy.load('de')
spacy_doc1 = []
for line in data2:
    spac = nlp(line)
    lem = [tok.lemma_ for tok in spac]
    no_punct = [tok.text for tok in lem if re.match('\w+', tok.text)]
    no_numbers = [tok for tok in no_punct if not re.match('\d+', tok)]
I am writing every string to a separate list, because I need to assign the result of the processing back to the original string.
I also understand that the result that is written into lem is no longer in a format that spaCy can process. So how can I do this correctly?
The problem here lies in the fact that spaCy's token.lemma_ returns a string, and strings have no text attribute (as the error states).
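The failure is easy to see even without spaCy, since lemma_ yields plain Python strings; a minimal reproduction (the sample lemmas are illustrative):

```python
# Minimal reproduction, no spaCy needed: lemma_ produces plain strings,
# and a plain string has no .text attribute.
lemmas = ["wir", "sein", "früher"]  # what [tok.lemma_ for tok in spac] yields
try:
    [s.text for s in lemmas]
except AttributeError as err:
    print(err)  # > 'str' object has no attribute 'text'
```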
I suggest doing the same as you did when you wrote:
no_numbers = [tok for tok in no_punct if not re.match('\\d+', tok)]
The only difference with this line in your code would be that you'd have to include the special string "-PRON-" in case you encounter English pronouns:
import re
import spacy
# using the web English model for practicality here
nlp = spacy.load('en_core_web_sm')
tex = ['I\'m going to get a cat tomorrow',
'I don\'t know if I\'ll be able to get him a cat house though!']
data = [re.sub(r"\"", "", i) for i in tex]
data1 = [re.sub(r"\“", "", i) for i in data]
data2 = [re.sub(r"\„", "", i) for i in data1]
spacy_doc1 = []
for line in data2:
    spac = nlp(line)
    lem = [tok.lemma_ for tok in spac]
    no_punct = [tok for tok in lem if re.match('\w+', tok) or tok in ["-PRON-"]]
    no_numbers = [tok for tok in no_punct if not re.match('\d+', tok)]
    print(no_numbers)
    # > ['-PRON-', 'be', 'go', 'to', 'get', 'a', 'cat', 'tomorrow']
    # > ['-PRON-', 'do', 'not', 'know', 'if', '-PRON-', 'will', 'be', 'able', 'to', 'get', '-PRON-', 'a', 'cat', 'house', 'though']
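If it helps, the two filtering comprehensions can also be factored into one small helper that works on any list of lemma strings; the function name and sample input here are just illustrative:

```python
import re

def clean_lemmas(lemmas):
    # Keep word-like lemma strings (plus the special "-PRON-" placeholder),
    # then drop anything that starts with a digit.
    kept = [s for s in lemmas if re.match(r'\w+', s) or s == "-PRON-"]
    return [s for s in kept if not re.match(r'\d+', s)]

print(clean_lemmas(["-PRON-", "be", "go", ",", "11", "Uhr"]))
# > ['-PRON-', 'be', 'go', 'Uhr']
```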
Please tell me if this solved your problem, as I may have misunderstood your issue.