How can I solve an attribute error when using spacy?

I am using spacy for natural language processing in German. But I am running into this error:

AttributeError: 'str' object has no attribute 'text'

This is the text data I am working with:

tex = ['Wir waren z.B. früher auf\'m Fahrrad unterwegs in München (immer nach 11 Uhr).',
        'Nun fahren wir öfter mit der S-Bahn in München herum. Tja. So ist das eben.',
        'So bleibt mir nichts anderes übrig als zu sagen, vielen Dank für alles.',
        'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.']

My code:

data = [re.sub(r"\"", "", i) for i in tex]
data1 = [re.sub(r"\“", "", i) for i in data]
data2 = [re.sub(r"\„", "", i) for i in data1]

nlp = spacy.load('de')
spacy_doc1 = []
for line in data2:
    spac = nlp(line)
    lem = [tok.lemma_ for tok in spac]
    no_punct = [tok.text for tok in lem if re.match('\w+', tok.text)]
    no_numbers = [tok for tok in no_punct if not re.match('\d+', tok)]

I am writing every string to a separate list, because I need to assign the result of the processing to the specific original string.

I also understand that the result written into lem is no longer in a format that spacy can process.

So how can I do this correctly?

The problem here is that spaCy's token.lemma_ returns a string, and strings have no text attribute (as the error states). Your lem list therefore holds plain strings rather than tokens, so tok.text fails on the next line.
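To make the cause concrete, here is a minimal sketch that reproduces the error without loading a model. The list lem is a hand-written stand-in for the output of [tok.lemma_ for tok in spac]; its values are made up for illustration and are not real spaCy output.

```python
import re

# Stand-in for [tok.lemma_ for tok in spac]: lemma_ yields plain strings.
# These sample values are invented for illustration, not real spaCy output.
lem = ["wir", "fahren", "(", "11", "Uhr", ")", "."]

try:
    # This is what the failing line does: it asks each string for .text
    [tok.text for tok in lem]
except AttributeError as err:
    message = str(err)
    print(message)

# Filtering the strings directly avoids the error
no_punct = [tok for tok in lem if re.match(r"\w+", tok)]
no_numbers = [tok for tok in no_punct if not re.match(r"\d+", tok)]
print(no_numbers)
```

The same two filters work unchanged on a real list of lemma strings, since they only rely on the elements being str.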

I suggest doing the same as you did when you wrote:

no_numbers = [tok for tok in no_punct if not re.match('\\d+', tok)]

The only difference from that line in your code is that you'd have to include the special string "-PRON-" in case you encounter English pronouns:

import re
import spacy

# using the web English model for practicality here
nlp = spacy.load('en_core_web_sm')

tex = ['I\'m going to get a cat tomorrow',
        'I don\'t know if I\'ll be able to get him a cat house though!']

data = [re.sub(r"\"", "", i) for i in tex]
data1 = [re.sub(r"\“", "", i) for i in data]
data2 = [re.sub(r"\„", "", i) for i in data1]

spacy_doc1 = []

for line in data2:
    spac = nlp(line)
    lem = [tok.lemma_ for tok in spac]
    # raw strings keep \w and \d from being read as invalid string escapes
    no_punct = [tok for tok in lem if re.match(r'\w+', tok) or tok in ["-PRON-"]]
    no_numbers = [tok for tok in no_punct if not re.match(r'\d+', tok)]
    print(no_numbers)

# > ['-PRON-', 'be', 'go', 'to', 'get', 'a', 'cat', 'tomorrow']
# > ['-PRON-', 'do', 'not', 'know', 'if', '-PRON-', 'will', 'be', 'able', 'to', 'get', '-PRON-', 'a', 'cat', 'house', 'though']
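Since you mentioned needing to assign each result back to its original string, one possible way to keep that link is to store (original, lemmas) pairs. This is only a sketch: clean_lemmas, process_all and fake_lemmatize are hypothetical helper names of mine, and fake_lemmatize is a crude stand-in so the example runs without a model. With spaCy you would pass something like lambda text: [tok.lemma_ for tok in nlp(text)] instead.

```python
import re

def clean_lemmas(lemmas):
    """Drop punctuation and number-leading entries from a list of lemma strings."""
    no_punct = [tok for tok in lemmas if re.match(r"\w+", tok) or tok == "-PRON-"]
    return [tok for tok in no_punct if not re.match(r"\d+", tok)]

def process_all(texts, lemmatize):
    """Return one (original_text, cleaned_lemmas) pair per input string."""
    return [(text, clean_lemmas(lemmatize(text))) for text in texts]

# Hypothetical stand-in for a real lemmatizer: lowercases the text and
# splits it into word and punctuation pieces. It does NOT lemmatize.
def fake_lemmatize(text):
    return re.findall(r"\w+|[^\w\s]", text.lower())

results = process_all(["I get 2 cats!", "Tja."], fake_lemmatize)
for original, lemmas in results:
    print(original, "->", lemmas)
```

A list of pairs keeps the input order and still works if the same sentence appears twice, which a dict keyed by the original string would not.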

Please tell me if this solved your problem, as I may have misunderstood your issue.
