Extract quotations using textacy
I am attempting to extract quotations and quotation attributions (i.e., the speaker) from text, but I am getting errors. Here is the setup:
import textacy
import pandas as pd
import spacy
data = [
("\"Hello, nice to meet you,\" said world 1"),
("\"Hello, nice to meet you,\" said world 2"),
]
df = pd.DataFrame(data, columns=['text'])
nlp = spacy.load('en_core_web_sm')
doc = df['text'].apply(nlp)
Here is the desired output:
[DQTriple(speaker=[world 1], cue=[said], content="Hello, nice to meet you,")]
[DQTriple(speaker=[world 2], cue=[said], content="Hello, nice to meet you,")]
Here is the first attempt at extraction:
print(list(textacy.extract.triples.direct_quotations(doc) for records in doc))
Which gives the following output:
[<generator object direct_quotations at 0x7f82edf58ac0>, <generator object direct_quotations at 0x7f82edf58190>]
Here is the second attempt at extraction:
print(list(textacy.extract.triples.direct_quotations(doc)))
Which gives the following error:
AttributeError: 'Series' object has no attribute 'lang_'
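In your first attempt, the generator expression creates one direct_quotations generator per row but never consumes it, so list() collects generator objects instead of triples. It also passes the whole Series (doc) on every iteration rather than the individual Doc (records). Your second attempt fails for a related reason: direct_quotations expects a single spaCy Doc, not a pandas Series of Docs.
Here is an example of what you could do: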
import textacy
import spacy
text =""" "Hello, nice to meet you," said world 1"""
nlp = spacy.load("en_core_web_sm")
doc = nlp(text)
print(list(textacy.extract.triples.direct_quotations(doc)))
# will print: [DQTriple(speaker=[world], cue=[said], content="Hello, nice to meet you,")]
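To get from the single-string example back to your DataFrame, one option is to parse each row on its own and turn the generator into a list before storing it. This is only a sketch: quotations_for_text and demo are names I made up for illustration (they are not part of textacy or spaCy), and demo assumes that pandas, spaCy, textacy and the en_core_web_sm model are installed.

```python
def quotations_for_text(text, parse, extract):
    """Parse one string and return its extracted triples as a list.

    parse   -- callable that turns a string into a spaCy Doc (e.g. the loaded nlp object)
    extract -- callable that yields triples for ONE Doc
               (e.g. textacy.extract.triples.direct_quotations)
    Both parameters are hypothetical names chosen for this sketch.
    """
    # list(...) consumes the generator, so real results are returned
    # instead of a "<generator object ...>" placeholder.
    return list(extract(parse(text)))


def demo():
    # Third-party imports live here so the helper above stays usable without them.
    import pandas as pd
    import spacy
    import textacy

    df = pd.DataFrame({"text": [
        '"Hello, nice to meet you," said world 1',
        '"Hello, nice to meet you," said world 2',
    ]})
    nlp = spacy.load("en_core_web_sm")

    # One Doc per row, one list of triples per row.
    quotes = df["text"].apply(
        lambda t: quotations_for_text(t, nlp, textacy.extract.triples.direct_quotations)
    )
    for row in quotes:
        print(row)
```

Calling demo() prints one list per row. The important part is that direct_quotations only ever sees a single Doc and that its generator is consumed with list().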