
Extract quotations using textacy

I am attempting to extract quotations and quotation attributions (i.e., the speaker) from text, but I am getting errors. Here is the setup:

import textacy
import pandas as pd
import spacy

data = [
        ("\"Hello, nice to meet you,\" said world 1"),
        ("\"Hello, nice to meet you,\" said world 2"),  
        ]

df = pd.DataFrame(data, columns=['text'])

nlp = spacy.load('en_core_web_sm')

doc = df['text'].apply(nlp)

Here is the desired output:

[DQTriple(speaker=[world 1], cue=[said], content="Hello, nice to meet you,")]
[DQTriple(speaker=[world 2], cue=[said], content="Hello, nice to meet you,")]

Here is the first attempt at extraction:

print(list(textacy.extract.triples.direct_quotations(doc) for records in doc))

Which gives the following output:

[<generator object direct_quotations at 0x7f82edf58ac0>, <generator object direct_quotations at 0x7f82edf58190>]

Here is the second attempt at extraction:

print(list(textacy.extract.triples.direct_quotations(doc)))

Which gives the following error:

AttributeError: 'Series' object has no attribute 'lang_'

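In your first attempt, the generator expression creates one direct_quotations generator per row but never consumes any of them, and each one is given the whole Series (doc) rather than the individual row (records), so the list holds generator objects instead of quotations. In your second attempt the generator is consumed, but its argument is still the pandas Series, which is not a spaCy Doc, hence the AttributeError. direct_quotations needs to be called on a single Doc.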
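To see why the first attempt printed generator objects while the second raised an error, here is a minimal sketch in plain Python. FakeSeries and lazy_extract are hypothetical stand-ins (not pandas or textacy code); the only assumption is that the real function is a generator that reads doc.lang_ once it starts running, which is what the traceback suggests.

```python
import inspect


class FakeSeries:
    """Hypothetical stand-in for a pandas Series: iterable, but has no `lang_`."""

    def __init__(self, items):
        self.items = items

    def __iter__(self):
        return iter(self.items)


def lazy_extract(doc):
    """Hypothetical generator that reads `doc.lang_` when it is first consumed."""
    lang = doc.lang_  # not executed until the generator is iterated
    yield lang


series = FakeSeries(["doc 1", "doc 2"])

# Attempt 1: one generator is created per row, but none is ever consumed,
# so the body of lazy_extract never runs and no error is raised.
unconsumed = list(lazy_extract(series) for _ in series)
all_generators = all(inspect.isgenerator(g) for g in unconsumed)

# Attempt 2: list() consumes the generator, the body runs, and the missing
# attribute on the container surfaces as an AttributeError.
try:
    list(lazy_extract(series))
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(all_generators)
print(error_message)
```

In both attempts the argument is the container rather than one of its elements; the first attempt only hides the problem because nothing is evaluated.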

Here is an example of what you could do:

import textacy
import spacy

text = """ "Hello, nice to meet you," said world 1"""

nlp = spacy.load("en_core_web_sm")
doc = nlp(text)

print(list(textacy.extract.triples.direct_quotations(doc)))
# will print: [DQTriple(speaker=[world], cue=[said], content="Hello, nice to meet you,")]


 