
Python - regex relation extraction

As part of our schoolwork we have been given this code:

>>> IN = re.compile(r'.*\bin\b(?!\b.+ing)')
>>> for doc in nltk.corpus.ieer.parsed_docs('NYT_19980315'):
...     for rel in nltk.sem.extract_rels('ORG', 'LOC', doc,
...                corpus='ieer', pattern = IN):
...         print(nltk.sem.rtuple(rel))

We are asked to try it out with some sentences of our own to see the output, so I decided to define a function:

def extract(sentence):
    import re
    import nltk

    IN = re.compile(r'.*\bin\b(?!\b.+ing)')
    for rel in nltk.sem.extract_rels('ORG', 'LOC', sentence, corpus='ieer', pattern = IN):
        print(nltk.sem.rtuple(rel))

When I try to run this code:

>>> from extract import extract
>>> extract("The Whitehouse in Washington")

I get the following error:

Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    extract("The Whitehouse in Washington")
  File "C:/Python34/My Scripts\extract.py", line 6, in extract
    for rel in nltk.sem.extract_rels('ORG', 'LOC', sentence, corpus='ieer', pattern = IN):
  File "C:\Python34\lib\site-packages\nltk\sem\relextract.py", line 216, in extract_rels
    pairs = tree2semi_rel(doc.text) + tree2semi_rel(doc.headline)
AttributeError: 'str' object has no attribute 'text'

Can anyone help me understand where I am going wrong in my function? The correct output for the test sentence should be:

[ORG: 'Whitehouse'] 'in' [LOC: 'Washington']

If you look at the definition of extract_rels, it expects a parsed document as its third argument, but here you are passing a plain string. To get around this error, you can do the following:

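import re
import nltk

# Assumption: the input is a list of raw sentences that still need tokenizing.
sentences = ["The Whitehouse in Washington"]
tokens = [nltk.word_tokenize(s) for s in sentences]
tagged_sentences = [nltk.pos_tag(token) for token in tokens]

class doc:
    pass  # bare stand-in that only carries the attributes extract_rels reads

IN = re.compile(r'.*\bin\b(?!\b.+ing)')
doc.headline = ["test headline for sentence"]
for sent in tagged_sentences:
    doc.text = nltk.ne_chunk(sent)  # named-entity chunk tree for this sentence
    for rel in nltk.sem.relextract.extract_rels('ORG', 'LOC', doc, corpus='ieer', pattern=IN):
        print(nltk.sem.rtuple(rel))  # adjust the output as needed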
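
The stand-in class works because, judging from the traceback in the question, extract_rels with corpus='ieer' only reads doc.text and doc.headline from its third argument. Here is a minimal sketch of that idea using only the standard library. The helper looks_like_ieer_doc is a hypothetical name for illustration and is not part of NLTK.

```python
from types import SimpleNamespace


def looks_like_ieer_doc(obj):
    """Return True if obj has the two attributes the traceback shows being read.

    Hypothetical helper for illustration only; NLTK does not provide it.
    """
    return hasattr(obj, "text") and hasattr(obj, "headline")


# A plain string has neither attribute, hence the AttributeError.
sentence = "The Whitehouse in Washington"

# Any object carrying both attributes has the expected shape; the empty
# lists are placeholders for a chunk tree and a headline.
fake_doc = SimpleNamespace(text=[], headline=[])

print(looks_like_ieer_doc(sentence))  # False
print(looks_like_ieer_doc(fake_doc))  # True
```

Having both attributes only avoids the AttributeError. Whether any relations are found still depends on the entity labels the chunker assigns.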

Try it out!
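
If no relations are printed, it can help to check the IN pattern on its own. As far as I can tell, extract_rels applies the pattern with match() to the text between the two entities (the "filler"). The sketch below assumes that. The function name relation_filler_matches is made up for this example.

```python
import re

# Same pattern as in the question: the filler must contain the word "in",
# and that "in" must not be followed later by something ending in "ing".
IN = re.compile(r'.*\bin\b(?!\b.+ing)')


def relation_filler_matches(filler):
    """Return True if the IN pattern accepts this filler text.

    Hypothetical helper; assumes the pattern is applied with match(),
    i.e. anchored at the start of the filler.
    """
    return IN.match(filler) is not None


print(relation_filler_matches("in"))             # True
print(relation_filler_matches("based in"))       # True
print(relation_filler_matches("is located on"))  # False, no word "in"
# False: the negative lookahead rejects "in" followed by a gerund
print(relation_filler_matches("success in supervising the transition of"))
```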
