Python: relation extraction with regular expressions

Question

As part of our schoolwork we have been given this code:

>>> IN = re.compile(r'.*\bin\b(?!\b.+ing)')
>>> for doc in nltk.corpus.ieer.parsed_docs('NYT_19980315'):
...     for rel in nltk.sem.extract_rels('ORG', 'LOC', doc,
...                corpus='ieer', pattern = IN):
...         print(nltk.sem.rtuple(rel))

We are asked to try it out with some sentences of our own to see the output, so I decided to define a function:

def extract(sentence):
    import re
    import nltk

    IN = re.compile(r'.*\bin\b(?!\b.+ing)')
    for rel in nltk.sem.extract_rels('ORG', 'LOC', sentence, corpus='ieer', pattern = IN):
        print(nltk.sem.rtuple(rel))

When I try to run this code:

>>> from extract import extract
>>> extract("The Whitehouse in Washington")

I get the following error:

Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    extract("The Whitehouse in Washington")
  File "C:/Python34/My Scripts\extract.py", line 6, in extract
    for rel in nltk.sem.extract_rels('ORG', 'LOC', sentence, corpus='ieer', pattern = IN):
  File "C:\Python34\lib\site-packages\nltk\sem\relextract.py", line 216, in extract_rels
    pairs = tree2semi_rel(doc.text) + tree2semi_rel(doc.headline)
AttributeError: 'str' object has no attribute 'text'

Can anyone help me understand where I am going wrong in my function? The correct output for the test sentence should be:

[ORG: 'Whitehouse'] 'in' [LOC: 'Washington']

Answer 1

If you look at the definition of extract_rels, it expects a parsed document as its third argument, but here you are passing a plain sentence string. That is why the traceback complains that 'str' has no attribute 'text'. To get around this error, you can do the following:

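import re
import nltk

# Tokenize and POS-tag the input first; ne_chunk expects tagged tokens.
sentences = nltk.sent_tokenize("The Whitehouse in Washington")
tokens = [nltk.word_tokenize(s) for s in sentences]
tagged_sentences = [nltk.pos_tag(token) for token in tokens]

# Bare stand-in for a parsed document: extract_rels reads .text and .headline.
class doc:
    pass

IN = re.compile(r'.*\bin\b(?!\b.+ing)')
doc.headline = ["test headline for sentence"]
for sent in tagged_sentences:
    doc.text = nltk.ne_chunk(sent)
    for rel in nltk.sem.relextract.extract_rels('ORG', 'LOC', doc, corpus='ieer', pattern=IN):
        print(nltk.sem.rtuple(rel))  # change the output handling as needed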
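
The empty class above works only because, as the traceback shows, extract_rels with corpus='ieer' reads doc.text and doc.headline, and a plain string has neither. A minimal sketch of the same idea, using types.SimpleNamespace instead of an empty class, might look like this. Note that make_doc is a hypothetical helper name (not part of NLTK), and the list of words passed as text is only a placeholder for the tree that nltk.ne_chunk would return.

```python
from types import SimpleNamespace


def make_doc(text, headline=()):
    """Build an object exposing the two attributes seen in the traceback.

    `make_doc` is a hypothetical helper for illustration, not an NLTK function.
    """
    return SimpleNamespace(text=text, headline=list(headline))


sentence = "The Whitehouse in Washington"

# Placeholder: in real use, `text` would be the output of nltk.ne_chunk(...).
doc = make_doc(text=sentence.split(), headline=["test headline"])

print(hasattr(sentence, "text"))  # False: a str has no .text attribute
print(hasattr(doc, "text"), hasattr(doc, "headline"))
```

Building a fresh object per sentence also avoids mutating shared class attributes inside the loop.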

Try it out!
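
It may also help to see what the IN pattern accepts on its own. As far as I know, extract_rels applies the pattern with match() to the filler text between the two entities, so the check can be sketched with the re module alone. The accepts helper and the sample strings below are made up for illustration.

```python
import re

IN = re.compile(r'.*\bin\b(?!\b.+ing)')


def accepts(filler):
    """Return True if the pattern matches from the start of the filler text."""
    return IN.match(filler) is not None


# Illustrative filler strings, not taken from the IEER corpus.
samples = [
    "in",
    ", based in",
    "success in supervising the transition of",
    "is located near",
]

for filler in samples:
    print(repr(filler), accepts(filler))
```

The third string is rejected by the negative lookahead because "ing" appears later in the text after the standalone word "in"; the last one is rejected because it has no standalone "in" at all.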
