Python - regex relation extraction
As part of our schoolwork we have been given this code:
>>> IN = re.compile(r'.*\bin\b(?!\b.+ing)')
>>> for doc in nltk.corpus.ieer.parsed_docs('NYT_19980315'):
...     for rel in nltk.sem.extract_rels('ORG', 'LOC', doc,
...                                      corpus='ieer', pattern = IN):
...         print(nltk.sem.rtuple(rel))
We are asked to try it out with some sentences of our own to see the output, so for this I decided to define a function:
def extract(sentence):
    import re
    import nltk
    IN = re.compile(r'.*\bin\b(?!\b.+ing)')
    for rel in nltk.sem.extract_rels('ORG', 'LOC', sentence, corpus='ieer', pattern = IN):
        print(nltk.sem.rtuple(rel))
When I try to run this code:
>>> from extract import extract
>>> extract("The Whitehouse in Washington")
I get the following error:
Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    extract("The Whitehouse in Washington")
  File "C:/Python34/My Scripts\extract.py", line 6, in extract
    for rel in nltk.sem.extract_rels('ORG', 'LOC', sentence, corpus='ieer', pattern = IN):
  File "C:\Python34\lib\site-packages\nltk\sem\relextract.py", line 216, in extract_rels
    pairs = tree2semi_rel(doc.text) + tree2semi_rel(doc.headline)
AttributeError: 'str' object has no attribute 'text'
Can anyone help me understand where I am going wrong in my function? The correct output for the test sentence should be:
[ORG: 'Whitehouse'] 'in' [LOC: 'Washington']
If you look at the definition of extract_rels, it expects a parsed document as its third argument, but here you are passing a plain sentence string. With corpus='ieer' it reads doc.text and doc.headline, which a str does not have. To get past this error, you can do the following:
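import re
import nltk

# tokens was not defined in the original snippet; assumed here to be a list of tokenized sentences
tokens = [nltk.word_tokenize(s) for s in nltk.sent_tokenize("The Whitehouse in Washington")]
tagged_sentences = [nltk.pos_tag(token) for token in tokens]

# minimal stand-in for a parsed IEER document: it only needs .text and .headline
class doc():
    pass

IN = re.compile(r'.*\bin\b(?!\b.+ing)')
doc.headline = ["test headline for sentence"]
for i, sent in enumerate(tagged_sentences):
    doc.text = nltk.ne_chunk(sent)  # named-entity chunk tree for this sentence
    for rel in nltk.sem.relextract.extract_rels('ORG', 'LOC', doc, corpus='ieer', pattern=IN):
        print(nltk.sem.rtuple(rel))  # you can change this as needed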
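To see why the string fails while a small stand-in object gets past that line, here is a minimal sketch that uses only the standard library. The function read_ieer_fields is a hypothetical stand-in I made up for the attribute access shown in the traceback (doc.text and doc.headline); it is not part of NLTK, and the token lists are placeholders rather than real chunk trees.

```python
from types import SimpleNamespace

def read_ieer_fields(doc):
    """Hypothetical stand-in for the attribute access that the traceback
    shows inside extract_rels when corpus='ieer' (doc.text, doc.headline)."""
    return doc.text, doc.headline

# A plain string has neither attribute, so this reproduces the AttributeError.
try:
    read_ieer_fields("The Whitehouse in Washington")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Any object that carries both attributes gets past this step.
# The lists below are placeholders, not real NLTK chunk trees.
fake_doc = SimpleNamespace(text=["some", "tokens"], headline=["a", "headline"])
fields = read_ieer_fields(fake_doc)

print(error_message)
print(fields)
```

This only shows the shape of the object that is needed. For real relation extraction, doc.text still has to be a chunk tree with named-entity labels, as in the snippet above.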
Try it out!
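If it helps to see what the IN pattern accepts, here is a small sketch using only the re module. As far as I can tell from the NLTK source, extract_rels matches the pattern against the text between the two entities (the "filler"), and the negative lookahead (?!\b.+ing) is meant to reject an "in" that is followed later by a word ending in "ing". The filler strings below are made-up examples, not taken from the IEER corpus.

```python
import re

IN = re.compile(r'.*\bin\b(?!\b.+ing)')

# Made-up filler strings, i.e. the text that would sit between two entities.
fillers = ["in", "is based in", "in charge of supervising"]

# True when the pattern matches from the start of the filler.
results = {filler: bool(IN.match(filler)) for filler in fillers}

for filler, matched in results.items():
    print(repr(filler), "->", matched)
```

The first two fillers match because nothing ending in "ing" follows the word "in". The third is rejected by the lookahead because "supervising" comes after it.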