
Extracting Dead Name Entities from Obituaries - NLP

I have a continuous string of ads extracted from a newspaper. The ads may appear in the format shown below. My task is to extract the deceased persons' names.

John, the small son of Mr. and Mrs.<br>
Elmer Cleppfer, died at their home in<br>
Lewistown on Wednesday. The funeral<br>
will He held on Saturday afternoon<br>
from the home of the grandparents<br>
on the child, Mr. and Mrs. John<br>
Kiopper, 224 Locust street, tortiorrow<br>
afternoon at 2 o'clock. Interment witt<br>
take place at Oberlin.<br>

Mrs. Lydia Mintch, aged 6S years <br>
died yesterday afternoon at the home<br>
of Fred Flowerfleld at Enhaut. Mrs.<br>
Mlnlch contracted a severe attack of<br>
pneumonia aggravated by other illness<br>
Several days ago which resulted in her<br>
death. Funeral arrangements have not<br>
yet been completed.<br>

The text above is made up of 2 ads. Can anyone tell me how to split this kind of text into paragraphs when it contains more than one such ad?

Well, Stanford Parser is your option here.

  1. First, extract only the sentences that contain "died", "deceased", or similar terms.
  2. Generate collapsed typed dependencies for these sentences using Stanford Parser.
  3. You will find a pattern that will help you get the name of the deceased person.

I am intentionally not giving away the pattern here, as you should put in some effort as well.
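Step 1 of this answer (keeping only the sentences that mention a death) can be sketched with the standard library alone. This is a minimal sketch, not the answerer's code: the trigger-word list, the title list, and the helper names `normalize`, `split_sentences`, and `death_sentences` are all assumptions made for illustration, and the sentence splitter is deliberately naive. Steps 2 and 3 still need a real dependency parser and are not covered here.

```python
import re

# Assumed trigger words; extend the list for your own corpus.
DEATH_TERMS = re.compile(r"\b(?:died|dies|deceased|death|passed\s+away)\b", re.IGNORECASE)

# Assumed titles whose trailing period must not end a sentence.
TITLES = ("Mr", "Mrs", "Ms", "Dr", "Rev")


def normalize(raw):
    """Drop <br> tags and collapse the hard line breaks into single spaces."""
    text = re.sub(r"<br\s*/?>", " ", raw, flags=re.IGNORECASE)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Naive splitter: break after . ! ? when followed by a capitalized word."""
    protected = text
    for title in TITLES:
        # Hide the period of "Mr." etc. so it is not taken as a sentence end.
        protected = re.sub(rf"\b{title}\.", f"{title}<DOT>", protected)
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", protected)
    return [p.replace("<DOT>", ".").strip() for p in parts if p.strip()]


def death_sentences(raw):
    """Return only the sentences that contain one of the trigger words."""
    return [s for s in split_sentences(normalize(raw)) if DEATH_TERMS.search(s)]


if __name__ == "__main__":
    ad = (
        "Mrs. Lydia Mintch, aged 6S years <br>\n"
        "died yesterday afternoon at the home<br>\n"
        "of Fred Flowerfleld at Enhaut. Funeral arrangements have not<br>\n"
        "yet been completed.<br>"
    )
    for sentence in death_sentences(ad):
        print(sentence)
```

The sentences returned by `death_sentences` are the ones you would then feed to the parser. Because the input is OCR output, misspelled trigger words (for example "dled") will be missed by an exact-match list like this one.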

Here is how I would approach the problem.

  1. POS-tag the sentences.
  2. For each sentence, do a deep parse and build a subject-verb-object model (left-to-right parse).
  3. Wherever the verb refers to death, the subject is the dead person.
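The idea in step 3 can be illustrated without a parser, as long as the limits are clear. The sketch below is a regex heuristic standing in for the POS tagging and deep parse described above, not an implementation of them: it assumes the subject is the run of capitalized words (with an optional title) that opens the sentence, and that the death verb is one of a small assumed list. The names `DEATH_VERB`, `LEADING_NAME`, and `guess_deceased` are hypothetical.

```python
import re

# Assumed death verbs; a real pipeline would read the verb off the parse.
DEATH_VERB = re.compile(r"\b(?:died|dies|passed\s+away)\b", re.IGNORECASE)

# Optional title, then a run of capitalized tokens at the start of the sentence.
LEADING_NAME = re.compile(
    r"^(?:(?:Mr|Mrs|Ms|Miss|Dr|Rev)\.?\s+)?[A-Z][A-Za-z'-]*(?:\s+[A-Z][A-Za-z'-]*)*"
)


def guess_deceased(sentence):
    """Return the leading name of a sentence whose verb signals death, else None."""
    verb = DEATH_VERB.search(sentence)
    if verb is None:
        return None
    # Only inspect the text before the verb, where an English subject usually sits.
    name = LEADING_NAME.match(sentence[: verb.start()].strip())
    return name.group(0) if name else None


if __name__ == "__main__":
    examples = [
        "John, the small son of Mr. and Mrs. Elmer Cleppfer, died at their home in Lewistown on Wednesday.",
        "Mrs. Lydia Mintch, aged 6S years died yesterday afternoon at the home of Fred Flowerfleld at Enhaut.",
        "Funeral arrangements have not yet been completed.",
    ]
    for sentence in examples:
        print(guess_deceased(sentence))
```

A heuristic like this only sees the head of the subject. For the first ad it yields the given name alone, because the surname appears inside the appositive about the parents; recovering it is the kind of case where a genuine subject-verb-object parse would be needed.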

