
Extracting the person names in the named entity recognition in NLP using Python

I have a sentence for which I need to identify only the person names:

For example: 例如:

sentence = "Larry Page is an American business magnate and computer scientist who is the co-founder of Google, alongside Sergey Brin"

I have used the code below to identify the named entities (NERs).

from nltk import word_tokenize, pos_tag, ne_chunk
print(ne_chunk(pos_tag(word_tokenize(sentence))))

The output I received was:

(S
  (PERSON Larry/NNP)
  (ORGANIZATION Page/NNP)
  is/VBZ
  an/DT
  (GPE American/JJ)
  business/NN
  magnate/NN
  and/CC
  computer/NN
  scientist/NN
  who/WP
  is/VBZ
  the/DT
  co-founder/NN
  of/IN
  (GPE Google/NNP)
  ,/,
  alongside/RB
  (PERSON Sergey/NNP Brin/NNP))

I want to extract all the person names, such as:

Larry Page
Sergey Brin
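As a starting point (not from the original post), the PERSON subtrees can be pulled out of the `ne_chunk` tree directly with plain NLTK. A minimal sketch, rebuilding the chunked tree printed above so it runs without the tagger models; note that because the default chunker labeled "Page" as ORGANIZATION, "Larry Page" comes out split, which is why a better tagger is needed:

```python
from nltk.tree import Tree

# Rebuild the chunk tree shown in the output above.
tree = Tree("S", [
    Tree("PERSON", [("Larry", "NNP")]),
    Tree("ORGANIZATION", [("Page", "NNP")]),
    ("is", "VBZ"), ("an", "DT"),
    Tree("GPE", [("American", "JJ")]),
    ("business", "NN"), ("magnate", "NN"), ("and", "CC"),
    ("computer", "NN"), ("scientist", "NN"), ("who", "WP"),
    ("is", "VBZ"), ("the", "DT"), ("co-founder", "NN"), ("of", "IN"),
    Tree("GPE", [("Google", "NNP")]),
    (",", ","), ("alongside", "RB"),
    Tree("PERSON", [("Sergey", "NNP"), ("Brin", "NNP")]),
])

# Walk only the PERSON subtrees and join their tokens into names.
persons = [" ".join(tok for tok, pos in st.leaves())
           for st in tree.subtrees(lambda t: t.label() == "PERSON")]
print(persons)  # ['Larry', 'Sergey Brin']
```

The mislabeled "Page" is lost here, which motivates switching to the Stanford tagger below.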

In order to achieve this, I referred to this link and tried the following.

from nltk.tag.stanford import StanfordNERTagger
st = StanfordNERTagger('/usr/share/stanford-ner/classifiers/english.all.3class.distsim.crf.ser.gz','/usr/share/stanford-ner/stanford-ner.jar')

However, I continue to get this error:

LookupError: Could not find stanford-ner.jar jar file at /usr/share/stanford-ner/stanford-ner.jar

Where can I download this file?

As mentioned above, the result I am expecting, in the form of a list or dictionary, is:

Larry Page
Sergey Brin

In Long

Please read these carefully:

Understand the solution, don't just copy and paste.


TL;DR

In terminal:

pip install -U nltk

wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
unzip stanford-corenlp-full-2016-10-31.zip && cd stanford-corenlp-full-2016-10-31

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
-preload tokenize,ssplit,pos,lemma,parse,depparse \
-status_port 9000 -port 9000 -timeout 15000

In Python:

from nltk.tag.stanford import CoreNLPNERTagger

def get_continuous_chunks(tagged_sent):
    continuous_chunk = []
    current_chunk = []

    for token, tag in tagged_sent:
        if tag != "O":
            current_chunk.append((token, tag))
        else:
            if current_chunk: # if the current chunk is not empty
                continuous_chunk.append(current_chunk)
                current_chunk = []
    # Flush the final current_chunk into the continuous_chunk, if any.
    if current_chunk:
        continuous_chunk.append(current_chunk)
    return continuous_chunk


stner = CoreNLPNERTagger()
tagged_sent = stner.tag('Rami Eid is studying at Stony Brook University in NY'.split())

named_entities = get_continuous_chunks(tagged_sent)
named_entities_str_tag = [(" ".join([token for token, tag in ne]), ne[0][1]) for ne in named_entities]


print(named_entities_str_tag)

[out]:

[('Rami Eid', 'PERSON'), ('Stony Brook University', 'ORGANIZATION'), ('NY', 'LOCATION')]
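Since the question asks for person names only, the (string, tag) pairs can then be filtered by tag. A small sketch, hard-coding the [out] list above so it runs without a CoreNLP server:

```python
# The list printed in [out] above, hard-coded for a standalone run.
named_entities_str_tag = [('Rami Eid', 'PERSON'),
                          ('Stony Brook University', 'ORGANIZATION'),
                          ('NY', 'LOCATION')]

# Keep only the entities tagged PERSON.
persons = [name for name, tag in named_entities_str_tag if tag == 'PERSON']
print(persons)  # ['Rami Eid']
```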

You might find this helpful too: Unpacking a list/tuple of pairs into two lists/tuples
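For reference, that unpacking trick looks like this, applied to the pairs from the output above:

```python
named_entities_str_tag = [('Rami Eid', 'PERSON'),
                          ('Stony Brook University', 'ORGANIZATION'),
                          ('NY', 'LOCATION')]

# zip(*pairs) transposes the list of pairs into two parallel tuples.
names, tags = zip(*named_entities_str_tag)
print(names)  # ('Rami Eid', 'Stony Brook University', 'NY')
print(tags)   # ('PERSON', 'ORGANIZATION', 'LOCATION')
```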

First, you need to download the jar files and the rest of the necessary files. Follow this link: https://gist.github.com/troyane/c9355a3103ea08679baf . Run the code to download the files (except the last few lines). Once the download is done, you are ready to do the extraction part.

from nltk.tag.stanford import StanfordNERTagger
st = StanfordNERTagger('/home/saheli/Downloads/my_project/stanford-ner/english.all.3class.distsim.crf.ser.gz',
                   '/home/saheli/Downloads/my_project/stanford-ner/stanford-ner.jar')
