
How to extract pre-defined keywords from a sentence in Python?

Consider the following example: "10% of on all Artificial Intelligence courses." In this example, I have to extract two predefined classes, Artificial Intelligence and courses. The program also has to classify terms such as ANN, CNN, RNN, and AI into the Artificial Intelligence category. I trained a spaCy model, but I am not satisfied with the results because it does not label the entities correctly. Is there an alternative way to extract entities from a sentence in Python?

Here are a few options that I would try out.

1. Custom entity extraction with Rasa.
https://rasa.com/docs/rasa/nlu/entity-extraction/#custom-entities

2. BERT-based NER for custom entities. Take a look at the following repositories:
https://github.com/allenai/scibert
https://github.com/dmis-lab/biobert

You can use flashtext for this.

from flashtext import KeywordProcessor

kp = KeywordProcessor()

# Map each category (the key) to every keyword that should resolve to it.
# Whenever one of the listed keywords appears in the text, flashtext returns the key,
# so ANN, CNN, RNN and AI all come back as 'Artificial Intelligence'.
keyword_dict = {
    'Artificial Intelligence': ['ANN', 'CNN', 'RNN', 'AI', 'Artificial Intelligence'],
    'courses': ['courses'],
}

kp.add_keywords_from_dict(keyword_dict)

# 'Artificial Intelligence', 'ANN' and 'CNN' all fall under the 'Artificial Intelligence' key,
# so each of them is tagged with that key.
print(kp.extract_keywords('10% of on all Artificial Intelligence, ANN, and CNN courses.'))
# Output:
# ['Artificial Intelligence', 'Artificial Intelligence', 'Artificial Intelligence', 'courses']

For more information, see the flashtext documentation: https://readthedocs.org/projects/flashtext/downloads/pdf/latest/
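
If installing an extra package is not an option, the same dictionary-lookup idea can be approximated with the standard library's re module. The sketch below is only an illustration and is not how flashtext works internally. The names CATEGORIES, build_matcher and extract_categories are made up for this example, and it assumes that case-insensitive, whole-word matching is what you want.

```python
import re

# Hypothetical category -> keyword mapping, mirroring the dictionary used with flashtext.
CATEGORIES = {
    "Artificial Intelligence": ["ANN", "CNN", "RNN", "AI", "Artificial Intelligence"],
    "courses": ["courses"],
}


def build_matcher(categories):
    """Compile one regex over all keywords and a lowercase keyword -> category lookup."""
    lookup = {}
    for category, keywords in categories.items():
        for keyword in keywords:
            lookup[keyword.lower()] = category
    # Try longer keywords first so multi-word phrases win over shorter overlapping ones.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern, lookup


def extract_categories(text, categories=CATEGORIES):
    """Return (category, start, end) for every keyword found in text, in order."""
    pattern, lookup = build_matcher(categories)
    return [
        (lookup[m.group(0).lower()], m.start(), m.end())
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    sentence = "10% of on all Artificial Intelligence, ANN, and CNN courses."
    for category, start, end in extract_categories(sentence):
        print(category, repr(sentence[start:end]))
```

The \b anchors keep short keywords such as AI from matching inside longer words. For a large keyword list, flashtext is likely the better choice, since a single big regex alternation can become slow.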

