
RASA NLU: I want to extract anything (words, numbers, or special characters) after a word as an entity

Is there a way to extract anything that follows a given word as an entity? For example:

I want to extract anything after "about", "go to", or "learn" as an entity.

##intent:navigate
-I want to learn about linear regression
-I want to read about SVM
-I want to go to Python 2.6
-Take me to logistic regression: eval

##regex:topic
-^[A-Za-z0-9 :_ -][A-Za-z0-9 :_ -][A-Za-z0-9 :_ -]$

A naive approach is very simple: use the string split method, e.g.

sentences = ["I want to learn about linear regression", "I want to read about SVM", "I want to go to Python 2.6",
 "Take me to logistic regression: eval"]

split_terms = ["about", "go to", "learn"]

for sentence in sentences:
    for split_term in split_terms:
        try:
            print(sentence.split(split_term)[1])
        except IndexError:
            pass # split_term was not found in a sentence

Results:

 linear regression
 about linear regression
 SVM
 Python 2.6

A slightly smarter approach is to first find the split term that occurs last in the sentence, which resolves the overlap between "learn", "learn about", and "about":

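for sentence in sentences:
    # Track the split term that starts furthest to the right in the sentence.
    last_split_term_index = -1
    last_split_term = ""
    for split_term in split_terms:
        candidate_index = sentence.find(split_term)  # -1 when the term is absent
        if candidate_index > last_split_term_index:
            last_split_term_index = candidate_index
            last_split_term = split_term
    if last_split_term_index == -1:
        continue  # no split term was found in this sentence
    # Print everything that follows the chosen split term.
    print(sentence[last_split_term_index + len(last_split_term):])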

Results:

 linear regression
 SVM
 Python 2.6
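
The same idea can also be written as a single regular expression. This is only a sketch under some assumptions: the trigger list adds the phrase "learn about" (not in the original list) so that the longer phrase wins over the bare "learn", and the names TRIGGERS, PATTERN, and extract_topic are made up for illustration.

```python
import re

# Trigger phrases taken from the question. "learn about" is an added
# assumption and is listed first so it is tried before the bare "learn".
TRIGGERS = ["learn about", "about", "go to", "learn"]

# Match a trigger on a word boundary, then capture the rest of the sentence.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in TRIGGERS) + r")\s+(.+)$"
)


def extract_topic(sentence):
    """Return the text after the first trigger phrase, or None if none matches."""
    match = PATTERN.search(sentence)
    return match.group(1).strip() if match else None


sentences = [
    "I want to learn about linear regression",
    "I want to read about SVM",
    "I want to go to Python 2.6",
    "Take me to logistic regression: eval",
]

for sentence in sentences:
    print(repr(extract_topic(sentence)))
```

As with the split approach, the last sentence yields nothing because it contains none of the trigger phrases; adding "take me to" to the list would cover it.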

Yes, you can. You have to annotate the entities in your training data, and the model will then extract them. For your example, the training data should look like this:

##intent:navigate
- I want to learn about [linear regression](topic)
- I want to talk about [RasaNLU](topic) for the rest of the day.
- I want to go to [Berlin](topic) for a specific work.
- I want to read about [SVM](topic)
- I want to go to [Python 2.6](topic)
- Take me to logistic regression: eval

After training the model, I tried an example:

Enter a message: I want to talk about SVM     
{
  "intent": {
    "name": "navigate",
    "confidence": 0.9576369524002075
  },
  "entities": [
    {
      "start": 21,
      "end": 24,
      "value": "SVM",
      "entity": "topic",
      "confidence": 0.8241770362411013,
      "extractor": "CRFEntityExtractor"
    }
  ]
}
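
To use that result in code, something like the following could work. It is a minimal sketch that only relies on the JSON structure shown above; the function name extract_entities and the min_confidence threshold are assumptions, and the parse result is pasted in as a plain dict rather than obtained from Rasa.

```python
def extract_entities(message, parse_result, entity="topic", min_confidence=0.5):
    """Return the text of every matching entity found in parse_result."""
    values = []
    for ent in parse_result.get("entities", []):
        if ent.get("entity") != entity:
            continue
        if ent.get("confidence", 1.0) < min_confidence:
            continue
        # "start" and "end" are character offsets into the original message.
        values.append(message[ent["start"]:ent["end"]])
    return values


message = "I want to talk about SVM"
parse_result = {
    "intent": {"name": "navigate", "confidence": 0.9576369524002075},
    "entities": [
        {
            "start": 21,
            "end": 24,
            "value": "SVM",
            "entity": "topic",
            "confidence": 0.8241770362411013,
            "extractor": "CRFEntityExtractor",
        }
    ],
}

print(extract_entities(message, parse_result))
```

Slicing the message with the offsets returns the same text as the "value" field here, which is a quick way to check that the extracted span is what you expect.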

But for this to be effective, you will have to add more examples that cover all the patterns you expect. For instance, the example 'I want to talk about RasaNLU for the rest of the day.' shows the model that the entity to be extracted does not have to be the last word of the sentence (which is the case in the rest of the examples).
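
Writing many such examples by hand gets tedious, so one option is to generate them from templates. The sketch below is just an illustration: the templates, the topic list, and the helper make_examples are all hypothetical, and the output uses the same Markdown annotation format as the training data above.

```python
from itertools import product

# Hypothetical templates and topics; replace them with phrasing from your domain.
# Some templates deliberately place the topic in the middle of the sentence.
TEMPLATES = [
    "I want to learn about {topic}",
    "I want to talk about {topic} for the rest of the day.",
    "Take me to {topic}",
    "Can we go to {topic} now?",
]
TOPICS = ["linear regression", "SVM", "Python 2.6", "logistic regression: eval"]


def make_examples(templates, topics, entity="topic"):
    """Return Markdown training lines with each topic annotated as an entity."""
    lines = []
    for template, topic in product(templates, topics):
        annotated = "[{}]({})".format(topic, entity)
        lines.append("- " + template.format(topic=annotated))
    return lines


print("##intent:navigate")
print("\n".join(make_examples(TEMPLATES, TOPICS)))
```

Generated data is no substitute for real user messages, but it is a cheap way to show the model that the entity can appear at different positions.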
