
How to pass a JSON object directly to train in Rasa NLU from Python

I am using Rasa NLU to train on data. According to the documentation at http://nlu.rasa.ai/python.html, the following code has to be used to train on the data in the file demo-rasa.json:

from rasa_nlu.converters import load_data
from rasa_nlu.config import RasaNLUConfig
from rasa_nlu.model import Trainer

training_data = load_data('data/examples/rasa/demo-rasa.json')
trainer = Trainer(RasaNLUConfig("sample_configs/config_spacy.json"))
trainer.train(training_data)
model_directory = trainer.persist('./projects/default/')

But how can we instead read the training data from a JSON object?

If you look at the implementation of load_data, it performs two steps:

  1. Guess the file format
  2. Load the file using the appropriate loading method
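
Those two steps can be sketched for data that is already in memory. This is a minimal illustration rather than the rasa_nlu implementation: the names guess_format, load_rasa, load_luis and load_from_dict are hypothetical, and the marker keys are assumptions based on the rasa_nlu_data layout used in this thread and on LUIS exports carrying a luis_schema_version field.

```python
def guess_format(obj):
    """Step 1: guess the training data format from top-level keys."""
    if "rasa_nlu_data" in obj:
        return "rasa_nlu"
    if "luis_schema_version" in obj:  # assumed marker for LUIS exports
        return "luis"
    return "unknown"


def load_rasa(obj):
    # Rasa format: examples live under rasa_nlu_data.common_examples
    return obj["rasa_nlu_data"].get("common_examples", [])


def load_luis(obj):
    # Assumption: LUIS exports keep labelled sentences under "utterances"
    return obj.get("utterances", [])


LOADERS = {"rasa_nlu": load_rasa, "luis": load_luis}


def load_from_dict(obj):
    """Step 2: dispatch to the loader that matches the guessed format."""
    fmt = guess_format(obj)
    if fmt not in LOADERS:
        raise ValueError("unknown training data format")
    return LOADERS[fmt](obj)
```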

The simplest solution is to write your JSON object to a file or to a StringIO object.
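
A minimal sketch of both options, using only the standard library. The sample dictionary and the helper names dump_to_temp_file and dump_to_string_io are made up for illustration. Since load_data in the question is called with a file path, the temporary file is the safer route; whether a given rasa_nlu version accepts a file-like object is not something this sketch assumes.

```python
import io
import json
import tempfile

# Hypothetical in-memory training data in the rasa_nlu_data layout
training_json = {
    "rasa_nlu_data": {
        "common_examples": [
            {"text": "hello", "intent": "greet", "entities": []},
        ]
    }
}


def dump_to_temp_file(obj):
    """Write obj as JSON to a temporary file and return the file path."""
    handle = tempfile.NamedTemporaryFile(
        mode="w", suffix=".json", delete=False, encoding="utf-8"
    )
    with handle:
        json.dump(obj, handle)
    return handle.name


def dump_to_string_io(obj):
    """Write obj as JSON to an in-memory text buffer, rewound to the start."""
    buffer = io.StringIO()
    json.dump(obj, buffer)
    buffer.seek(0)
    return buffer


path = dump_to_temp_file(training_json)
# The path can then be handed to the call from the question:
# training_data = load_data(path)
```

Remember to delete the temporary file (os.remove(path)) once training has finished.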

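Alternatively, you can pick the specific loading function you need, for example load_rasa_data, and separate the file reading from it. For this example you could probably just take the whole function and remove the line data = _read_json_from_file(filename)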
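
The idea of separating file reading from parsing might look like the sketch below. It does not reproduce the body of load_rasa_data; parse_rasa_data and load_rasa_file are hypothetical names, and the parser only handles the common_examples list.

```python
import json


def parse_rasa_data(data):
    """Parse an already-loaded dict; no file access happens here."""
    nlu = data["rasa_nlu_data"]
    examples = []
    for example in nlu.get("common_examples", []):
        examples.append({
            "text": example["text"],
            "intent": example.get("intent"),
            "entities": example.get("entities", []),
        })
    return examples


def load_rasa_file(filename):
    """Thin wrapper: read the file, then delegate to the parser."""
    with open(filename, encoding="utf-8") as handle:
        return parse_rasa_data(json.load(handle))
```

With this split, code that already holds a JSON object calls parse_rasa_data directly, while file-based callers keep using the wrapper.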

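I am a bit surprised to see that there is currently no way to read an already loaded JSON object. If you decide to adapt these functions for that purpose, you might consider submitting a pull request for it.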

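I made a Flask application that takes the JSON object from the request body instead of reading it from a file.

This code converts an existing LUIS JSON, using spaCy for the entities and sklearn-crfsuite for intent recognition.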

from flask import Flask, jsonify, request
from flask_cors import CORS
import json, os, msvcrt, psutil, subprocess, datetime

app = Flask(__name__)

CORS(app)

with app.app_context():
    with app.test_request_context():

        #region REST based RASA API
        serverExecutablePID = 0     
        hasAPIStarted = False
        configFileDirectory = "C:\\Code\\RasaAPI\\RASAResources\\config"
        chitChatModel = "ChitChat"
        assetsDirectory = "C:\\Code\\RasaAPI\\RASAResources"

        def createSchema(SchemaPath, dataToBeWritten):
            try:
                    #write LUIS or RASA JSON Schema in json file locking the file to avoid race condition using Python's Windows msvcrt binaries
                    with open(SchemaPath, "w") as SchemaCreationHandle:
                        msvcrt.locking(SchemaCreationHandle.fileno(), msvcrt.LK_LOCK, os.path.getsize(SchemaPath))
                        json.dump(dataToBeWritten, SchemaCreationHandle, indent = 4, sort_keys=False)
                        SchemaCreationHandle.close()

                    #Check if written file actually exists on disk or not
                    doesFileExist = os.path.exists(SchemaPath)                    
                    return doesFileExist

            except Exception as ex:
                return str(ex.args)


        def appendTimeStampToModel(ModelName):
            return ModelName + '_{:%Y%m%d-%H%M%S}.json'.format(datetime.datetime.now())

        def appendTimeStampToConfigSpacy(ModelName):
            return ModelName + '_config_spacy_{:%Y%m%d-%H%M%S}.json'.format(datetime.datetime.now())

        def createConfigSpacy(ModelName, DataPath, ConfigSpacyPath, TrainedModelsPath, LogDataPath):
            try:
                    with open(ConfigSpacyPath, "w") as configSpacyFileHandle:
                        msvcrt.locking(configSpacyFileHandle.fileno(), msvcrt.LK_LOCK, os.path.getsize(ConfigSpacyPath))
                        configDataToBeWritten = dict({
                        "project": ModelName,
                        "data": DataPath,
                        "path": TrainedModelsPath,
                        "response_log": LogDataPath,
                        "log_level": "INFO",
                        "max_training_processes": 1,
                        "pipeline": "spacy_sklearn",
                        "language": "en",
                        "emulate": "luis",
                        "cors_origins": ["*"],
                        "aws_endpoint_url": None,
                        "token": None,
                        "num_threads": 2,
                        "port": 5000
                        })
                        json.dump(configDataToBeWritten, configSpacyFileHandle, indent = 4, sort_keys=False)

                    return os.path.getsize(ConfigSpacyPath) > 0

            except Exception as ex:
                return str(ex.args)

        def TrainRASA(configFilePath):
            try:  
                trainingString = 'start /wait python -m rasa_nlu.train -c ' + '\"' + os.path.normpath(configFilePath) + '\"'
                returnCode = subprocess.call(trainingString, shell = True)
                return returnCode

            except Exception as ex:
                return str(ex.args)

        def StartRASAServer(configFileDirectory, ModelName):
            #region Server starting logic
            try:
                global hasAPIStarted
                global serverExecutablePID
                #1) for finding which is the most recent config_spacy
                root, dirs, files = next(os.walk(os.path.normpath(configFileDirectory)))

                configFiles = [configFile for configFile in files if ModelName in configFile]
                configFiles.sort(key = str.lower, reverse = True)
                mostRecentConfigSpacy = os.path.join(configFileDirectory, configFiles[0])

                serverStartingString = 'start /wait python -m rasa_nlu.server -c ' + '\"' + os.path.normpath(mostRecentConfigSpacy) + '\"'

                serverProcess = subprocess.Popen(serverStartingString, shell = True)
                serverExecutablePID = serverProcess.pid

                pingReturnCode = 1
                while(pingReturnCode):
                    pingReturnCode = os.system("netstat -na | findstr /i 5000")
                if(pingReturnCode == 0):
                    hasAPIStarted = True

                return pingReturnCode

            except Exception as ex:
                return jsonify({"message": "Failed because: " + str(ex.args) , "success": False})
            #endregion

        def KillProcessWindow(hasAPIStarted, serverExecutablePID):
            if(hasAPIStarted == True and serverExecutablePID != 0):
                me = psutil.Process(serverExecutablePID)
                for child in me.children():
                    child.kill()


        @app.route('/api/TrainRASA', methods = ['POST'])
        def TrainRASAServer():
            try:
                #get request body of POST request
                postedJSONData = json.loads(request.data, strict = False)

                if postedJSONData["data"] is not None:
                    print("Valid data")
                    #region JSON file building logic
                    modelName = postedJSONData["modelName"]
                    modelNameWithExtension = appendTimeStampToModel(modelName)
                    schemaPath = os.path.join(assetsDirectory, "data", modelNameWithExtension)
                    print(createSchema(schemaPath, postedJSONData["data"]))
                    #endregion

                    #region config file creation logic
                    configFilePath = os.path.join(assetsDirectory, "config", appendTimeStampToConfigSpacy(modelName))
                    logsDirectory = os.path.join(assetsDirectory, "logs")
                    trainedModelDirectory = os.path.join(assetsDirectory, "models")
                    configFileCreated = createConfigSpacy(modelName, schemaPath, configFilePath, trainedModelDirectory, logsDirectory)
                    #endregion

                    if(configFileCreated == True):
                        #region Training RASA NLU with schema
                        TrainingReturnCode = TrainRASA(configFilePath)
                        #endregion

                        if(TrainingReturnCode == 0):
                            return jsonify({"message": "Successfully trained RASA NLU with modelname: " + modelName, "success": True})
                            # KillProcessWindow(hasAPIStarted, serverExecutablePID)
                            # serverStartingReturnCode = StartRASAServer(configFileDirectory, modelName)
                            # #endregion

                            # if serverStartingReturnCode == 0:                    
                            #     return jsonify({"message": "Successfully started RASA server on port 5000", "success": True})

                            # elif serverStartingReturnCode is None:
                            #     return jsonify({"message": "Could not start RASA server, request timed out", "success": False})

                        else:
                            return jsonify({"message": "Something wrong happened while training RASA NLU!", "success": False})

                    else:
                        return jsonify({"message": "Could not create config file for RASA NLU", "success": False})

                #throw exception if request body is empty
                return jsonify({"message": "Please enter some JSON, JSON seems to be empty", "success": False})

            except Exception as ex:
                return jsonify({"Reason": "Failed because" + str(ex.args), "success": False})

        @app.route('/api/StopRASAServer', methods = ['GET'])
        def StopRASAServer():
            try:
                global serverExecutablePID

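                #both checks must hold; with "or" the condition was always true
                if(serverExecutablePID != 0 and serverExecutablePID is not None):
                    me = psutil.Process(serverExecutablePID)
                    for child in me.children():
                        child.kill()
                    return jsonify({"message": "Server stopped....", "success": True})
                #no server process has been recorded, so there is nothing to stop
                return jsonify({"message": "No running server to stop", "success": False})
            except Exception as ex:
                return jsonify({"message": "Something went wrong while shutting down the server because: " + str(ex.args), "success": False})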

        if __name__ == "__main__":
            StartRASAServer(configFileDirectory, chitChatModel)
            app.run(debug=False, threaded = True, host='0.0.0.0', port = 5050)
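
The core of the app above is to persist the posted JSON under a timestamped name and then launch training on it. Here is a minimal sketch of that idea without the Windows-only msvcrt locking. The helper names timestamped_name, write_json_atomically and training_command are hypothetical, and swapping the file lock for a write-then-rename is a substitution, not what the answer does. Only the rasa_nlu.train module and its -c flag come from the code above.

```python
import datetime
import json
import os
import sys
import tempfile


def timestamped_name(model_name, now=None):
    """Build '<model>_<YYYYmmdd-HHMMSS>.json', as appendTimeStampToModel does."""
    now = now or datetime.datetime.now()
    return "{}_{:%Y%m%d-%H%M%S}.json".format(model_name, now)


def write_json_atomically(path, payload):
    """Write to a temp file in the target directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=4)
        os.replace(tmp_path, path)  # readers never see a half-written file
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return path


def training_command(config_path):
    """Argument list for subprocess.call; avoids shell quoting issues."""
    return [sys.executable, "-m", "rasa_nlu.train", "-c", os.path.normpath(config_path)]
```

Passing the list from training_command to subprocess.call, without shell=True, sidesteps the manual quoting of the config path.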

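There is a simple way to do this, but it is hard to find because Rasa's code is poorly documented.

You have to create a JSON object in the following format.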

training_data = {'rasa_nlu_data': {"common_examples": training_examples,
                                   "regex_features": [],
                                   "lookup_tables": [],
                                   "entity_synonyms": []
                                   }}

In this JSON, training_examples is a list that should contain data like the following.

training_examples = [
    {
        "intent": "greet",
        "text": "Hello"
    },
    {
        "intent": "greet",
        "text": "Hi, how are you ?"
    },
    {
        "intent": "sad",
        "text": "I am not happy with the service"
    },
    {
        "intent": "praise",
        "text": "You're a genius"
    }
]
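
If the examples come from somewhere else in your program, the structure above can be assembled with a small helper. This is only a sketch: build_training_data is a hypothetical name, and it fills just the four keys shown in this answer.

```python
def build_training_data(pairs):
    """Build the rasa_nlu_data dict from (text, intent) tuples."""
    examples = []
    for text, intent in pairs:
        if not text or not intent:
            raise ValueError("each example needs a non-empty text and intent")
        examples.append({"intent": intent, "text": text})
    return {
        "rasa_nlu_data": {
            "common_examples": examples,
            "regex_features": [],
            "lookup_tables": [],
            "entity_synonyms": [],
        }
    }


training_data = build_training_data([
    ("Hello", "greet"),
    ("You're a genius", "praise"),
])
```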

With this in place, you can train like this :)

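from rasa.nlu import config
from rasa.nlu.model import Trainer
# Assumption (not in the original answer): in Rasa 1.x, RasaReader.read_from_json
# converts the training_data dict above into the TrainingData object that
# Trainer.train expects.
from rasa.nlu.training_data.formats.rasa import RasaReader

# The config can also be loaded from a dict like this
def get_train_config():
    return {'language': 'en',
            'pipeline': [
                {'name': 'WhitespaceTokenizer'},
                {'name': 'ConveRTFeaturizer'},
                {'name': 'EmbeddingIntentClassifier'}
                ],
            'data': None,
            'policies': [
                {'name': 'MemoizationPolicy'},
                {'name': 'KerasPolicy'},
                {'name': 'MappingPolicy'}
                ]
            }

# `data` was left undefined in the original snippet; here it is built from training_data
data = RasaReader().read_from_json(training_data)

trainer = Trainer(config._load_from_dict(get_train_config()))
interpreter = trainer.train(data)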

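Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please cite this site's URL or the original source. For any questions, contact yoyou2525@163.com.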
