
Should punctuation be removed from Rasa NLU training data?

Original post: 2020-01-16 12:59:46. Tags: machine-learning, nlp, rasa-nlu, punctuation, rasa

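In NLU training data, should the punctuation in an intent's utterances (commas, apostrophes, question marks, capital letters, etc.) be left as is, removed, or does it not matter at all?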

1 Answer

The training data can keep its punctuation; the WhitespaceTokenizer (documentation link) will clean it up. Not all punctuation is removed, though! You can see the regex used in the tokenizer on GitHub.

So for the punctuation you mentioned (commas, apostrophes, question marks, etc.), you can leave it in and the tokenizer will handle it.
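To make the idea concrete, here is a minimal sketch of a whitespace tokenizer that drops some punctuation and keeps the rest. It is only an illustration: the function name `simple_whitespace_tokenize` and its regex are assumptions made up for this example, and they are not the regex Rasa actually ships, so check the tokenizer source on GitHub for the real rules.

```python
import re
from typing import List

# Hypothetical pattern (NOT Rasa's actual regex): punctuation marks that are
# followed by whitespace or by the end of the string.
_TRAILING_PUNCT = re.compile(r"[.,!?;:]+(?=\s|$)")


def simple_whitespace_tokenize(text: str) -> List[str]:
    """Split on whitespace after blanking out trailing punctuation.

    Apostrophes and punctuation inside a token (e.g. the dot in "3.50")
    are left alone, which mirrors the point that not all punctuation
    gets cleaned up. Casing is not changed.
    """
    cleaned = _TRAILING_PUNCT.sub(" ", text)
    return cleaned.split()


if __name__ == "__main__":
    for utterance in ["Hello, how are you?", "What's the price?", "It costs 3.50, right?"]:
        print(utterance, "->", simple_whitespace_tokenize(utterance))
```

In this sketch, "Hello, how are you?" becomes `['Hello', 'how', 'are', 'you']`, while the apostrophe in "What's" and the decimal point in "3.50" survive. If an intent depends on such characters, it is worth checking how the tokenizer in your Rasa version treats them before stripping anything by hand.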


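Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.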
