
How to train a spaCy model by using streaming data?

I have created a spaCy model, but I need to keep retraining it until it reaches its maximum performance. I need to train and retrain this model using streaming data. I have seen that some machine learning models can be trained on streaming data. Is it possible to do the same with NLP models?

You can write a custom corpus reader for your data (https://spacy.io/api/top-level#corpus-readers) and use the setting max_epochs = -1 to indicate that the data should be streamed:

[training]
max_epochs = -1
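
For illustration, a custom corpus reader of the kind mentioned above might look roughly like the sketch below. The reader name stream_jsonl.v1, the function name create_jsonl_stream and the JSONL record layout (a "text" field plus a "cats" dictionary for a text classifier) are assumptions made for this example and do not come from the spaCy docs; adapt the annotations to whichever component you are training.

```python
import json
from pathlib import Path
from typing import Callable, Iterator

import spacy
from spacy.language import Language
from spacy.training import Example


# "stream_jsonl.v1" is a hypothetical name chosen for this sketch.
@spacy.registry.readers("stream_jsonl.v1")
def create_jsonl_stream(path: str) -> Callable[[Language], Iterator[Example]]:
    """Return a corpus reader that lazily yields one Example per JSONL line."""

    def read_stream(nlp: Language) -> Iterator[Example]:
        # Assumed record layout: {"text": "...", "cats": {"LABEL": 1.0, ...}}
        with Path(path).open(encoding="utf8") as lines:
            for line in lines:
                if not line.strip():
                    continue  # skip blank lines
                record = json.loads(line)
                doc = nlp.make_doc(record["text"])
                # Examples are built one at a time, so the whole data set
                # never has to be held in memory.
                yield Example.from_dict(doc, {"cats": record["cats"]})

    return read_stream
```

To use a reader like this, you would reference it under [corpora.train] with @readers = "stream_jsonl.v1" and a path argument, and pass the Python file to spacy train with --code. Because a streamed corpus has no fixed number of epochs, training would normally be ended by other [training] settings such as max_steps or patience.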

Depending on the type of component and the data, you may also need to initialize the labels for the component in the [initialize] block. If you're not streaming, the labels are automatically initialized from the full training corpus. When streaming, you can instead use spacy init labels to generate the labels from a subset of the data and initialize them separately.

More details: https://spacy.io/usage/v3-1#streaming-corpora

