AWS SageMaker T5 or Hugging Face model training issue

I am trying to train a T5 conditional generation model in SageMaker. It runs fine when I pass the arguments directly in a notebook, but it does not learn anything when I use an estimator with a train.py script. I followed the documentation provided by Hugging Face as well as AWS, but we are still facing the issue: the job reports that training is completed and saves the model within 663 seconds, whatever the size of the dataset. Kindly give suggestions for this.

Check the Amazon CloudWatch logs to see what took place during training; they contain the stdout/stderr of train.py. This utility can help with downloading the logs to your local machine or notebook.
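As one possible way to pull those logs without a dedicated tool, here is a minimal sketch using the boto3 CloudWatch Logs client. It assumes the job writes to the default `/aws/sagemaker/TrainingJobs` log group and that its log streams start with the training job name. The helper names `iter_training_log_lines` and `download_logs`, and the output path, are hypothetical and not part of any library.

```python
from typing import Any, Iterator

# Assumption: default log group used by SageMaker training jobs.
LOG_GROUP = "/aws/sagemaker/TrainingJobs"


def iter_training_log_lines(logs_client: Any, job_name: str) -> Iterator[str]:
    """Yield each log message from the streams whose name starts with job_name.

    logs_client is expected to behave like boto3.client("logs").
    """
    paginator = logs_client.get_paginator("filter_log_events")
    pages = paginator.paginate(
        logGroupName=LOG_GROUP,
        logStreamNamePrefix=job_name,
    )
    for page in pages:
        for event in page.get("events", []):
            yield event["message"].rstrip("\n")


def download_logs(job_name: str, path: str) -> int:
    """Write the job's log lines to a local file and return the line count.

    Hypothetical helper: needs boto3 installed and AWS credentials configured.
    """
    import boto3  # imported lazily so the parsing helper has no hard dependency

    client = boto3.client("logs")
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for line in iter_training_log_lines(client, job_name):
            out.write(line + "\n")
            count += 1
    return count
```

Calling `download_logs("<your-training-job-name>", "train.log")` and then searching the file for the number of training examples, the epoch and step counts, and the hyperparameters printed by train.py should show whether the script received the dataset and arguments you intended.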

