
When do I use a Glue job or a SageMaker Processing job for an ETL?

I am currently struggling to decide in which situations a Glue job is preferable to a SageMaker Processing job, and vice versa. Some advice on this topic would be greatly appreciated.

I can do the same thing on both, so why should I care about the difference?

  • Instance choice: if you want to use a specific EC2 instance type, use SageMaker.
  • Pricing: SageMaker is pro-rated per second, while Glue has a minimum charge (1 minute or 10 minutes, depending on the Glue version). You should measure how much a workload would cost you on each platform.
  • Customization: in SageMaker Processing you can customize the execution environment, because you provide a Docker image (so you can run more than Spark/Python, for example C++ or R).
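
To make the pricing point concrete, here is a minimal sketch of how the minimum charge changes the comparison for short jobs. It assumes both services bill per second once any minimum is exceeded. The function names are hypothetical, and the hourly rates are placeholders rather than real prices, so substitute the current rates for your region before drawing conclusions.

```python
def glue_cost(runtime_s, dpus, rate_per_dpu_hour, min_billed_s):
    """Estimated Glue cost: billed duration is never below the minimum."""
    billed_s = max(runtime_s, min_billed_s)
    return dpus * rate_per_dpu_hour * billed_s / 3600


def sagemaker_cost(runtime_s, instances, rate_per_instance_hour):
    """Estimated SageMaker Processing cost: per second, no minimum modeled."""
    return instances * rate_per_instance_hour * runtime_s / 3600


if __name__ == "__main__":
    # Placeholder rates (assumptions, not quoted prices).
    GLUE_RATE = 0.44       # per DPU-hour
    SAGEMAKER_RATE = 0.23  # per instance-hour

    for runtime_s in (45, 600, 7200):
        glue_1min = glue_cost(runtime_s, 2, GLUE_RATE, min_billed_s=60)
        glue_10min = glue_cost(runtime_s, 2, GLUE_RATE, min_billed_s=600)
        sm = sagemaker_cost(runtime_s, 1, SAGEMAKER_RATE)
        print(f"{runtime_s:>5}s  glue(1min)={glue_1min:.4f}  "
              f"glue(10min)={glue_10min:.4f}  sagemaker={sm:.4f}")
```

In this model the minimum charge only matters for jobs shorter than the minimum: a 45-second job is billed as 60 or 600 seconds on Glue, while a two-hour job costs the same under either minimum. The model ignores startup time, data transfer, and storage, so treat it as a first estimate and still measure the real workload on each platform.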

