
How to reduce the time taken by a Glue ETL job (Spark) to actually start executing?

Originally posted 2019-04-08 13:09:42 · tags: amazon-web-services, apache-spark, aws-glue

I want to start a Glue ETL job. The execution itself is reasonably fast, but the time Glue takes to actually start executing the job is far too long.

I looked through various documentation and answers, but none of them gave me a solution. Some of them explain this behavior as a cold start, but offer no way around it.

I need the job to be up as soon as possible. Sometimes it takes around 10 minutes to start a job that then executes in 2 minutes.

1 answer

Unfortunately, it's not possible at the moment. Glue uses EMR under the hood, and it takes some time to spin up a new cluster with the desired number of executors. As far as I know, they keep a pool of spare EMR clusters with some of the most common DPU configurations, so if you are lucky your job can grab one and start immediately; otherwise it has to wait.
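Since the answer says the cold start cannot be skipped, the most you can do is measure how much of each run's wall-clock time is spent waiting. The following is a minimal Python sketch, assuming boto3 and the `JobRun` fields `StartedOn`, `CompletedOn` and `ExecutionTime` returned by `glue.get_job_run()`. It also assumes that `ExecutionTime` counts only the seconds the run actually consumed resources, so subtracting it from the wall-clock duration gives only a rough estimate of the startup delay. `startup_overhead_seconds` and `fetch_overhead` are hypothetical helper names, and the sample numbers are illustrative, mirroring the question (about 12 minutes end to end, 2 minutes of execution).

```python
from datetime import datetime, timedelta, timezone


def startup_overhead_seconds(job_run: dict) -> float:
    """Rough estimate of how long a Glue job run waited before doing work.

    Assumption: `job_run` has the shape of the "JobRun" object returned by
    boto3's glue.get_job_run(), with "StartedOn"/"CompletedOn" as datetimes
    and "ExecutionTime" as seconds the run consumed resources.
    """
    wall_clock = (job_run["CompletedOn"] - job_run["StartedOn"]).total_seconds()
    # Clamp at zero in case ExecutionTime is rounded up past the wall clock.
    return max(0.0, wall_clock - job_run["ExecutionTime"])


def fetch_overhead(job_name: str, run_id: str) -> float:
    """Look up a finished run and estimate its startup overhead (hypothetical helper)."""
    # Imported lazily so startup_overhead_seconds works without boto3 installed.
    import boto3

    glue = boto3.client("glue")
    run = glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
    return startup_overhead_seconds(run)


if __name__ == "__main__":
    # Illustrative run: 12 minutes wall clock, 2 minutes of actual execution.
    started = datetime(2019, 4, 8, 13, 0, tzinfo=timezone.utc)
    sample_run = {
        "StartedOn": started,
        "CompletedOn": started + timedelta(minutes=12),
        "ExecutionTime": 120,
    }
    print(startup_overhead_seconds(sample_run))  # 600.0
```

Collecting this estimate over several runs could show how often a job starts right away and how often it waits for new capacity, which is the behavior the answer describes.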

Related questions

How to rewind Job Bookmarks on Glue Spark ETL job?

AWS Glue: How to reduce the number of DPUs for an ETL job

How to get Start and End time in a Job AWS Glue?

How does AWS Glue ETL job retrieve data?

How to Connect Relational Database in Glue ETL / Spark using ODBC connector

Is it required to run AWS Glue crawler to detect new data before executing an ETL job?

Not able to populate AWS Glue ETL Job metrics

Parameterize AWS Glue Job for ETL with Date as variables

ETL : Flatten a nested array in an AWS glue job

How to set Spark Config in an AWS Glue job, using Scala Spark?


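Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site or the original source. For any questions, contact yoyou2525@163.com.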


Guangdong ICP registration No. 18138465 © 2020-2024 STACKOOM.COM