
Not able to populate AWS Glue ETL Job metrics

I am trying to populate as many Glue job metrics as possible for some testing. Below is the setup I have created:

  • A crawler reads data (dummy customer data of 500 rows) from a CSV file placed in an S3 bucket.
  • Another crawler crawls the tables created in a Redshift cluster.
  • An ETL job finally reads the data from the CSV file in S3 and dumps it into a Redshift table.

The job runs without any issue and I can see the final data being dumped into the Redshift table. However, in the end only the following five CloudWatch metrics are populated:

  • glue.jvm.heap.usage
  • glue.jvm.heap.used
  • glue.s3.filesystem.read_bytes
  • glue.s3.filesystem.write_bytes
  • glue.system.cpuSystemLoad

There are approximately 20 more metrics that are not being populated.

Any suggestions on how to populate the remaining metrics as well?

I ran into the same issue. Do your glue.s3.filesystem.read_bytes and glue.s3.filesystem.write_bytes metrics have any data?
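One way to check which metrics actually exist for a job is to list them from CloudWatch and compare the names against what you expect. The following is only a sketch: the namespace "Glue" and the JobName dimension are assumptions about how the metrics are published in your account, and observed_metric_names and fetch_metric_pages are hypothetical helper names.

```python
def observed_metric_names(pages):
    """Collect the distinct MetricName values from list_metrics response pages."""
    names = set()
    for page in pages:
        for metric in page.get("Metrics", []):
            names.add(metric["MetricName"])
    return names


def fetch_metric_pages(job_name, namespace="Glue"):
    """Page through CloudWatch list_metrics for one job.

    Assumption: the job's metrics live in the given namespace and carry
    a JobName dimension. Adjust both if your setup differs.
    """
    import boto3  # imported lazily so the helper above works without AWS access

    paginator = boto3.client("cloudwatch").get_paginator("list_metrics")
    return paginator.paginate(
        Namespace=namespace,
        Dimensions=[{"Name": "JobName", "Value": job_name}],
    )


if __name__ == "__main__":
    # Names taken from the question; extend this set with the metrics you expect.
    expected = {
        "glue.jvm.heap.usage",
        "glue.jvm.heap.used",
        "glue.s3.filesystem.read_bytes",
        "glue.s3.filesystem.write_bytes",
        "glue.system.cpuSystemLoad",
    }
    seen = observed_metric_names(fetch_metric_pages("my-etl-job"))  # placeholder job name
    print("missing:", sorted(expected - seen))
```

Note that a metric appearing in this listing only means CloudWatch has received it at some point recently, not that every run emitted it.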

One possible reason is that AWS Glue job metrics are not emitted if the job completes in less than 30 seconds.
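To see whether a run falls under that limit, you could look at the run's execution time. This is a rough sketch: the 30-second threshold is simply the figure quoted in this answer, and ran_too_briefly and fetch_job_run are hypothetical helper names. The ExecutionTime field (in seconds) comes from the Glue get_job_run response.

```python
METRICS_MIN_SECONDS = 30  # figure quoted in the answer above; treat it as an assumption


def ran_too_briefly(job_run, min_seconds=METRICS_MIN_SECONDS):
    """Return True if the run's ExecutionTime (seconds) is below the threshold."""
    return job_run.get("ExecutionTime", 0) < min_seconds


def fetch_job_run(job_name, run_id):
    """Fetch the JobRun structure for one run of a Glue job."""
    import boto3  # imported lazily so the check above can be used offline

    response = boto3.client("glue").get_job_run(JobName=job_name, RunId=run_id)
    return response["JobRun"]
```

With only 500 rows of input, a short run is plausible, so this is worth ruling out before looking at other causes.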

When running the job, enable the metrics option under the Monitoring tab.
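If you start the job from code rather than the console, the same setting can be passed as the --enable-metrics job argument. A minimal sketch, assuming boto3 is installed and credentials are configured; with_metrics_enabled and start_run_with_metrics are hypothetical helper names.

```python
def with_metrics_enabled(arguments=None):
    """Return a copy of the job arguments with metrics collection switched on."""
    merged = dict(arguments or {})
    # Glue only checks that this key is present; "true" is used for readability.
    merged["--enable-metrics"] = "true"
    return merged


def start_run_with_metrics(job_name, arguments=None):
    """Start a Glue job run with --enable-metrics set and return its run ID."""
    import boto3  # imported lazily so the helper above works without AWS access

    glue = boto3.client("glue")
    response = glue.start_job_run(
        JobName=job_name,
        Arguments=with_metrics_enabled(arguments),
    )
    return response["JobRunId"]
```

The helper copies the argument dictionary instead of modifying it, so any default arguments you keep elsewhere stay untouched.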

Assuming that you are using Glue version 2.0 for the above job, note that AWS Glue version 2.0 does not use dynamic allocation, so the ExecutorAllocationManager metrics are not available. Switch back to Glue 1.0 and you should be able to confirm that all the documented metrics become available.


https://docs.aws.amazon.com/glue/latest/dg/reduced-start-times-spark-etl-jobs.html#reduced-start-times-limitations
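To apply this to your own list of missing metrics, you could read the job's Glue version and set aside the ExecutorAllocationManager entries before investigating the rest. This sketch only encodes the Glue 2.0 statement from this answer and makes no claim about other versions; metrics_not_expected and fetch_job are hypothetical helper names, and the metric name in the usage example is just an illustration.

```python
def metrics_not_expected(job, metric_names):
    """Return the metric names that a Glue 2.0 job would not emit.

    Based only on the note above: Glue 2.0 does not use dynamic allocation,
    so ExecutorAllocationManager metrics are unavailable. For any other
    version this returns an empty list rather than guessing.
    """
    if job.get("GlueVersion") != "2.0":
        return []
    return [name for name in metric_names if "ExecutorAllocationManager" in name]


def fetch_job(job_name):
    """Fetch the Job structure, which includes the GlueVersion string."""
    import boto3  # imported lazily so the check above can be used offline

    return boto3.client("glue").get_job(JobName=job_name)["Job"]
```

For example, calling metrics_not_expected({"GlueVersion": "2.0"}, names) with a list containing glue.driver.ExecutorAllocationManager.executors.numberAllExecutors returns just that entry, while the same call for a job reporting "1.0" returns an empty list.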

