How can I avoid OOM errors in an AWS Glue job in PySpark?
I am getting this error while running an AWS Glue job with 40 workers, processing 40 GB of data:
Caused by: org.apache.spark.memory.SparkOutOfMemoryError: error while calling spill() on org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@5fa14240 : No space left on device
How can I optimize my job to avoid this kind of error in PySpark?
Here is a screenshot of the job metrics: glue_metrics
Use the AWS Glue Spark shuffle manager with Amazon S3, which writes shuffle and spill data to S3 instead of local disk.
It requires Glue 2.0.
See the following links:
https://awscloudfeed.com/whats-new/big-data/introducing-amazon-s3-shuffle-in-aws-glue
https://docs.aws.amazon.com/glue/latest/dg/monitor-spark-shuffle-manager.html
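The shuffle manager is enabled through Glue job parameters. A minimal sketch of the relevant arguments, as described in the AWS documentation linked above (the bucket and prefix below are placeholders you must replace with your own):

```shell
# Glue 2.0 job parameters: write shuffle data to Amazon S3 instead of the
# workers' local disks, avoiding "No space left on device" during spills.
# s3://your-bucket/shuffle-prefix/ is a placeholder, not a real bucket.
--write-shuffle-files-to-s3 true
--write-shuffle-spills-to-s3 true
--conf spark.shuffle.glue.s3ShuffleBucket=s3://your-bucket/shuffle-prefix/
```

You can set these under "Job parameters" in the Glue console or in the `DefaultArguments` of the job definition; the job role also needs read/write access to the chosen S3 location.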