[英]'java.lang.OutOfMemoryError: Java heap space' error in spark application while trying to read the avro file and performing Actions
The avro size is around 44MB. avro 大小约为 44MB。
Below is the yarn logs error :以下是纱线日志错误:
20/03/30 06:55:04 INFO spark.ExecutorAllocationManager: Existing executor 18 has been removed (new total is 0)
20/03/30 06:55:04 INFO cluster.YarnClusterScheduler: Cancelling stage 5
20/03/30 06:55:04 INFO scheduler.DAGScheduler: ResultStage 5 (head at IrdsFIInstrumentEnricher.scala:15) failed in 213.391 s due to Job aborted due to stage f ailure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 134, fratlhadooappd30.de.db.com, executor 18): ExecutorLostFa ilure (executor 18 exited caused by one of the running tasks) Reason: Container marked as failed: container_1585337469684_0037_02_000029 on host: fratlhadooap pd30.de.db.com. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
Driver stacktrace:
20/03/30 06:55:04 INFO scheduler.DAGScheduler: Job 3 failed: head at IrdsFIInstrumentEnricher.scala:15, took 213.427308 s
20/03/30 06:55:04 ERROR CCOIrdsEnrichmentService: Unexpected error
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 13 4, fratlhadooappd30.de.db.com, executor 18): ExecutorLostFailure (executor 18 exited caused by one of the running tasks) Reason: Container marked as failed: c ontainer_1585337469684_0037_02_000029 on host: fratlhadooappd30.de.db.com. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
Driver stacktrace:
→ at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
.
.
.
.
.
.
.
.
20/03/30 06:48:19 INFO storage.DiskBlockManager: Shutdown hook called
20/03/30 06:48:19 INFO util.ShutdownHookManager: Shutdown hook called
LogType:stdout
Log Upload Time:Mon Mar 30 06:55:10 +0200 2020
LogLength:124
Log Contents:
java.lang.OutOfMemoryError: Java heap space
-XX:OnOutOfMemoryError="kill %p"
Executing /bin/sh -c "kill 62191"...
LogType:container-localizer-syslog
Log Upload Time:Mon Mar 30 06:55:10 +0200 2020
LogLength:0
Log Contents:
Below is the code I am using :以下是我正在使用的代码:
fiDF = spark.read
.format("com.databricks.spark.avro")
.load("C:\\Users\\kativikb\\Downloads\\Temp\\cco-irds\\rds_db_global_rds_fi-instrument_20200328000000_v1_block3_snapshot-inc.avro").limit(1)
val tempDF = fiDF.select("payload.identifier.id")
tempDF.show(10) // ******* Error at t his line ******
This was because the avro schema was too large, and I was using the spark version 2.1.0, which perhaps has bug for larger schemas.这是因为 avro 模式太大,而我使用的是 spark 版本 2.1.0,对于较大的模式可能存在错误。 this has been fixed in 2.4.0.
这已在 2.4.0 中修复。
I solved this error by changing the schema and using my custom schema, taking only the required fields in the schema.我通过更改架构并使用我的自定义架构解决了这个错误,只采用架构中的必填字段。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.