'java.lang.OutOfMemoryError: Java heap space' error in spark application while trying to read the avro file and performing Actions

The avro file size is around 44MB.

Below is the YARN log error:

20/03/30 06:55:04 INFO spark.ExecutorAllocationManager: Existing executor 18 has been removed (new total is 0)
20/03/30 06:55:04 INFO cluster.YarnClusterScheduler: Cancelling stage 5
20/03/30 06:55:04 INFO scheduler.DAGScheduler: ResultStage 5 (head at IrdsFIInstrumentEnricher.scala:15) failed in 213.391 s due to Job aborted due to stage failure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 134, fratlhadooappd30.de.db.com, executor 18): ExecutorLostFailure (executor 18 exited caused by one of the running tasks) Reason: Container marked as failed: container_1585337469684_0037_02_000029 on host: fratlhadooappd30.de.db.com. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

Driver stacktrace:
20/03/30 06:55:04 INFO scheduler.DAGScheduler: Job 3 failed: head at IrdsFIInstrumentEnricher.scala:15, took 213.427308 s
20/03/30 06:55:04 ERROR CCOIrdsEnrichmentService: Unexpected error
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 134, fratlhadooappd30.de.db.com, executor 18): ExecutorLostFailure (executor 18 exited caused by one of the running tasks) Reason: Container marked as failed: container_1585337469684_0037_02_000029 on host: fratlhadooappd30.de.db.com. Exit status: 143. Diagnostics: Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
...
20/03/30 06:48:19 INFO storage.DiskBlockManager: Shutdown hook called
20/03/30 06:48:19 INFO util.ShutdownHookManager: Shutdown hook called

LogType:stdout
Log Upload Time:Mon Mar 30 06:55:10 +0200 2020
LogLength:124
Log Contents:

 java.lang.OutOfMemoryError: Java heap space
 -XX:OnOutOfMemoryError="kill %p"
   Executing /bin/sh -c "kill 62191"...

LogType:container-localizer-syslog
Log Upload Time:Mon Mar 30 06:55:10 +0200 2020
LogLength:0
Log Contents:

Below is the code I am using:

val fiDF = spark.read
  .format("com.databricks.spark.avro")
  .load("C:\\Users\\kativikb\\Downloads\\Temp\\cco-irds\\rds_db_global_rds_fi-instrument_20200328000000_v1_block3_snapshot-inc.avro")
  .limit(1)

val tempDF = fiDF.select("payload.identifier.id")
tempDF.show(10) // ******* Error at this line *******

This was because the avro schema was too large, and I was using Spark version 2.1.0, which perhaps has a bug with larger schemas; this has been fixed in 2.4.0.
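If upgrading is an option, the same read can use the avro source that became an official Spark module in 2.4.0 (the external com.databricks.spark.avro package is no longer needed). A minimal sketch, assuming the spark-avro module is on the classpath (e.g. via --packages org.apache.spark:spark-avro_2.11:2.4.0) and a SparkSession named spark, as in the question:

// Spark 2.4.0+: the avro reader is the official spark-avro module,
// registered under the short format name "avro".
val fiDF = spark.read
  .format("avro")
  .load("C:\\Users\\kativikb\\Downloads\\Temp\\cco-irds\\rds_db_global_rds_fi-instrument_20200328000000_v1_block3_snapshot-inc.avro")
  .limit(1)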

I solved this error by changing the schema: I used a custom schema containing only the required fields instead of the full avro schema.
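A minimal sketch of that workaround, assuming only payload.identifier.id is needed; the StructType below is a hypothetical cut-down version of the real schema, and how aggressively the reader prunes against a user-supplied schema can depend on the spark-avro version:

import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical minimal schema: only the nested field the job actually selects.
// The real avro schema is far larger, which is what triggered the OOM.
val customSchema = StructType(Seq(
  StructField("payload", StructType(Seq(
    StructField("identifier", StructType(Seq(
      StructField("id", StringType, nullable = true)
    )), nullable = true)
  )), nullable = true)
))

val fiDF = spark.read
  .format("com.databricks.spark.avro")
  .schema(customSchema) // supply the trimmed schema instead of reading the full one
  .load("C:\\Users\\kativikb\\Downloads\\Temp\\cco-irds\\rds_db_global_rds_fi-instrument_20200328000000_v1_block3_snapshot-inc.avro")
  .limit(1)

fiDF.select("payload.identifier.id").show(10)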
