
Structured Streaming OOM

I deployed a Structured Streaming job with the Spark-on-k8s operator. It simply reads from Kafka, deserializes, adds 2 columns, and stores the results in the data lake (I tried both Delta and Parquet). After a few days the executor's memory usage grows and eventually I get an OOM. The input records are really small, on the order of KBs. P.S. I use exactly the same code, but with Cassandra as the sink, and that job has been running for almost a month now without any issues. Any ideas?


My code:

import java.util.Calendar
import org.apache.spark.sql.functions._
import spark.implicits._

spark
    .readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", MetisStreamsConfig.bootstrapServers)
    .option("subscribe", MetisStreamsConfig.topics.head)
    .option("startingOffsets", startingOffsets)
    .option("maxOffsetsPerTrigger", MetisStreamsConfig.maxOffsetsPerTrigger)
    .load()
    .selectExpr("CAST(value AS STRING)")
    .as[String]
    .withColumn("payload", from_json($"value", schema))

    // selection + filtering
    .select("payload.*")
    .select($"vesselQuantity.qid" as "qid", $"vesselQuantity.vesselId" as "vessel_id", explode($"measurements"))
    .select($"qid", $"vessel_id", $"col.*")
    .filter($"timestamp".isNotNull)
    .filter($"qid".isNotNull and !($"qid"===""))
    .withColumn("ingestion_time", current_timestamp())
    .withColumn("mapping", MappingUDF($"qid"))
    .writeStream
    .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
      log.info(s"Storing batch with id: `$batchId`")
      val calendarInstance = Calendar.getInstance()

      val year = calendarInstance.get(Calendar.YEAR)
      val month = calendarInstance.get(Calendar.MONTH) + 1
      val day = calendarInstance.get(Calendar.DAY_OF_MONTH)
      batchDF.write
        .mode("append")
        .parquet(streamOutputDir + s"/$year/$month/$day")
    }
    .option("checkpointLocation", checkpointDir)
    .start()

I changed to foreachBatch because using Delta or Parquet with partitionBy caused the issue to appear even faster.
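As a side note, the foreachBatch above derives the output path from the wall clock at write time, which can mis-file batches that run around midnight. A small helper (a sketch of my own; `partitionPath` is a hypothetical name, and swapping `Calendar` for `java.time` is my suggestion, not part of the original code) makes the path construction explicit and testable:

```scala
import java.time.LocalDate

// Hypothetical helper: build the year/month/day output path used in foreachBatch.
// Taking the date as a parameter (instead of reading the clock inside) keeps it testable.
def partitionPath(base: String, date: LocalDate): String =
  s"$base/${date.getYear}/${date.getMonthValue}/${date.getDayOfMonth}"

// Inside foreachBatch:
//   batchDF.write.mode("append").parquet(partitionPath(streamOutputDir, LocalDate.now()))
println(partitionPath("/lake/out", LocalDate.of(2020, 3, 7)))
```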

There is a bug that was resolved in Spark 3.1.0.

See https://github.com/apache/spark/pull/28904

Other ways of overcoming the issue, and credit for the debugging:

https://www.waitingforcode.com/apache-spark-structured-streaming/file-sink-out-of-memory-risk/read

You may find this helpful even though you are using foreachBatch...
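For context, the linked bug concerns the `_spark_metadata` compaction log that the file sink keeps growing in driver memory. If upgrading to 3.1.0 is not an option and you use the file sink directly (foreachBatch with a plain `DataFrame.write` bypasses it), the metadata retention can be tightened via Spark SQL options. This fragment is my suggestion, not part of the original answers, and the values are illustrative; verify the option names and defaults against your Spark version's configuration docs:

```
spark.sql.streaming.fileSink.log.compactInterval   10
spark.sql.streaming.fileSink.log.cleanupDelay      10m
spark.sql.streaming.minBatchesToRetain             20
```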

I had the same issue with some Structured Streaming Spark 2.4.4 applications writing Delta Lake (or Parquet) output with partitionBy.

It seems to be related to JVM memory allocation within a container, as thoroughly explained here: https://merikan.com/2019/04/jvm-in-a-container/
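A quick way to check what the JVM inside the pod actually believes its limit is (a minimal sketch of my own, not from the original answer) is to print the runtime-reported heap ceiling:

```scala
// Sketch: print the heap limit the JVM sees. Inside a container, a JVM that is
// not cgroup-aware sizes its heap from the host's RAM rather than the pod's
// memory limit, so the pod can be OOM-killed while the heap itself looks fine.
val maxHeapMiB = Runtime.getRuntime.maxMemory / (1024L * 1024L)
println(s"JVM max heap: $maxHeapMiB MiB")
```

If this number is far larger than the container's memory limit, the JVM is not respecting the cgroup limit and flags like the ones below are needed.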

My solution (but it depends on your JVM version) was to add some options to the yaml definition of my Spark application:

spec:
    driver:
        javaOptions: >-
            -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap
    executor:
        javaOptions: >-
            -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap
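For what it's worth, `-XX:+UseCGroupMemoryLimitForHeap` is a Java 8 experimental flag that was removed in later JDKs: from Java 10 onward, container awareness is on by default (`-XX:+UseContainerSupport`) and the heap fraction is tuned with `-XX:MaxRAMPercentage`. On Java 10+, the rough equivalent of the options above would be (the 75% value is illustrative, not from the original answer):

```yaml
javaOptions: >-
    -XX:MaxRAMPercentage=75.0
```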

This way my streaming app is functioning properly, with a normal amount of memory (1GB for the driver, 2GB for the executors).

EDIT: while it seems the first issue is solved (the controller killing pods for memory consumption), there is still an issue with slowly growing non-heap memory size; after a few hours, the driver/executors are killed...
