How to investigate a Kryo buffer overflow happening in Spark?

Original post 2022-05-18 16:01:57 4 1 apache-spark / optimization / kryo

The bounty expires in 7 days. Answers to this question are eligible for a +50 reputation bounty. Omkar Neogi wants to draw more attention to this question:

Please share, using an example, how one might investigate which object or set of objects required the most memory, and how to make an informed decision about setting the Kryo buffer size. Thank you.

I encountered a Kryo buffer overflow exception, but I really don't understand what data could require more than the current buffer size. I already have spark.kryoserializer.buffer.max set to 256 MB, and even the result of toString applied to the dataset items, which should be much bigger than what Kryo requires, takes less than that (per item).

I know I can increase this parameter, and I will for now, but I don't think it is good practice to simply increase resources on hitting a limit without investigating what is happening (just as I wouldn't respond to an OOM by simply increasing the RAM allocation without checking what is using the RAM).

=> So, is there a way to investigate what is put in the buffer during execution of the Spark DAG?

I couldn't find anything in the Spark UI.

Note that "How Kryo serializer allocates buffer in Spark" is not the same question. It asks how the buffer works (and actually no one answers that), whereas I am asking how to investigate. In that question, all the answers discuss which parameters to use; I know which parameter to use, and I do manage to avoid the exception by increasing it. However, I already consume too much RAM and need to optimize it, Kryo buffer included.

1 answer

All data that is sent over the network, written to disk, or persisted in memory has to be serialized as the Spark DAG executes. Hence, the Kryo serialization buffer (spark.kryoserializer.buffer.max) must be larger than any single object you attempt to serialize, and must be less than 2048m.

https://spark.apache.org/docs/latest/tuning.html#data-serialization
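
To make the sizing rule in the answer above concrete, here is a minimal sketch of one way to look for the largest records in a sample and derive a buffer setting from them. It is not a Spark or Kryo API: the helper names `largest_records` and `suggest_buffer_max_mib` and the 1.5x headroom factor are hypothetical, and `pickle.dumps` is only a stand-in serializer, so the byte counts will differ from what Kryo would produce. The only facts taken from the answer are that the buffer must be larger than the largest serialized object and less than 2048m.

```python
import heapq
import math
import pickle

MIB = 1024 * 1024
KRYO_MAX_LIMIT_MIB = 2048  # from the answer: buffer.max must be less than 2048m


def largest_records(records, serialize=pickle.dumps, top_n=3):
    """Return up to top_n (size_in_bytes, index) pairs, largest first.

    `serialize` is a stand-in (pickle by default, NOT Kryo); pass any
    callable that returns a bytes-like object for one record.
    """
    sized = ((len(serialize(rec)), idx) for idx, rec in enumerate(records))
    return heapq.nlargest(top_n, sized)


def suggest_buffer_max_mib(largest_bytes, headroom=1.5):
    """Whole-MiB value covering largest_bytes * headroom, kept below 2048.

    The headroom factor is an assumption for illustration, not a Spark rule.
    """
    if headroom <= 1:
        raise ValueError("headroom must be > 1 so the buffer exceeds the object")
    needed = max(1, math.ceil(largest_bytes * headroom / MIB))
    if needed >= KRYO_MAX_LIMIT_MIB:
        raise ValueError(
            f"{needed}m is not below the 2048m limit; shrink or split the records"
        )
    return needed


if __name__ == "__main__":
    # Hypothetical sample: one record is far larger than the others.
    sample = [b"x" * 10, b"y" * (3 * MIB), b"z" * 1000]
    top = largest_records(sample, top_n=2)
    for size, idx in top:
        print(f"record #{idx}: {size} bytes serialized")
    print(f"spark.kryoserializer.buffer.max={suggest_buffer_max_mib(top[0][0])}m")
```

Run against a sample of the real dataset, the indices of the largest records point at which items to inspect first. Since the sizes come from a stand-in serializer, treat the suggested value as a starting point to validate against the actual job rather than a guaranteed bound.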