Spark job performance issue

I have the following DSE cluster configuration:

6 nodes, each with 6 cores and 16 GB of RAM.

My app is built with PySpark and reads data from a Cassandra DB.

We loaded 320,000,000 rows into the Cassandra DB and ran my Python Spark application with all available memory and cores, and got this error:

Lost task 97.0 in stage 299.0 (TID 14680, 11.218.78.15): java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:58)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at net.jpountz.lz4.LZ4BlockOutputStream.flushBufferedData(LZ4BlockOutputStream.java:205)
at net.jpountz.lz4.LZ4BlockOutputStream.write(LZ4BlockOutputStream.java:158)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.spark.sql.catalyst.expressions.UnsafeRow.writeToStream(UnsafeRow.java:562)
at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2.writeValue(UnsafeRowSerializer.scala:69)
at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:185)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:150)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Could you help me? I have about 20GB on every node.
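For context, a minimal sketch of the kind of read described above, assuming the DataStax Spark Cassandra connector is available (with DSE it normally is); the keyspace and table names are placeholders, and the host is taken from the log above only for illustration:

# Minimal sketch of a PySpark read from Cassandra via the DataStax connector.
# Keyspace/table names are placeholders; adjust the connection host to your cluster.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cassandra-read-sketch")
    .config("spark.cassandra.connection.host", "11.218.78.15")
    .getOrCreate()
)

df = (
    spark.read
    .format("org.apache.spark.sql.cassandra")
    .options(keyspace="my_keyspace", table="my_table")
    .load()
)

print(df.count())  # a full count forces a scan over all 320,000,000 rows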

This exception is about the disk space on your nodes. Check how much space is left, and then review your code to see how much it logs and how much disk that uses. The first fix is simply to free up some disk space. If you check and there is enough space left, then look at the location where the executable Spark job file is uploaded by the Spark master. This is especially likely if previously submitted jobs did not finish gracefully and temp files were left next to your job file in the temporary directory used for each job submission. Then you have two options:

  • Restart your machine/VM, which causes the temp files to be deleted.
  • Find those temp files yourself and delete the unnecessary ones (see the sketch after this list).
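
If you go the manual route, a small sketch like the one below can help find the leftover scratch directories. It assumes the default spark.local.dir of /tmp; the spark-* and blockmgr-* directory patterns are typical but should be verified against your own configuration:

# Sketch: report the disk usage of leftover Spark scratch directories on a node.
# Assumes the default spark.local.dir of /tmp; adjust the patterns if you have
# changed spark.local.dir or SPARK_WORKER_DIR.
import glob
import os

def dir_size_bytes(path):
    # Sum the sizes of all files under path.
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # a file may disappear while a job is still running
    return total

for pattern in ("/tmp/spark-*", "/tmp/blockmgr-*"):
    for d in sorted(glob.glob(pattern)):
        print(f"{d}: {dir_size_bytes(d) / 1024**3:.2f} GiB")

Only delete directories that belong to jobs that are no longer running.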

This error also comes up when running Spark in local mode (I faced the same problem while running my Spark query in local mode); it may be resolved if you run Spark in YARN mode.
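A minimal sketch of pointing the same session at YARN instead of local mode; in practice this is usually done at launch time with spark-submit --master yarn, and it assumes the cluster's Hadoop/YARN configuration is visible to the driver:

# Sketch: building the session against YARN instead of local mode.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cassandra-read-on-yarn")
    .master("yarn")
    .getOrCreate()
)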
