
Spark job performance issue

I have the following DSE cluster configuration:

6 nodes, each with 6 cores and 16 GB of RAM.

My app is built with PySpark and reads data from a Cassandra DB.
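A minimal sketch of that kind of read, assuming the DataStax spark-cassandra-connector is on the classpath (the keyspace, table, and contact-point values below are placeholders, not the real ones):

# Hypothetical PySpark read from Cassandra via the spark-cassandra-connector.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cassandra-read-example")
    # one of the DSE contact points; placeholder value
    .config("spark.cassandra.connection.host", "11.218.78.15")
    .getOrCreate()
)

df = (
    spark.read
    .format("org.apache.spark.sql.cassandra")
    .options(keyspace="my_keyspace", table="my_table")  # placeholder names
    .load()
)

df.printSchema()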

We load 320,000,000 rows into the Cassandra DB and run my Python Spark application with full memory and cores, and get this error:

Lost task 97.0 in stage 299.0 (TID 14680, 11.218.78.15): java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:58)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at net.jpountz.lz4.LZ4BlockOutputStream.flushBufferedData(LZ4BlockOutputStream.java:205)
at net.jpountz.lz4.LZ4BlockOutputStream.write(LZ4BlockOutputStream.java:158)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.spark.sql.catalyst.expressions.UnsafeRow.writeToStream(UnsafeRow.java:562)
at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2.writeValue(UnsafeRowSerializer.scala:69)
at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:185)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:150)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Could you help me? I have about 20 GB free on every node.

This exception means your nodes are running out of disk space. Check how much space is actually left, then review your code for how much it logs and how much disk that logging consumes. The first fix is simply to free up some disk space. If you check and there is enough space left, look at the directory the Spark master uploads the executable job file to: it is quite likely that previously submitted jobs did not finish gracefully and left temp files beside your job file in the temporary directory used for each submission. In that case you have two options (a configuration sketch follows the list):

  • restart your machine/VM, which clears the temp files, or
  • find those temp files yourself and delete the unnecessary ones.
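If the real culprit is shuffle spill filling a small default /tmp (the stack trace shows the failure inside a shuffle write), another option is to point Spark's scratch directory at a larger volume. A minimal sketch, assuming a hypothetical /mnt/bigdisk/spark-tmp path with more space; note that in YARN or standalone deployments the worker-side SPARK_LOCAL_DIRS / LOCAL_DIRS environment variable takes precedence over this property:

# Point Spark's temp/shuffle scratch space at a larger disk (path is hypothetical).
from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = (
    SparkConf()
    # Shuffle spill files and other temporary data are written here
    # instead of the (often small) default /tmp.
    .set("spark.local.dir", "/mnt/bigdisk/spark-tmp")
)

spark = SparkSession.builder.config(conf=conf).getOrCreate()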

This error also shows up when you run Spark in local mode (I hit the same problem while running my Spark query in local mode); it may go away if you run Spark in YARN mode.
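A minimal sketch of switching the same session to YARN instead of local mode, as suggested above; this assumes HADOOP_CONF_DIR / YARN_CONF_DIR is set so Spark can locate the cluster:

# Run against YARN instead of local mode.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("yarn")          # instead of .master("local[*]")
    .appName("yarn-mode-example")
    .getOrCreate()
)

The same switch can be made at submit time with spark-submit --master yarn my_app.py.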
