
Spark encode in Gzip and send to S3 - java.io.IOException: No space left on device

I'm trying to GZIP and send an RDD over to S3 like so:

dwPartitioned.saveAsTextFile(s"s3n://$accessKey:$secretKey@bucket", classOf[GzipCodec])

The job starts running and shortly afterwards fails with:

org.apache.spark.SparkException: Job aborted due to stage failure:  ... : java.io.IOException: No space left on device

I read that because of the encoding some shuffling is done, which requires temporary files to be generated. Is that true? Am I misusing the functionality? Is there something I can optimize here?

More importantly - how can I achieve this in memory?

If you need more info I'll gladly append it.

By default, Spark uses "/tmp" to save intermediate files. While the job is running, you can run "df -h" and watch the used space of the filesystem mounted at "/" grow. When that device runs out of space, this exception is thrown. To solve the problem, set spark.local.dir in SPARK_HOME/conf/spark-defaults.conf (or the SPARK_LOCAL_DIRS environment variable in SPARK_HOME/conf/spark-env.sh) to a path on a filesystem with enough free space.
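
For example, a minimal entry in SPARK_HOME/conf/spark-defaults.conf might look like this, where /mnt/spark-tmp is a hypothetical path standing in for any mount with sufficient free space (multiple directories can be given as a comma-separated list):

spark.local.dir /mnt/spark-tmp

Note that in cluster deployments this property can be overridden by the SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set by the cluster manager, so check how your cluster is configured as well.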

