
Spark: How to overwrite a file on S3 folder and not complete folder

Original post: 2019-07-09 06:38:31 9 2 apache-spark/ amazon-s3/ apache-spark-2.0

Using Spark, I am trying to push some data (in CSV and Parquet format) to an S3 bucket.

df.write.mode("OVERWRITE").format("com.databricks.spark.csv").options(nullValue=options['nullValue'], header=options['header'], delimiter=options['delimiter'], quote=options['quote'], escape=options['escape']).save(destination_path)

2 answers

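Short answer: set the Spark configuration parameter spark.sql.sources.partitionOverwriteMode to dynamic instead of static (the default). This overwrites only the partitions that receive new data, not all of them. Note that the setting is available from Spark 2.3 onwards and only affects partitioned output. PySpark example: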

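from pyspark import SparkConf, SparkContext, sql

# Overwrite only the partitions being written (requires Spark 2.3+)
conf = SparkConf().setAppName("test").set("spark.sql.sources.partitionOverwriteMode", "dynamic").setMaster("yarn")
sc = SparkContext(conf=conf)
sqlContext = sql.SQLContext(sc)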
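
For what it's worth, here is a minimal sketch of how the dynamic mode might be combined with a partitioned write. The helper name overwrite_partitions, the bucket path and the partition column dt are hypothetical, and it assumes a SparkSession (Spark 2.3+) rather than the SQLContext above.

```python
def overwrite_partitions(spark, df, destination_path, partition_cols):
    """Overwrite only the partitions present in df; other partitions stay.

    Hypothetical helper: `spark` is assumed to be a SparkSession and
    `df` a DataFrame that contains the columns in `partition_cols`.
    """
    # Runtime SQL setting; "static" (the default) would wipe the whole path.
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
    (
        df.write
        .mode("overwrite")
        .partitionBy(*partition_cols)  # dynamic mode only matters for partitioned output
        .format("csv")
        .option("header", "true")
        .save(destination_path)
    )


# Usage (assumed names, not from the question):
# spark = SparkSession.builder.appName("test").getOrCreate()
# overwrite_partitions(spark, df, "s3a://my-bucket/output/", ["dt"])
```

With an unpartitioned write, mode("overwrite") still replaces everything under destination_path, so this only helps when the folder layout corresponds to partition columns.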

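Instead of overwriting, you can delete the files first and then insert the data in append mode, which preserves the subfolders. Below is a PySpark example.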

import subprocess
# destination_path must end with "/" so the glob matches the files inside the folder
subprocess.call(["hadoop", "fs", "-rm", "{}*.csv.deflate".format(destination_path)])

df.write.mode("append").format("com.databricks.spark.csv").options(nullValue=options['nullValue'], header=options['header'], delimiter=options['delimiter'], quote=options['quote'], escape=options['escape']).save(destination_path)
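
A slightly more defensive version of the same idea could look like the sketch below. The function names build_rm_command and replace_files are made up for illustration, the *.csv.deflate pattern is just the one used above, and it assumes the hadoop CLI is on the PATH and configured for the target filesystem.

```python
import subprocess


def build_rm_command(destination_path, pattern="*.csv.deflate"):
    """Build the `hadoop fs -rm` command for files matching `pattern`."""
    # Normalise the path so there is exactly one "/" before the glob.
    target = destination_path.rstrip("/") + "/" + pattern
    # -f: do not report an error if no file matches the glob.
    return ["hadoop", "fs", "-rm", "-f", target]


def replace_files(df, destination_path, pattern="*.csv.deflate"):
    """Delete matching files, then append the new data next to the subfolders."""
    # check_call raises CalledProcessError if the delete fails,
    # so the append does not run after a failed delete.
    subprocess.check_call(build_rm_command(destination_path, pattern))
    df.write.mode("append").format("csv").save(destination_path)


# Usage (hypothetical path):
# replace_files(df, "s3a://my-bucket/output")
```

Because the delete and the append are two separate steps, readers of the folder may briefly see it without the old files, which is a trade-off of this approach compared with a single overwrite.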


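Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you republish, please cite this site's URL or the original source. For any questions, contact: yoyou2525@163.com.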
