Spark Error Writing DataFrame to LocalStack S3

I'm running LocalStack and attempting to write a DataFrame to S3 using the code below.

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession
    .builder()
    .appName("LocalStack Test")
    .master("local[*]")
    .config("spark.hadoop.fs.s3a.endpoint", "http://0.0.0.0:4572")
    .config("fs.s3a.path.style.access", "true")
    .getOrCreate()

val df = spark.sqlContext.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("test.csv")

df.write
    .mode(SaveMode.Overwrite)
    .save(s"s3a://test/test2.csv")

This throws the following exception:

Caused by: com.amazonaws.SdkClientException: Unable to verify integrity of data upload.  Client calculated content hash (contentMD5: 1B2M2Y8AsgTpgAmY7PhCfg== in base 64) didn't match hash (etag: c20aef10d728c21878be739244ab1080 in hex) calculated by Amazon S3.  You may need to delete the data stored in Amazon S3. (metadata.contentMD5: null, md5DigestStream: com.amazonaws.services.s3.internal.MD5DigestCalculatingInputStream@2a6b3577, bucketName: spark, key: test/_temporary/0/)

This seems to be a known issue that was recently resolved, but Spark still fails with it. Are there any additional configuration options I need to set when creating my SparkSession?

"spark.hadoop.fs.s3a.endpoint" is set to an unusual value. Is this a local S3 server?

If so, try forcing s3a down to v2 signing:

<property>
  <name>fs.s3a.signing-algorithm</name>
  <value>AWS3SignerType</value>
</property>

I make no promises that it will work, only that it has been known to make the problem go away "once".
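For anyone configuring this from Spark rather than a Hadoop XML file, here is a minimal sketch of the same setting applied on the SparkSession builder. It relies on Spark copying any property prefixed with spark.hadoop. into the Hadoop configuration. The endpoint is the one from the question and the signer value is copied as written in this answer. The object name LocalStackSession is only illustrative, and whether the override actually clears the MD5 mismatch on a given LocalStack version is not guaranteed.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative wrapper object; the name is an assumption, not from the question.
object LocalStackSession {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("LocalStack Test")
      .master("local[*]")
      // Endpoint taken from the question.
      .config("spark.hadoop.fs.s3a.endpoint", "http://0.0.0.0:4572")
      // Written with the spark.hadoop. prefix so it lands in the Hadoop configuration.
      .config("spark.hadoop.fs.s3a.path.style.access", "true")
      // Signer override suggested in the answer, value copied as given there.
      .config("spark.hadoop.fs.s3a.signing-algorithm", "AWS3SignerType")
      .getOrCreate()

    // Print the value the S3A connector will read, to confirm the setting was picked up.
    println(spark.sparkContext.hadoopConfiguration.get("fs.s3a.signing-algorithm"))

    spark.stop()
  }
}
```

Setting the option in code keeps the experiment local to one job, which is convenient when the override is only wanted for LocalStack and not for real S3.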

PS: CSV schema inference (inferSchema) is really expensive against S3, because the file is read completely just to work out the schema and then a second time for the computation. Do it once, print the result, then use that schema from then on.
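A rough sketch of that workflow: run inference once, copy the printed schema into code, and pass it to the reader so later runs skip the extra pass over the file. The column names and types below (id, name) are hypothetical, since the contents of test.csv are not shown in the question. Replace them with whatever printSchema() reports.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object ExplicitSchemaExample {
  // Hypothetical schema; substitute the fields printed by the one-off inference run.
  val schema: StructType = StructType(Seq(
    StructField("id", IntegerType, nullable = true),
    StructField("name", StringType, nullable = true)
  ))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("Explicit Schema")
      .master("local[*]")
      .getOrCreate()

    // One-off step (run once, then leave commented out): infer and print the schema.
    // spark.read.option("header", "true").option("inferSchema", "true")
    //   .csv("test.csv").printSchema()

    // Regular runs: supply the schema so the file is not scanned just to infer types.
    val df = spark.read
      .option("header", "true")
      .schema(schema)
      .csv("test.csv")

    df.printSchema()
    spark.stop()
  }
}
```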
