
Write into S3 on LocalStack with Spark 3: RemoteFileChangedException - Change reported by S3 during open at position. ETag was unavailable

I'm trying to write Parquet to S3 in my testcontainers LocalStack and get this error:

org.apache.hadoop.fs.s3a.RemoteFileChangedException: open `s3a://***.snappy.parquet': Change reported by S3 during open at position ***. ETag *** was unavailable

It works against real S3, and it used to work with Spark 2.4 and Hadoop 2.7.

I'm using: Scala 2.12.15, Spark 3.2.1, hadoop-aws 3.3.1, testcontainers-scala-localstack 0.40.8

The code is simple - it just writes a dataframe to an S3 location:

val path = "s3a://***"
import spark.implicits._

val df = Seq(UserRow("1", List("10", "20"))).toDF()
df.write.parquet(path)
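If the exception persists, the S3A connector's change-detection check (the component that raises RemoteFileChangedException when an ETag is missing) can also be relaxed in the Spark configuration. The following is a sketch, not part of the original question: the endpoint URL and port are assumptions, so use whatever your testcontainers LocalStack instance actually exposes. The change-detection properties are standard hadoop-aws 3.x settings:

    spark.hadoop.fs.s3a.endpoint                          http://localhost:4566
    spark.hadoop.fs.s3a.path.style.access                 true
    spark.hadoop.fs.s3a.change.detection.mode             warn
    spark.hadoop.fs.s3a.change.detection.version.required false

Setting the mode to warn (or none) downgrades the missing-ETag condition from a hard failure to a log message, which is usually acceptable against a test container even though it would be risky against production S3.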

You can disable bucket versioning when you create the bucket. Here is an example:

    // imports assumed: AWS SDK for Java v2 and Testcontainers LocalStack module
    import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
    import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
    import software.amazon.awssdk.regions.Region;
    import software.amazon.awssdk.services.s3.S3Client;
    import software.amazon.awssdk.services.s3.model.BucketVersioningStatus;
    import org.testcontainers.containers.localstack.LocalStackContainer;

    // create an S3 client that talks to the LocalStack container
    S3Client s3Client = S3Client.builder()
        .endpointOverride(localStackContainer.getEndpointOverride(LocalStackContainer.Service.S3))
        .credentialsProvider(StaticCredentialsProvider.create(AwsBasicCredentials
            .create(localStackContainer.getAccessKey(), localStackContainer.getSecretKey())))
        .region(Region.of(localStackContainer.getRegion()))
        .build();

    // create the desired bucket
    s3Client.createBucket(builder -> builder.bucket(<your-bucket-name>));

    // disable (suspend) versioning on the bucket
    s3Client.putBucketVersioning(builder -> builder
        .bucket(<your-bucket-name>)
        .versioningConfiguration(builder1 -> builder1
            .status(BucketVersioningStatus.SUSPENDED)));
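If you prefer not to go through the Java SDK, the same versioning change can be made with the AWS CLI pointed at the container. This is a sketch: the endpoint URL is an assumption (use the one your LocalStack container exposes), and the bucket name is a placeholder:

    # suspend versioning on the test bucket via the LocalStack endpoint
    aws --endpoint-url=http://localhost:4566 s3api put-bucket-versioning \
        --bucket <your-bucket-name> \
        --versioning-configuration Status=Suspended

This is handy in test setup scripts where no SDK client object is available yet.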
