Write into S3 on LocalStack with Spark 3: RemoteFileChangedException - Change reported by S3 during open at position. ETag was unavailable
I am trying to write Parquet to S3 in my testcontainers LocalStack and I am getting this error:
org.apache.hadoop.fs.s3a.RemoteFileChangedException: open `s3a://***.snappy.parquet': Change reported by S3 during open at position ***. ETag *** was unavailable
It works against real S3, and it also works with Spark 2.4 and Hadoop 2.7.
I am using: Scala 2.12.15, Spark 3.2.1, hadoop-aws 3.3.1, testcontainers-scala-localstack 0.40.8.
The code is simple; it just writes a dataframe to an S3 location:
val path = "s3a://***"
import spark.implicits._
val df = Seq(UserRow("1", List("10", "20"))).toDF()
df.write.parquet(path)
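For context, the SparkSession in the test is pointed at the LocalStack S3 endpoint through the usual s3a Hadoop properties. Below is a minimal sketch of that wiring, not the exact code from my test; the endpoint, access key and secret key values are placeholders and would normally be taken from the running LocalStack container:

import org.apache.spark.sql.SparkSession

// placeholder values for illustration only; in the test they come from
// the running LocalStack container (mapped endpoint and credentials)
val localStackEndpoint = "http://localhost:4566"
val localStackAccessKey = "test"
val localStackSecretKey = "test"

val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.hadoop.fs.s3a.endpoint", localStackEndpoint)
  .config("spark.hadoop.fs.s3a.access.key", localStackAccessKey)
  .config("spark.hadoop.fs.s3a.secret.key", localStackSecretKey)
  // LocalStack buckets are usually addressed with path-style URLs
  .config("spark.hadoop.fs.s3a.path.style.access", "true")
  .getOrCreate()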
You can disable bucket versioning when you create the bucket. Here is an example:
import org.testcontainers.containers.localstack.LocalStackContainer;
import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.BucketVersioningStatus;

// create an S3 client pointed at the LocalStack container
S3Client s3Client = S3Client.builder()
    .endpointOverride(localStackContainer.getEndpointOverride(LocalStackContainer.Service.S3))
    .credentialsProvider(StaticCredentialsProvider.create(AwsBasicCredentials
        .create(localStackContainer.getAccessKey(), localStackContainer.getSecretKey())))
    .region(Region.of(localStackContainer.getRegion()))
    .build();

// create the desired bucket
s3Client.createBucket(builder -> builder.bucket(<your-bucket-name>));

// disable (suspend) versioning on the bucket
s3Client.putBucketVersioning(builder -> builder
    .bucket(<your-bucket-name>)
    .versioningConfiguration(builder1 -> builder1
        .status(BucketVersioningStatus.SUSPENDED)));
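For what it's worth, the RemoteFileChangedException comes from the change-detection check the S3A connector performs in hadoop-aws 3.x: on read it compares the ETag (or version id) recorded when the stream was opened against what the store reports, and fails when that information is missing or different. Suspending versioning on the LocalStack bucket appears to keep that check satisfied, which is presumably why the write succeeds afterwards, while against real S3 the default behaviour already works as noted in the question.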