How to write a txt file to an S3 bucket with Spark using the write() method
I'm trying to write a Dataset in txt format to an S3 bucket using Spark,
but I am getting the following error:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:64)
Caused by: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified by setting the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties (respectively).
My code:
override fun write(input: Dataset<String>) =
    input.coalesce(NUMBER_PARTITIONS).write().text(S3_BUCKET_PATH)
        .also {
            LOGGER.logInfo(
                LOG_MESSAGE_TEMPLATE,
                READ_DATA_METHOD,
                WRITE_MESSAGE
            )
        }
My Spark configuration:
object SparkConfiguration {
    private const val SPARK_MASTER_NAME = "spark.master"
    private const val SPARK_APP_NAME_CONFIG = "spark.app.name"

    fun buildSparkSession(config: Config): SparkSession {
        return SparkSession.builder()
            .config(buildSparkConfig(config))
            .orCreate
    }

    fun buildSparkConfig(config: Config): SparkConf = SparkConf()
        .setMaster(config.getString(SPARK_MASTER_NAME))
        .setAppName(config.getString(SPARK_APP_NAME_CONFIG))
}
This happens because the Spark job has no AWS credentials configured when it tries to access S3:
Caused by: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified by setting the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties (respectively).
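Make sure the job has credentials, and that those credentials have permission to write to the bucket.
Ideally, set your credentials in conf/core-site.xml. The property prefix has to match the URI scheme of the output path (fs.s3.* for s3://, fs.s3n.* for s3n://), so for the error above you may need the fs.s3.* names rather than fs.s3n.*: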
<configuration>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>XXXXXX</value>
  </property>
  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>XXXXXX</value>
  </property>
</configuration>
Or reinstall awscli on your machine:
pip install awscli
then
aws configure
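If editing core-site.xml is not convenient, the same properties can probably be passed through the SparkConf instead, since Spark copies any key prefixed with spark.hadoop. into the Hadoop configuration. The following is only a sketch: the helper name withS3Credentials and the use of the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables are assumptions, not part of the question, and the property names are taken from the error message (for an s3a:// path they would be fs.s3a.access.key and fs.s3a.secret.key).

```kotlin
import org.apache.spark.SparkConf

// Hypothetical helper: forwards AWS credentials to the Hadoop S3 connector
// through Spark's "spark.hadoop." prefix.
// Assumption: the credentials are exported as environment variables.
fun withS3Credentials(conf: SparkConf): SparkConf {
    val accessKey = System.getenv("AWS_ACCESS_KEY_ID")
        ?: error("AWS_ACCESS_KEY_ID is not set")
    val secretKey = System.getenv("AWS_SECRET_ACCESS_KEY")
        ?: error("AWS_SECRET_ACCESS_KEY is not set")

    // Property names as reported in the IllegalArgumentException message.
    return conf
        .set("spark.hadoop.fs.s3.awsAccessKeyId", accessKey)
        .set("spark.hadoop.fs.s3.awsSecretAccessKey", secretKey)
}
```

In the question's buildSparkSession this could be applied as .config(withS3Credentials(buildSparkConfig(config))). Reading the keys from the environment keeps them out of source control.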