
How to write a txt file to an S3 bucket with Spark using the write() method

I'm trying to write a Dataset in txt format to an S3 bucket using Spark.

But I am getting the following error:

Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.intellij.rt.execution.CommandLineWrapper.main(CommandLineWrapper.java:64)
Caused by: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified by setting the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties (respectively).

My code:

override fun write(input: Dataset<String>) =

        input.coalesce(NUMBER_PARTITIONS).write().text(S3_BUCKET_PATH)
            .also {
                LOGGER.logInfo(
                    LOG_MESSAGE_TEMPLATE,
                    READ_DATA_METHOD,
                    WRITE_MESSAGE
                )
            }

My Spark configuration:

object SparkConfiguration {
    private const val SPARK_MASTER_NAME = "spark.master"
    private const val SPARK_APP_NAME_CONFIG = "spark.app.name"
    fun buildSparkSession(config: Config): SparkSession {
        return SparkSession.builder()
            .config(buildSparkConfig(config))
            .orCreate
    }
    fun buildSparkConfig(config: Config): SparkConf = SparkConf()
        .setMaster(config.getString(SPARK_MASTER_NAME))
        .setAppName(config.getString(SPARK_APP_NAME_CONFIG))
}

This happens because the Spark job has no AWS credentials configured:

Caused by: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified by setting the fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey properties (respectively).

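Make sure the job runs with credentials that are allowed to write to the S3 bucket.

Ideally, set your credentials in conf/core-site.xml. The property prefix has to match the URI scheme of the output path: the error above asks for the fs.s3.* keys, which correspond to an s3:// path, while the example below uses the fs.s3n.* keys for an s3n:// path: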

<configuration>
  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>XXXXXX</value>
  </property>

  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>XXXXXX</value>
  </property>
</configuration>
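The same properties can also be supplied from code instead of core-site.xml. The following is a minimal sketch, not the asker's actual setup: the helper name s3CredentialSettings is hypothetical, and reading the keys from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables is an assumption. It relies on Spark copying every "spark.hadoop.*" setting into the Hadoop configuration.

```kotlin
// Hypothetical helper: builds the Spark settings that forward AWS credentials
// to Hadoop. Spark copies every "spark.hadoop.*" entry into the Hadoop
// configuration, so these end up as fs.<scheme>.awsAccessKeyId and
// fs.<scheme>.awsSecretAccessKey.
// Only covers the "s3" and "s3n" schemes discussed above.
fun s3CredentialSettings(
    scheme: String,     // URI scheme of the output path: "s3" or "s3n"
    accessKey: String,
    secretKey: String
): Map<String, String> = mapOf(
    "spark.hadoop.fs.$scheme.awsAccessKeyId" to accessKey,
    "spark.hadoop.fs.$scheme.awsSecretAccessKey" to secretKey
)

fun main() {
    // Assumption: credentials are exported as environment variables.
    // "XXXXXX" is only a placeholder fallback so the sketch runs anywhere.
    val accessKey = System.getenv("AWS_ACCESS_KEY_ID") ?: "XXXXXX"
    val secretKey = System.getenv("AWS_SECRET_ACCESS_KEY") ?: "XXXXXX"

    val settings = s3CredentialSettings("s3", accessKey, secretKey)

    // Print only the keys so that secrets never end up in logs.
    settings.keys.forEach(::println)
}
```

In buildSparkConfig, each entry could then be applied with SparkConf.set(key, value) before the session is built. Keeping the secrets out of source control is the main reason to read them from the environment rather than hard-coding them.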

Or reinstall awscli on your machine:

pip install awscli

Then run:

aws configure
