
Spark 2.4 - dataframe write into s3 bucket

From my local PC I am trying to write my DataFrame into S3. Below is my code snippet.

sparkContext.hadoopConfiguration.set("fs.s3a.awsAccessKeyId", Util.AWS_ACCESS_KEY)
sparkContext.hadoopConfiguration.set("fs.s3a.awsSecretAccessKey", Util.AWS_SECRET_ACCESS_KEY)
sparkContext.hadoopConfiguration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
  empTableDF.coalesce(1).write
  .format("csv")
  .option("header", "true")
  .mode(SaveMode.Overwrite)      
  .save("s3a://welpocstg/")

While running it I get the following exception:

com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain

My pom.xml:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.7</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.7</version>
</dependency>

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-aws</artifactId>
    <version>2.7.7</version>
</dependency>

<!-- https://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient -->
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.6</version>
</dependency>
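One note on the dependencies: hadoop-aws 2.7.x was built against aws-java-sdk 1.7.4, which it pulls in transitively, so that version should normally be left in place; forcing a much newer AWS SDK onto the classpath alongside hadoop-aws 2.7.7 is a common source of NoSuchMethodError-style failures.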

You can try the changes below. The S3A connector reads its credentials from the fs.s3a.access.key and fs.s3a.secret.key properties; the property names used in your snippet (fs.s3a.awsAccessKeyId / fs.s3a.awsSecretAccessKey) are not recognized by S3AFileSystem, so the credential provider chain finds nothing and throws the error you see.

sparkContext.hadoopConfiguration.set("fs.s3a.access.key", Util.AWS_ACCESS_KEY)
sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", Util.AWS_SECRET_ACCESS_KEY)

  Seq("1","2","3").toDF("id")
  .coalesce(1)
  .write
  .format("csv")
  .option("header", "true")
  .mode(SaveMode.Overwrite)      
  .save("s3a://welpocstg/")

