Pyspark S3 NoClassDefFoundError: com/amazonaws/AmazonClientException
I am trying to read S3 files from a small Spark cluster I have running. I have the following jars installed:
"aws-java-sdk-bundle-1.11.975.jar"
"hadoop-aws-3.2.1.jar"
And am using the following code:
from pyspark.context import SparkContext
from pyspark.sql import SparkSession, SQLContext
import os

# initialise Spark session
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.jars", "aws-java-sdk-bundle-1.11.975.jar") \
    .config("spark.jars", "hadoop-aws-3.2.1.jar") \
    .getOrCreate()

os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk-bundle-1.11.975,org.apache.hadoop:hadoop-aws-3.2.1 pyspark-shell'

fp = "s3a://filepath/objects/"
sc = spark.sparkContext
sqlContext = SQLContext(sc)
df = sqlContext.read.parquet(fp)
However when I run this, I get the error:

An error occurred while calling o62.parquet. : java.lang.NoClassDefFoundError: com/amazonaws/AmazonClientException
If I update fp to be s3://... I get the error No FileSystem for scheme "s3".
I have tried a few solutions on here, but nothing seems to work so far.
Hadoop 3.2.1 was built with AWS SDK 1.11.375; it could be a version issue there, or simply that the AWS SDK JAR didn't get onto the classpath.
I'd start with the "troubleshooting s3a" page in the Hadoop docs.
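As a minimal sketch of that suggestion (the app name, bucket path, and use of spark.jars.packages here are my assumptions, not taken from the question): pin hadoop-aws together with an aws-java-sdk-bundle whose version matches what Hadoop 3.2.1 was built against, and set it before the session is created so the JARs actually reach the driver and executor classpaths.

from pyspark.sql import SparkSession

# Pull both connectors through spark.jars.packages so they and their
# transitive dependencies land on the classpath together.
# 1.11.375 is the SDK version Hadoop 3.2.1 was built against.
spark = (
    SparkSession.builder
    .appName("s3a-example")  # hypothetical app name
    .config(
        "spark.jars.packages",
        "org.apache.hadoop:hadoop-aws:3.2.1,"
        "com.amazonaws:aws-java-sdk-bundle:1.11.375",
    )
    .getOrCreate()
)

# Use the s3a:// scheme; stock Hadoop has no FileSystem mapped to plain s3://.
df = spark.read.parquet("s3a://your-bucket/objects/")  # hypothetical path

Two things in the question's own code are worth checking against this sketch: calling .config("spark.jars", ...) twice keeps only the second value, so the aws-java-sdk-bundle JAR is dropped from spark.jars; and PYSPARK_SUBMIT_ARGS is only read when the JVM launches, so setting it after getOrCreate() has no effect.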