Pyspark S3 NoClassDefFoundError: com/amazonaws/AmazonClientException
I am trying to read S3 files from a small Spark cluster I have running. I have the following jars installed:
"aws-java-sdk-bundle-1.11.975.jar"
"hadoop-aws-3.2.1.jar"
And am using the following code:
from pyspark.context import SparkContext
from pyspark.sql import SparkSession, SQLContext
import os

# initialise Spark session
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL basic example") \
    .config("spark.jars", "aws-java-sdk-bundle-1.11.975.jar") \
    .config("spark.jars", "hadoop-aws-3.2.1.jar") \
    .getOrCreate()

os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk-bundle-1.11.975,org.apache.hadoop:hadoop-aws-3.2.1 pyspark-shell'

fp = "s3a://filepath/objects/"
sc = spark.sparkContext
sqlContext = SQLContext(sc)
df = sqlContext.read.parquet(fp)
However when I run this, I get the error:

An error occurred while calling o62.parquet. : java.lang.NoClassDefFoundError: com/amazonaws/AmazonClientException
If I update fp to be s3://... I get the error No FileSystem for scheme "s3".
I have tried a few solutions on here, but nothing seems to work so far.
Hadoop 3.2.1 was built with AWS SDK 1.11.375; it could be a version issue there, or simply that the AWS SDK JAR didn't get onto the classpath.
I'd start with the "troubleshooting s3a" page in the Hadoop docs.
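As a minimal sketch of that suggestion (the app name, bucket path, and use of spark.jars.packages here are my assumptions, not taken from the question): pin hadoop-aws together with an aws-java-sdk-bundle whose version matches what Hadoop 3.2.1 was built against, and set it before the session is created so the JARs actually reach the driver and executor classpaths.

from pyspark.sql import SparkSession

# Pull both connectors through spark.jars.packages so they and their
# transitive dependencies land on the classpath together.
# 1.11.375 is the SDK version Hadoop 3.2.1 was built against.
spark = (
    SparkSession.builder
    .appName("s3a-example")  # hypothetical app name
    .config(
        "spark.jars.packages",
        "org.apache.hadoop:hadoop-aws:3.2.1,"
        "com.amazonaws:aws-java-sdk-bundle:1.11.375",
    )
    .getOrCreate()
)

# Use the s3a:// scheme; stock Hadoop has no FileSystem mapped to plain s3://.
df = spark.read.parquet("s3a://your-bucket/objects/")  # hypothetical path

Two things in the question's own code are worth checking against this sketch: calling .config("spark.jars", ...) twice keeps only the second value, so the aws-java-sdk-bundle JAR is dropped from spark.jars; and PYSPARK_SUBMIT_ARGS is only read when the JVM launches, so setting it after getOrCreate() has no effect.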