

Certificate for amazon bucket doesn't match while accessing s3 from pyspark

I have an EC2 instance where I'm trying to configure PySpark to read from S3. I attached a full-access IAM role to the EC2 instance and used the following packages in Spark:

com.amazonaws:aws-java-sdk-bundle:1.11.563,org.apache.hadoop:hadoop-aws:3.3.1
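
For context, a minimal sketch of how those package coordinates are typically wired into a PySpark session via the standard spark.jars.packages config (the app name is illustrative and not from the original post; the same coordinates can also be passed on the command line with pyspark --packages):

# Sketch reconstructing the setup described above; reading
# s3a://bucket_name.stuff/mycsv.csv is what raises the error shown below.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("s3-read")  # illustrative name, not from the original post
    .config(
        "spark.jars.packages",
        "com.amazonaws:aws-java-sdk-bundle:1.11.563,"
        "org.apache.hadoop:hadoop-aws:3.3.1",
    )
    .getOrCreate()
)

df = spark.read.csv("s3a://bucket_name.stuff/mycsv.csv")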

However, I'm getting a new error, and I'm not sure what it means:

: org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://bucket_name.stuff/mycsv.csv: com.amazonaws.SdkClientException: Unable to execute HTTP request: Certificate for <bucket_name.stuff.s3.amazonaws.com> doesn't match any of the subject alternative names: [*.s3.amazonaws.com, s3.amazonaws.com]

So the issue turned out to be a version mismatch between pyspark, hadoop-aws, and java-sdk (I was getting all kinds of different errors until I found a proper version setup). The combination that worked for me was:

pyspark 3.0.0
org.apache.hadoop:hadoop-aws:2.7.2
com.amazonaws:aws-java-sdk-pom:1.11.34
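
A minimal sketch of how that working combination might look end to end; the only change from the earlier snippet is the package coordinates. It assumes pyspark 3.0.0 is installed, the EC2 instance's IAM role supplies the S3 credentials, and the app name and header option are illustrative:

# Sketch of the working version combination from the answer above.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("s3-read-fixed")  # illustrative name
    .config(
        "spark.jars.packages",
        "org.apache.hadoop:hadoop-aws:2.7.2,"
        "com.amazonaws:aws-java-sdk-pom:1.11.34",
    )
    .getOrCreate()
)

# Path taken from the error message in the question.
df = spark.read.csv("s3a://bucket_name.stuff/mycsv.csv", header=True)
df.show()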
