
How to fix "java.lang.ClassNotFoundException: org.apache.spark.internal.io.cloud.PathOutputCommitProtocol" in PySpark

Below are the runtime versions in PyCharm:

Java Home   /Library/Java/JavaVirtualMachines/jdk-11.0.16.1.jdk/Contents/Home
Java Version    11.0.16.1 (Oracle Corporation)
Scala Version   version 2.12.15
Spark Version   spark-3.3.1
Python 3.9

I am trying to write a PySpark DataFrame to CSV as follows:

df.write.csv("/Users/data/data.csv")

and get the following error:

Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_exec2.py", line 3, in Exec
    exec(exp, global_vars, local_vars)
  File "<input>", line 1, in <module>
  File "/Users/mambaforge-pypy3/envs/lib/python3.9/site-packages/pyspark/sql/readwriter.py", line 1240, in csv
    self._jwrite.csv(path)
  File "/Users/mambaforge-pypy3/envs/lib/python3.9/site-packages/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/Users/mambaforge-pypy3/envs/lib/python3.9/site-packages/pyspark/sql/utils.py", line 190, in deco
    return f(*a, **kw)
  File "/Users/mambaforge-pypy3/envs/lib/python3.9/site-packages/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o747.csv.
: java.lang.ClassNotFoundException: org.apache.spark.internal.io.cloud.PathOutputCommitProtocol

And the Spark conf is as follows:

spark_conf = SparkConf()
spark_conf.setAll(parameters.items())
spark_conf.set('spark.jars.packages', 'org.apache.hadoop:hadoop-aws:3.3.4')
spark_conf.set('spark.hadoop.fs.s3.aws.credentials.provider',
               'org.apache.hadoop.fs.s3.TemporaryAWSCredentialsProvider')
spark_conf.set('spark.hadoop.fs.s3.access.key', os.environ.get('AWS_ACCESS_KEY_ID'))
spark_conf.set('spark.hadoop.fs.s3.secret.key', os.environ.get('AWS_SECRET_ACCESS_KEY'))
spark_conf.set('spark.hadoop.fs.s3a.bucket.all.committer.magic.enabled', 'true')
spark_conf.set("com.amazonaws.services.s3.enableV4", "true")
spark_conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
spark_conf.set("fs.s3a.aws.credentials.provider",
               "com.amazonaws.auth.InstanceProfileCredentialsProvider,com.amazonaws.auth.DefaultAWSCredentialsProviderChain")
spark_conf.set("fs.AbstractFileSystem.s3a.impl", "org.apache.hadoop.fs.s3a.S3A")
spark_conf.set("hadoop.fs.s3a.path.style.access", "true")
spark_conf.set("hadoop.fs.s3a.fast.upload", "true")
spark_conf.set("hadoop.fs.s3a.fast.upload.buffer", "bytebuffer")
spark_conf.set("fs.s3a.path.style.access", "true")
spark_conf.set("fs.s3a.multipart.size", "128M")
spark_conf.set("fs.s3a.fast.upload.active.blocks", "4")
spark_conf.set("fs.s3a.committer.name", "partitioned")
spark_conf.set("spark.hadoop.fs.s3a.committer.name", "directory")
spark_conf.set("spark.sql.sources.commitProtocolClass",
               "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol")
spark_conf.set("spark.sql.parquet.output.committer.class",
               "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter")
spark_conf.set("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "1")

Any help to fix this issue is appreciated. Thanks!!

Looks like you do not have the hadoop-cloud module added. The class is not part of core Spark: https://search.maven.org/artifact/org.apache.spark/spark-hadoop-cloud_2.12/3.3.1/jar
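Concretely, one way to pull that module in, assuming the Spark 3.3.1 / Scala 2.12 build from the question, is to add the matching `spark-hadoop-cloud` coordinate to the `spark.jars.packages` entry the conf already sets (this is a sketch; the exact artifact version must match your Spark and Scala versions):

```python
# Build the spark.jars.packages value so Spark downloads spark-hadoop-cloud,
# which contains org.apache.spark.internal.io.cloud.PathOutputCommitProtocol.
# The _2.12:3.3.1 suffix is an assumption matching the Scala/Spark versions
# shown in the question; adjust it to your own build.
packages = ",".join([
    "org.apache.hadoop:hadoop-aws:3.3.4",
    "org.apache.spark:spark-hadoop-cloud_2.12:3.3.1",
])

# In the conf from the question this would replace the existing line:
# spark_conf.set('spark.jars.packages', packages)
print(packages)
```

Setting `spark.jars.packages` only takes effect if it is set before the SparkSession/SparkContext is created; alternatively, the same jar can be dropped into `$SPARK_HOME/jars` or passed via `--packages` on `spark-submit`.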
