
Python Kedro PySpark: py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext

It's my first project using Kedro with PySpark, and I have an issue. I work on a new Mac (M1). When I run spark-shell in the terminal, Spark is successfully installed and I get the expected output (the "Welcome to Spark version 3.2.1" banner). However, when I try to run Spark through my Kedro project, I run into trouble. I looked for solutions in Stack Overflow discussions but found nothing related to this.

Versions:

  • Python: 3.8
  • Java: openjdk version "18" 2022-03-22
  • PySpark: 3.2.1

Spark conf:

spark.driver.maxResultSize: 3g
spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
spark.sql.execution.arrow.pyspark.enabled: true

And in my Kedro project context:

import os
from pathlib import Path
from typing import Any, Dict, Union

from kedro.framework.context import KedroContext
from pyspark import SparkConf
from pyspark.sql import SparkSession


class ProjectContext(KedroContext):
    """A subclass of KedroContext to add Spark initialisation for the pipeline."""

    def __init__(
        self,
        package_name: str,
        project_path: Union[Path, str],
        env: str = None,
        extra_params: Dict[str, Any] = None,
    ):
        super().__init__(package_name, project_path, env, extra_params)
        if not os.getenv('DISABLE_SPARK'):
            self.init_spark_session()

    def init_spark_session(self) -> None:
        """Initialises a SparkSession using the config
        defined in project's conf folder.
        """

        parameters = self.config_loader.get("spark*", "spark*/**")
        spark_conf = SparkConf().setAll(parameters.items())

        # Initialise the spark session
        spark_session_conf = (
            SparkSession.builder.appName(self.package_name)
            .enableHiveSupport()
            .config(conf=spark_conf)
            .master("local[*]")
        )
        _spark_session = spark_session_conf.getOrCreate()

When I run it, I get this error:

py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x3c60b7e7) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x3c60b7e7
    at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
    at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
    at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
    at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
    at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
    at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
    at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
    at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
    at java.base/java.lang.Thread.run(Thread.java:833)

In my terminal, I adapted the commands to match my installation paths:

export HOMEBREW_OPT="/opt/homebrew/opt"
export JAVA_HOME="$HOMEBREW_OPT/openjdk/"
export SPARK_HOME="$HOMEBREW_OPT/apache-spark/libexec"
export PATH="$JAVA_HOME:$SPARK_HOME:$PATH"
export SPARK_LOCAL_IP=localhost

Thank you for your help.

Hi @Mathilde Roblot, thanks for the detailed report.

The specific error "cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module" sticks out to me. Since JDK 17, the Java module system strongly encapsulates JDK internals by default, so Spark's use of the internal class sun.nio.ch.DirectBuffer fails unless that package is explicitly opened or exported to unnamed modules.

Googling suggests that you may be picking up the wrong Java version (Spark 3.2.x officially supports Java 8/11, not Java 18).

This also happens when your Spark env libs are not being picked up by Kedro, or Kedro is not able to find Spark in your env. You can inspect what the Python process actually resolves, as sketched below.
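A minimal diagnostic sketch (my own, not part of Kedro; it assumes only the standard library and an importable pyspark) that prints which Java and which PySpark installation your environment picks up:

import os
import shutil
import subprocess

import pyspark

# JAVA_HOME takes precedence when Spark launches the JVM;
# otherwise the first `java` found on PATH is used.
print("JAVA_HOME:   ", os.environ.get("JAVA_HOME"))
print("java on PATH:", shutil.which("java"))
subprocess.run(["java", "-version"])  # prints the version banner to stderr

# The PySpark installation that Python imports, and its version.
print("pyspark path:", pyspark.__file__)
print("pyspark ver: ", pyspark.__version__)
print("SPARK_HOME:  ", os.environ.get("SPARK_HOME"))

If the Java version printed here differs from the one your spark-shell uses, that mismatch is the first thing to fix.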

QQ: are you using an IDE like PyCharm? If so, you might need to go to Preferences and set your env variables there. I had faced the same problem, and setting the env variables from the project preferences helped me.

Hope this helps.

You can use SparkConf to set the needed --add-opens JVM flags; see: https://stackoverflow.com/a/71855571/13547620 .
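A sketch of that approach (my own, under the assumption that the single --add-opens flag below, covering the package named in this traceback, is enough; other Spark code paths on Java 17+ may need more of these flags, and the app name is hypothetical):

from pyspark import SparkConf
from pyspark.sql import SparkSession

# Grant access to the JDK-internal package that Spark's StorageUtils touches.
# ALL-UNNAMED targets classpath code such as the Spark jars.
java_opts = "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED"

spark_conf = SparkConf()
spark_conf.set("spark.driver.extraJavaOptions", java_opts)
spark_conf.set("spark.executor.extraJavaOptions", java_opts)

spark = (
    SparkSession.builder.appName("my-kedro-project")  # hypothetical name
    .config(conf=spark_conf)
    .master("local[*]")
    .getOrCreate()
)

In the Kedro setup above, the equivalent is to add spark.driver.extraJavaOptions: --add-opens=java.base/sun.nio.ch=ALL-UNNAMED to the Spark conf shown earlier, since ProjectContext feeds those keys into SparkConf via config_loader.get("spark*", "spark*/**").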
