Python Kedro PySpark: py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext
This is my first project using Kedro with PySpark, and I have an issue. I work on a new Mac (M1). When I run

spark-shell

in the terminal, Spark is successfully installed and I get the expected output (the welcome banner for Spark version 3.2.1). However, when I try to run Spark from my Kedro project, I run into trouble. I searched the Stack Overflow discussions for a solution but found nothing related to this.
Version:
Spark conf:
spark.driver.maxResultSize: 3g
spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
spark.sql.execution.arrow.pyspark.enabled: true
And in my Kedro project context:
import os
from pathlib import Path
from typing import Any, Dict, Union

from kedro.framework.context import KedroContext
from pyspark import SparkConf
from pyspark.sql import SparkSession


class ProjectContext(KedroContext):
    """A subclass of KedroContext to add Spark initialisation for the pipeline."""

    def __init__(
        self,
        package_name: str,
        project_path: Union[Path, str],
        env: str = None,
        extra_params: Dict[str, Any] = None,
    ):
        super().__init__(package_name, project_path, env, extra_params)
        if not os.getenv('DISABLE_SPARK'):
            self.init_spark_session()

    def init_spark_session(self) -> None:
        """Initialises a SparkSession using the config
        defined in project's conf folder.
        """
        parameters = self.config_loader.get("spark*", "spark*/**")
        spark_conf = SparkConf().setAll(parameters.items())

        # Initialise the spark session
        spark_session_conf = (
            SparkSession.builder.appName(self.package_name)
            .enableHiveSupport()
            .config(conf=spark_conf)
            .master("local[*]")
        )
        _spark_session = spark_session_conf.getOrCreate()
When I run it, I get this error:
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x3c60b7e7) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x3c60b7e7
at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:833)
In my terminal, I adapted the commands to match my Python path:
export HOMEBREW_OPT="/opt/homebrew/opt"
export JAVA_HOME="$HOMEBREW_OPT/openjdk/"
export SPARK_HOME="$HOMEBREW_OPT/apache-spark/libexec"
export PATH="$JAVA_HOME/bin:$SPARK_HOME/bin:$PATH"
export SPARK_LOCAL_IP=localhost
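(For anyone reproducing this: the commands below show how to verify which JDK those exports actually resolve to. `/usr/libexec/java_home` is the macOS-provided JDK locator; on a fresh Homebrew install, the plain `openjdk` formula links Java 17+, not 8 or 11.)

```shell
# Check which JDK the shell actually resolves after the exports above
echo "$JAVA_HOME"
java -version

# List every JDK installed on this Mac, with versions and paths
/usr/libexec/java_home -V
```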
Thank you for your help.
Hi @Mathilde Roblot, thanks for the detailed report.

The specific error "cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module" sticks out to me.
Googling suggests that you may be picking up the wrong Java version (Spark 3.2 expects Java 8 or 11, and this error is typical of Java 17). It can also happen when your Spark env libs are not being picked up by Kedro, or when Kedro cannot find Spark in your env.

QQ: are you using an IDE like PyCharm? If so, you might need to go to Preferences and embed your env variables there. I faced the same problem, and setting the env variables in the project preferences helped me.

Hope this helps.
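To make the "wrong Java" diagnosis concrete: the legacy `1.8.0_x` scheme and the modern `17.0.x` scheme report the major version differently, which makes `java -version` output easy to misread. A small self-contained sketch of parsing the banner line (the helper name is mine, not from any library):

```python
import re

def java_major_version(version_line: str) -> int:
    """Parse the major version out of a `java -version` banner line.

    Handles both the legacy scheme ("1.8.0_292" -> 8) and the
    modern scheme ("17.0.2" -> 17).
    """
    m = re.search(r'version "([^"]+)"', version_line)
    if not m:
        raise ValueError(f"unrecognized banner: {version_line!r}")
    parts = m.group(1).split(".")
    # Legacy versions start with "1." (e.g. 1.8 means Java 8)
    return int(parts[1]) if parts[0] == "1" else int(parts[0].split("-")[0])

# Spark 3.2 supports Java 8 and 11; 17 triggers the IllegalAccessError above.
print(java_major_version('openjdk version "17.0.2" 2022-01-18'))  # -> 17
print(java_major_version('java version "1.8.0_292"'))             # -> 8
```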
You can use SparkConf to set the needed --add-opens JVM options; see: https://stackoverflow.com/a/71855571/13547620.
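In a Kedro project, those JVM options slot naturally into the `conf/base/spark.yml` that `config_loader.get("spark*", "spark*/**")` already reads. A sketch of what that file might look like, extending the conf from the question (the `--add-exports`/`--add-opens` entries follow the linked answer's approach and may need adjusting per Spark version; downgrading to Java 11 remains the simpler fix):

```yaml
# conf/base/spark.yml -- picked up by config_loader.get("spark*", "spark*/**")
spark.driver.maxResultSize: 3g
spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
spark.sql.execution.arrow.pyspark.enabled: true

# JVM options needed because Java 17 no longer exports sun.nio.ch
# to unnamed modules by default
spark.driver.extraJavaOptions: "--add-exports=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED"
spark.executor.extraJavaOptions: "--add-exports=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED"
```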