
Databricks Connect with IntelliJ + Python: Exception in thread "main" java.lang.NoSuchMethodError

I am trying to connect Databricks to my IDE.

I do not have Spark and/or Scala installed on my machine, but I did install PySpark (pip install pyspark). I set up the necessary environment variables and created a Hadoop folder containing a bin folder, in which I placed a winutils.exe file.
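
For context, a minimal sketch of that environment setup. The C:\Hadoop path is an assumption (not stated in the question); winutils.exe just needs to sit under %HADOOP_HOME%\bin, and setting the variables from Python affects the JVM that PySpark launches from the same process:

import os

# Hypothetical path: point HADOOP_HOME at the folder containing bin\winutils.exe
os.environ["HADOOP_HOME"] = r"C:\Hadoop"
# Make winutils.exe discoverable on PATH for the Spark-launched JVM
os.environ["PATH"] = os.environ["HADOOP_HOME"] + r"\bin;" + os.environ["PATH"]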

This was a step-wise process in which, slowly but steadily, all my errors were solved, except for the last one:

import logging
from pyspark.sql import SparkSession
from pyspark import SparkConf

if __name__ == "__main__":
    # Build (or reuse) a SparkSession with default settings
    spark = SparkSession.builder.getOrCreate()
    # Silence Spark's log output
    spark.sparkContext.setLogLevel("OFF")

This gives:

1/03/30 15:14:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Exception in thread "main" java.lang.NoSuchMethodError: py4j.GatewayServer$GatewayServerBuilder.securityManager(Lpy4j/security/Py4JSecurityManager;)Lpy4j/GatewayServer$GatewayServerBuilder;
    at org.apache.spark.api.python.Py4JServer.<init>(Py4JServer.scala:68)
    at org.apache.spark.api.python.PythonGatewayServer$.main(PythonGatewayServer.scala:37)
    at org.apache.spark.api.python.PythonGatewayServer.main(PythonGatewayServer.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

So the first warning is probably due to the fact that I do not have Hadoop/Spark installed. However, I read that as long as the Windows executable winutils.exe is in the bin folder of Hadoop, this should work. (Before I had winutils.exe in that folder, other errors arose; I dealt with those by adding the file.) So my question is about the Exception in thread "main" error.
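
For reference, a quick way to check which PySpark installation the interpreter is picking up (and hence which bundled py4j is on the classpath). This is a hypothetical diagnostic, not part of the original setup; a version mismatch between PySpark's py4j and the one Spark expects can surface as exactly this kind of NoSuchMethodError at JVM startup:

import pyspark

# Which pyspark is on sys.path, and its version
print(pyspark.__version__)
print(pyspark.__file__)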

Any ideas?

You need to uninstall PySpark, as described in the Databricks Connect documentation. Per the documentation:

Having both installed will cause errors when initializing the Spark context in Python. This can manifest in several ways, including "stream corrupted" or "class not found" errors. If you have PySpark installed in your Python environment, ensure it is uninstalled before installing databricks-connect.

So you need to run:

pip uninstall pyspark
pip uninstall databricks-connect
pip install -U databricks-connect==5.5.*  # or X.Y.* to match your cluster version.
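
Once databricks-connect is installed and configured (via databricks-connect configure; databricks-connect test can then verify the setup), the same builder call from the question should go against the remote cluster. A minimal smoke test, assuming the configuration step succeeded:

from pyspark.sql import SparkSession

# With databricks-connect installed, this SparkSession is backed by the
# remote Databricks cluster rather than a local JVM
spark = SparkSession.builder.getOrCreate()
print(spark.range(100).count())  # should print 100, executed on the cluster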
