Databricks Connect java.lang.ClassNotFoundException

I updated our Databricks cluster on Azure Databricks to DBR 9.1 LTS, but a package I use regularly now gives me an error when I run it in VS Code with Databricks Connect, where it didn't with the previous cluster, which was running DBR 8.3. I also updated the package to be compatible with the new runtime; its Maven coordinates are com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12:4.3.0 (a sketch of installing it on the cluster follows). When I run the script below directly in a Databricks notebook it works, but when I run it with Databricks Connect I get the error shown after the script.
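For reference, a minimal sketch of how a Maven package like this can be attached to a cluster through the Databricks Libraries API; the workspace URL, token, and cluster ID below are hypothetical placeholders:

import requests

# Hypothetical placeholders: substitute your own workspace URL, personal
# access token, and cluster ID.
host = "https://adb-1234567890123456.7.azuredatabricks.net"
token = "dapiXXXXXXXXXXXXXXXX"
cluster_id = "0123-456789-abcdef12"

# Install the connector as a cluster library via the Libraries API 2.0.
resp = requests.post(
    f"{host}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "cluster_id": cluster_id,
        "libraries": [
            {"maven": {"coordinates": "com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12:4.3.0"}}
        ],
    },
)
resp.raise_for_status()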

# com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12:4.3.0
from pyspark.sql.types import StringType, StructField, StructType
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.dbutils import DBUtils

spark = SparkSession.builder.appName("local").getOrCreate()
dbutils = DBUtils(spark)

cosmosEndpoint = "######################"
cosmosMasterKey = "######################"
cosmosDatabaseName = "######################"
cosmosContainerName = "test"

cfg = {
    "spark.cosmos.accountEndpoint": cosmosEndpoint,
    "spark.cosmos.accountKey": cosmosMasterKey,
    "spark.cosmos.database": cosmosDatabaseName,
    "spark.cosmos.container": cosmosContainerName,
}
# Configure Catalog Api to be used
spark.conf.set("spark.sql.catalog.cosmosCatalog", "com.azure.cosmos.spark.CosmosCatalog")
spark.conf.set("spark.sql.catalog.cosmosCatalog.spark.cosmos.accountEndpoint", cosmosEndpoint)
spark.conf.set("spark.sql.catalog.cosmosCatalog.spark.cosmos.accountKey", cosmosMasterKey)

df = (
    spark
    .read
    .format("cosmos.oltp")
    .options(**cfg)
    .load()
)
df.show()
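As an aside (this block is not part of the failing script): the two cosmosCatalog settings above are only exercised when going through Spark SQL rather than the DataFrame reader. A minimal sketch of that route, following the connector's documented catalog API; the TBLPROPERTIES values here are hypothetical:

# Read the same container through the catalog configured above.
spark.sql(f"CREATE DATABASE IF NOT EXISTS cosmosCatalog.`{cosmosDatabaseName}`")
spark.sql(
    f"CREATE TABLE IF NOT EXISTS cosmosCatalog.`{cosmosDatabaseName}`.`{cosmosContainerName}` "
    "USING cosmos.oltp "
    "TBLPROPERTIES(partitionKeyPath = '/id', manualThroughput = '400')"
)
spark.sql(f"SELECT * FROM cosmosCatalog.`{cosmosDatabaseName}`.`{cosmosContainerName}`").show()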

The error I get in VS Code using Databricks Connect when running the script is the following:

Exception has occurred: Py4JJavaError
An error occurred while calling o35.load.
: java.lang.ClassNotFoundException: Failed to find data source: cosmos.oltp. Please find packages at http://spark.apache.org/third-party-projects.html
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:765)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:819)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:367)
    at com.databricks.service.SparkServiceRPCHandler$$anon$1.call(SparkServiceRPCHandler.scala:101)
    at com.databricks.service.SparkServiceRPCHandler$$anon$1.call(SparkServiceRPCHandler.scala:80)
    at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4724)
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522)
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2315)
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193)
    at com.google.common.cache.LocalCache.get(LocalCache.java:3932)
    at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4721)
    at com.databricks.service.SparkServiceRPCHandler$.getOrLoadAnonymousRelation(SparkServiceRPCHandler.scala:80)
    at com.databricks.service.SparkServiceRPCHandler.execute0(SparkServiceRPCHandler.scala:715)
    at com.databricks.service.SparkServiceRPCHandler.$anonfun$executeRPC0$1(SparkServiceRPCHandler.scala:478)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at com.databricks.service.SparkServiceRPCHandler.executeRPC0(SparkServiceRPCHandler.scala:370)
    at com.databricks.service.SparkServiceRPCHandler$$anon$2.call(SparkServiceRPCHandler.scala:321)
    at com.databricks.service.SparkServiceRPCHandler$$anon$2.call(SparkServiceRPCHandler.scala:307)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at com.databricks.service.SparkServiceRPCHandler.$anonfun$executeRPC$1(SparkServiceRPCHandler.scala:357)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at com.databricks.service.SparkServiceRPCHandler.executeRPC(SparkServiceRPCHandler.scala:334)
    at com.databricks.service.SparkServiceRPCServlet.doPost(SparkServiceRPCServer.scala:153)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.eclipse.jetty.server.Server.handle(Server.java:516)
    at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
    at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:137)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: cosmos.oltp.DefaultSource
    at java.lang.ClassLoader.findClass(ClassLoader.java:524)
    at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.java:35)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
    at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.java:40)
    at org.apache.spark.util.ChildFirstURLClassLoader.loadClass(ChildFirstURLClassLoader.java:48)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
    at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:739)
    at scala.util.Try$.apply(Try.scala:213)
    at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:739)
    at scala.util.Failure.orElse(Try.scala:224)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:739)
    ... 46 more

I have added the jar file of the package to the following directory: .venv\lib\site-packages\pyspark\jars. (The Caused by: java.lang.ClassNotFoundException: cosmos.oltp.DefaultSource line comes from Spark's data-source resolution, which appends .DefaultSource to the short name cosmos.oltp as a fallback; the connector classes are simply not visible to the classloader handling the lookup.)
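A minimal sketch of that copy step, assuming the connector jar has already been downloaded locally (the filename below is a hypothetical placeholder); the target directory is the one that databricks-connect get-jar-dir reports:

import shutil
from pathlib import Path

import pyspark

# The jars directory of the local pyspark installation that
# databricks-connect uses; the same path databricks-connect get-jar-dir prints.
jar_dir = Path(pyspark.__file__).parent / "jars"

# Hypothetical local path to the downloaded connector jar.
connector_jar = Path("azure-cosmos-spark_3-1_2-12-4.3.0.jar")

shutil.copy(connector_jar, jar_dir / connector_jar.name)
print(f"copied {connector_jar.name} -> {jar_dir}")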

The problem seems to have been resolved, as it appears to be working again.
