
Databricks Notebook Scala Spark Connect to MongoDB: Could not initialize class com.mongodb.spark.config.ReadConfig$

I'm using a Databricks Scala notebook with Spark to connect to MongoDB. I simply want to read from my database, but I'm not sure why the following error keeps coming up when I try to connect to my cluster.

java.lang.NoClassDefFoundError: Could not initialize class com.mongodb.spark.config.ReadConfig$

The code where I'm attempting to read from MongoDB is shown here.

import org.apache.log4j.{Level, Logger}
import org.apache.spark.ml.evaluation.RegressionEvaluator
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.ml.tuning.{ParamGridBuilder, TrainValidationSplit}
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

import com.mongodb.spark._
import com.mongodb.spark.config._

// Read the collection through the MongoDB Spark connector;
// the connection URI comes from spark.mongodb.input.uri in the cluster config.
val data = spark.read
  .format("com.mongodb.spark.sql.DefaultSource")
  .option("database", "sample_airbnb")
  .option("collection", "listingsAndReviews")
  .load()
data.show()

I've also installed the following libraries in my notebook's library list:

org.mongodb.spark:mongo-spark-connector_2.12:2.4.0
mongodb_driver_3_12_3_javadoc.jar
bson_3_12_3_javadoc.jar

These are the URIs used in the Spark config:

spark.mongodb.input.uri mongodb+srv://<user>:<password>@cluster0-ofrzm.azure.mongodb.net/test?retryWrites=true&w=majority
spark.mongodb.output.uri mongodb+srv://<user>:<password>@cluster0-ofrzm.azure.mongodb.net/test?retryWrites=true&w=majority
spark.databricks.delta.preview.enabled true

Any help is greatly appreciated!

I had the same connection problem on Dataproc using PySpark.

My solution:

Install these jars:

https://mvnrepository.com/artifact/org.mongodb.spark/mongo-spark-connector_2.11/2.4.0
https://repo1.maven.org/maven2/org/mongodb/bson/
https://repo1.maven.org/maven2/org/mongodb/mongodb-driver/
https://repo1.maven.org/maven2/org/mongodb/mongodb-driver-core/

PySpark:

from pyspark.sql import SparkSession

# Replace placeholders such as { Host }, { Port }, { DB } and { Collection } with real values.
spark = SparkSession.builder\
                    .master('local')\
                    .config('spark.mongodb.input.uri', 'mongodb://{ Host }:{ Port }/{ DB }.{ Collection }')\
                    .config('spark.mongodb.output.uri', 'mongodb://{ Host }:{ Port }/{ DB }.{ Collection }')\
                    .config('spark.jars.packages', 'org.mongodb.spark:mongo-spark-connector_2.11:2.4.0')\
                    .getOrCreate()

df = spark.read\
          .format("com.mongodb.spark.sql.DefaultSource")\
          .option("database", "{ DB }")\
          .option("collection", "{ Collection }")\
          .load()
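
If the packages resolve correctly, a quick way to verify the read (assuming the placeholders above have been filled in):

df.printSchema()
df.show(5)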

There could be different issues behind this:

  • You're using a connector compiled for Scala 2.12 on a Databricks runtime that uses Scala 2.11. This is the most probable issue, as DBR 7.0, which uses Scala 2.12, was released almost 2 months after this question was asked. The rule of thumb: for DBR < 7.0, use a 2.4.x artifact with _2.11 in the name; for DBR >= 7.0, use _2.12 and version 3.0.0 of the library (see the version check after this list).
  • You don't have all the dependencies downloaded. The connector depends on many other libraries that need to be available. It's better to specify the library as Maven coordinates, org.mongodb.spark:mongo-spark-connector_2.11:2.4.0, which will pull in all the necessary dependencies.
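
For the first point, a quick sanity check is to print the runtime's Scala version from a Scala notebook cell (a minimal sketch):

// Prints the Scala version of the running Databricks runtime,
// e.g. "version 2.11.12" on DBR < 7.0 or "version 2.12.10" on DBR >= 7.0.
// The _2.11 / _2.12 suffix of the connector artifact must match this value.
println(scala.util.Properties.versionString)

The same information is also shown in the cluster details UI next to the runtime version, e.g. "6.4 (includes Apache Spark 2.4.5, Scala 2.11)".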
