
How to connect to MongoDB Atlas from a Databricks cluster using PySpark

This is my simple code in a notebook:

from pyspark.sql import SparkSession
spark = SparkSession \
    .builder \
    .appName("myApp") \
    .config("spark.mongodb.input.uri", "mongodb+srv://admin:<password>@mongocluster.fxilr.mongodb.net/TestDatabase.Events") \
    .getOrCreate()

df = spark.read.format("mongo").load()
df.printSchema()

But I am getting this error:

IllegalArgumentException: Missing database name. Set via the 'spark.mongodb.input.uri' or 'spark.mongodb.input.database' property

What am I doing wrong?

I followed these steps and was able to connect.

  • Install the org.mongodb.spark:mongo-spark-connector_2.12:3.0.2 Maven library on your cluster (I was using Scala 2.12).

  • Go to the cluster detail page and, under Advanced Options on the Spark tab, add the following two config parameters:

     spark.mongodb.output.uri connection-string
     spark.mongodb.input.uri connection-string

Note: connection-string should look like this (with your own user, password, and database names):

mongodb+srv://user:password@cluster1.s5tuva0.mongodb.net/my_database?retryWrites=true&w=majority
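
If your password contains characters such as '@', ':' or '/', they need to be percent-encoded before they go into the connection string. Here is a minimal sketch of one way to assemble the string in Python. The helper name build_atlas_uri and all the values passed to it are placeholders of mine, not part of the connector.

```python
from urllib.parse import quote_plus


def build_atlas_uri(user, password, host, database,
                    options="retryWrites=true&w=majority"):
    """Assemble a mongodb+srv connection string (hypothetical helper)."""
    # quote_plus escapes characters such as '@', ':' and '/' that would
    # otherwise be parsed as URI delimiters.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    return f"mongodb+srv://{credentials}@{host}/{database}?{options}"


# Placeholder values; substitute your own user, password, host and database.
uri = build_atlas_uri("user", "p@ss/word",
                      "cluster1.s5tuva0.mongodb.net", "my_database")
print(uri)
```

The resulting string can then be pasted into the cluster config, or passed as the "uri" option in the notebook.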

  • Use the following Python code in your notebook; it should load your sample collection into a DataFrame:
# Reading from MongoDB
df = spark.read \
    .format("com.mongodb.spark.sql.DefaultSource") \
    .option("uri", "mongodb+srv://user:password@cluster1.s5tuva0.mongodb.net/database?retryWrites=true&w=majority") \
    .option("database", "my_database") \
    .option("collection", "my_collection") \
    .load()
  • You can use the following to write to MongoDB:
# Writing to MongoDB
events_df.write \
    .format("com.mongodb.spark.sql.DefaultSource") \
    .mode("append") \
    .option("uri", "mongodb+srv://user:password@cluster1.s5tuva0.mongodb.net/my_database.my_collection?retryWrites=true&w=majority") \
    .save()
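
If you read and write several collections, the two snippets above can be wrapped in small helpers so the options live in one place. This is only a sketch under my own assumptions: the names mongo_options, read_collection and append_to_collection are hypothetical, and `spark` is assumed to be the SparkSession that the Databricks notebook already provides.

```python
# Format name taken from the read and write snippets above.
MONGO_FORMAT = "com.mongodb.spark.sql.DefaultSource"


def mongo_options(uri, database, collection):
    """Return the connector options from the read example as a dict."""
    return {"uri": uri, "database": database, "collection": collection}


def read_collection(spark, uri, database, collection):
    """Load one collection into a DataFrame (hypothetical helper)."""
    opts = mongo_options(uri, database, collection)
    return spark.read.format(MONGO_FORMAT).options(**opts).load()


def append_to_collection(df, uri, database, collection):
    """Append a DataFrame's rows to a collection (hypothetical helper)."""
    opts = mongo_options(uri, database, collection)
    df.write.format(MONGO_FORMAT).mode("append").options(**opts).save()
```

In a notebook you would call, for example, read_collection(spark, uri, "my_database", "my_collection") and then df.printSchema() to check the result.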

Hope this works for you. Please let others know if it worked.
