
Connecting to Mongo with replica set and mongo-hadoop connector for Spark

I have a Spark process that currently uses the mongo-hadoop bridge (from https://github.com/mongodb/mongo-hadoop/blob/master/spark/src/main/python/README.rst) to access the mongo database:

mongo_url = 'mongodb://localhost:27017/db_name.collection_name'
mongo_rdd = spark_context.mongoRDD(mongo_url)

The mongo instance is now being upgraded to a cluster that can only be accessed through a replica set.

How do I create an RDD with the mongo-hadoop connector in this setup? mongoRDD() delegates to mongoPairRDD(), which does not appear to accept multiple host strings.

The MongoDB Hadoop Connector's mongoRDD accepts any valid MongoDB Connection String.

For example, if the deployment is now a replica set, you can list all members and name the set:

mongodb://db1.example.net,db2.example.net:27002,db3.example.net:27003/db_name?replicaSet=YourReplicaSetName
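
A minimal sketch of how such a URI could be assembled and passed to mongoRDD(), in the same style as the original snippet. The host names, ports, database, collection, and replica set name are placeholders taken from the example above, and the `build_mongo_uri` helper is hypothetical, not part of the connector:

```python
# Hypothetical helper: assemble a replica-set connection string
# listing every member host, plus the replicaSet option.
def build_mongo_uri(hosts, db, collection, replica_set):
    """Return a mongodb:// URI covering all replica set members."""
    host_part = ",".join(hosts)  # comma-separated host[:port] list
    return "mongodb://{}/{}.{}?replicaSet={}".format(
        host_part, db, collection, replica_set)

# Placeholder hosts and names from the example above.
mongo_url = build_mongo_uri(
    ["db1.example.net:27017", "db2.example.net:27002", "db3.example.net:27003"],
    "db_name", "collection_name", "YourReplicaSetName")

# With pymongo_spark activated, the URI is used exactly as before:
# mongo_rdd = spark_context.mongoRDD(mongo_url)
```

The driver uses the replicaSet option to discover the full member list and route reads and writes to the appropriate members, so the RDD creation call itself does not change.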

