
Connecting to Mongo with replica set and mongo-hadoop connector for Spark

I have a Spark process that currently uses the mongo-hadoop bridge (from https://github.com/mongodb/mongo-hadoop/blob/master/spark/src/main/python/README.rst) to access the mongo database:

mongo_url = 'mongodb://localhost:27017/db_name.collection_name'
mongo_rdd = spark_context.mongoRDD(mongo_url)

The mongo instance is now being upgraded to a cluster that can only be accessed through a replica set.

How do I create an RDD with the mongo-hadoop connector in this setup? mongoRDD() delegates to mongoPairRDD(), which does not appear to accept multiple host strings.

The MongoDB Hadoop Connector's mongoRDD accepts any valid MongoDB Connection String.

For example, if the deployment is now a replica set, you can list all members and name the set:

mongodb://db1.example.net,db2.example.net:27002,db3.example.net:27003/db_name?replicaSet=YourReplicaSetName
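
A minimal sketch of how such a URI could be assembled and passed to mongoRDD(), in the same style as the original snippet. The host names, ports, database, collection, and replica set name are placeholders taken from the example above, and the `build_mongo_uri` helper is hypothetical, not part of the connector:

```python
# Hypothetical helper: assemble a replica-set connection string
# listing every member host, plus the replicaSet option.
def build_mongo_uri(hosts, db, collection, replica_set):
    """Return a mongodb:// URI covering all replica set members."""
    host_part = ",".join(hosts)  # comma-separated host[:port] list
    return "mongodb://{}/{}.{}?replicaSet={}".format(
        host_part, db, collection, replica_set)

# Placeholder hosts and names from the example above.
mongo_url = build_mongo_uri(
    ["db1.example.net:27017", "db2.example.net:27002", "db3.example.net:27003"],
    "db_name", "collection_name", "YourReplicaSetName")

# With pymongo_spark activated, the URI is used exactly as before:
# mongo_rdd = spark_context.mongoRDD(mongo_url)
```

The driver uses the replicaSet option to discover the full member list and route reads and writes to the appropriate members, so the RDD creation call itself does not change.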

