Connecting to Mongo with replica set and mongo-hadoop connector for Spark
I have a Spark process that currently uses the mongo-hadoop bridge (from https://github.com/mongodb/mongo-hadoop/blob/master/spark/src/main/python/README.rst ) to access the mongo database:
mongo_url = 'mongodb://localhost:27017/db_name.collection_name'
mongo_rdd = spark_context.mongoRDD(mongo_url)
The mongo instance is now being upgraded to a cluster that can only be accessed through a replica set.
How do I create an RDD using the mongo-hadoop connector? mongoRDD() delegates to mongoPairRDD(), which may not accept multiple host strings.
The MongoDB Hadoop Connector's mongoRDD can take any valid MongoDB connection string.
For example, if it's now a replica set you can specify:
mongodb://db1.example.net,db2.example.net:27002,db3.example.net:27003/db_name?replicaSet=YourReplicaSetName
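As a sketch, a helper like the one below can assemble such a URI from a list of host:port pairs; the host names, database, and replica set name are placeholders, and the commented mongoRDD call assumes pymongo_spark has been activated as in the README above:

```python
# Build a replica-set MongoDB connection string for spark_context.mongoRDD().
# All hosts, the database/collection, and the replica set name here are
# hypothetical placeholders -- substitute your own deployment's values.
def replica_set_uri(hosts, db_and_collection, replica_set):
    """Join host:port pairs and append the replicaSet option."""
    return 'mongodb://{}/{}?replicaSet={}'.format(
        ','.join(hosts), db_and_collection, replica_set)

mongo_url = replica_set_uri(
    ['db1.example.net:27017', 'db2.example.net:27002', 'db3.example.net:27003'],
    'db_name.collection_name',
    'YourReplicaSetName')

# The resulting URI is passed to mongoRDD() exactly like a single-host URI:
# mongo_rdd = spark_context.mongoRDD(mongo_url)
print(mongo_url)
```

The driver resolves the member list and routes reads according to the replica set configuration, so no other change to the mongoRDD() call should be needed.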
See also related information: