How to save data from org.apache.spark.sql.DataFrame created on MongoDB data back to MongoDB?
There are ways to save org.apache.spark.sql.DataFrame data to a file system or to Hive. But how do I save a DataFrame that was created from MongoDB data back to MongoDB?
Edit: I created the DataFrame using:
SparkContext sc = new SparkContext();
Configuration config = new Configuration();
config.set("mongo.input.uri", "mongodb://localhost:27017/testDB.testCollection");
JavaRDD<Tuple2<Object, BSONObject>> mongoJavaRDD = sc.newAPIHadoopRDD(config, MongoInputFormat.class, Object.class,
        BSONObject.class).toJavaRDD();
JavaRDD<Object> mongoRDD = mongoJavaRDD.flatMap(new FlatMapFunction<Tuple2<Object, BSONObject>, Object>()
{
    @Override
    public Iterable<Object> call(Tuple2<Object, BSONObject> arg)
    {
        BSONObject obj = arg._2();
        // generateJavaObjectFromBSON and clazz are not shown in the question
        Object javaObject = generateJavaObjectFromBSON(obj, clazz);
        return Arrays.asList(javaObject);
    }
});
SQLContext sqlContext = new SQLContext(sc);
// registerTempTable returns void, so it cannot be chained into the assignment
DataFrame df = sqlContext.createDataFrame(mongoRDD, Person.class);
df.registerTempTable("Person");
Using PySpark, and assuming you have a local MongoDB instance:
import pymongo
from toolz import dissoc

# sc and sqlContext are the SparkContext and SQLContext, e.g. as provided by the PySpark shell

# First, let's create a dummy collection
client = pymongo.MongoClient()
# insert_many replaces the legacy insert() call (pymongo 3.0+)
client["foo"]["bar"].insert_many([{"k": "foo", "v": 1}, {"k": "bar", "v": 2}])
client.close()

config = {
    "mongo.input.uri": "mongodb://localhost:27017/foo.bar",
    "mongo.output.uri": "mongodb://localhost:27017/foo.barplus"
}

# Read data from MongoDB
rdd = sc.newAPIHadoopRDD(
    "com.mongodb.hadoop.MongoInputFormat",
    "org.apache.hadoop.io.Text",
    "org.apache.hadoop.io.MapWritable",
    None, None, config)

# Drop the _id field and create a data frame
# (kv is a (key, value) pair; indexing works on both Python 2 and 3)
dt = sqlContext.createDataFrame(rdd.map(lambda kv: dissoc(kv[1], "_id")))
dt_plus_one = dt.select(dt["k"], (dt["v"] + 1).alias("v"))

(dt_plus_one
    .rdd  # Extract the RDD
    .map(lambda row: (None, row.asDict()))  # Map to (None, dict) pairs
    .saveAsNewAPIHadoopFile(
        "file:///placeholder",  # Ignored
        # From org.mongodb.mongo-hadoop:mongo-hadoop-core
        "com.mongodb.hadoop.MongoOutputFormat",
        None, None, None, None, config))
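To make the record reshaping in the two map steps concrete without a running Spark or MongoDB instance, here is a minimal plain-Python sketch. The helper names drop_id and to_output_pair are hypothetical, the "_id" values are made up, and a dict comprehension stands in for toolz.dissoc; in the snippet above the same logic runs inside rdd.map.

```python
def drop_id(doc):
    """Return a copy of doc without the "_id" key (stand-in for toolz.dissoc)."""
    return {key: value for key, value in doc.items() if key != "_id"}


def to_output_pair(record):
    """Pair a record with a None key, the (key, value) shape saved above."""
    return (None, dict(record))


# Sample documents shaped like the dummy collection; the "_id" values are made up
docs = [
    {"_id": "id-1", "k": "foo", "v": 1},
    {"_id": "id-2", "k": "bar", "v": 2},
]

# Same steps as the Spark job: drop "_id", add one to "v", build (None, dict) pairs
cleaned = [drop_id(doc) for doc in docs]
plus_one = [{"k": doc["k"], "v": doc["v"] + 1} for doc in cleaned]
pairs = [to_output_pair(doc) for doc in plus_one]
```

Since the records no longer carry an "_id" field, MongoDB generates a fresh one for each document on insert, so the output collection does not reuse the identifiers of the input collection.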
See also: Getting Spark, Python, and MongoDB to work together