Use Google Cloud SQL or MongoDB as an input for Dataflow/Dataproc

I am planning to build a serverless data pipeline on Google Cloud Platform. My plan is to use Dataflow or Dataproc to batch-process data from three different sources.

My input sources are:

  1. Cloud SQL (MySQL)
  2. Cloud SQL (PostgreSQL)
  3. MongoDB

But after reading the documentation, I gathered that neither service has a built-in input for Cloud SQL or MongoDB.

I have also checked the custom driver section, but it covers only Java, and I am planning to use Python.

Is there any way to ingest these three sources with Dataflow/Dataproc?

In your situation I think the best option is to use Dataproc, as long as the workload is batch processing.

This way you can use Hadoop or Spark and have more control over the workflow.

You can use Python code with Spark. {1}

You can do SQL queries with Spark. {2}
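For the two Cloud SQL sources, one possible route is Spark's generic JDBC data source, which PySpark exposes through `spark.read.format("jdbc")`. The following is only a minimal sketch under several assumptions: the host addresses, database names, table names, and credentials are made-up placeholders; the MySQL and PostgreSQL JDBC driver jars are assumed to be available to the Dataproc job; and the cluster is assumed to have network access to the Cloud SQL instances. The helper names (`jdbc_options`, `read_table`, `run_job`) are hypothetical.

```python
def jdbc_options(engine, host, database, table, user, password):
    """Build the option dict for Spark's JDBC source (all values are placeholders)."""
    settings = {
        # engine: (URL template with default port, JDBC driver class)
        "mysql": ("jdbc:mysql://{}:3306/{}", "com.mysql.cj.jdbc.Driver"),
        "postgresql": ("jdbc:postgresql://{}:5432/{}", "org.postgresql.Driver"),
    }
    url_template, driver = settings[engine]
    return {
        "url": url_template.format(host, database),
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": driver,
    }


def read_table(spark, options):
    """Load one table as a DataFrame through Spark's JDBC data source."""
    return spark.read.format("jdbc").options(**options).load()


def run_job():
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("cloudsql-batch").getOrCreate()

    # Hypothetical hosts, databases, tables, and credentials.
    orders = read_table(
        spark, jdbc_options("mysql", "10.0.0.5", "shop", "orders", "reader", "secret")
    )
    users = read_table(
        spark, jdbc_options("postgresql", "10.0.0.6", "crm", "users", "reader", "secret")
    )

    # Register both tables so they can be combined with Spark SQL.
    orders.createOrReplaceTempView("orders")
    users.createOrReplaceTempView("users")
    spark.sql("SELECT COUNT(*) AS order_count FROM orders").show()

    spark.stop()
```

Calling `run_job()` from the script submitted to the cluster would read both tables and run one query. In a real job the password would more likely come from a secret store than from a literal in the code.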

There is also a connector for MongoDB and Spark. {3}
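As a rough illustration of reading a collection through that connector from PySpark, here is a sketch that assumes version 10.x of the MongoDB Spark connector, where the data source is named `mongodb` and is configured with the `connection.uri`, `database`, and `collection` options. Older connector versions use different names, so check the documentation for the version you install. The URI, database, and collection are invented placeholders, the connector jar is assumed to be supplied to the job, and the helper names are hypothetical.

```python
def mongo_read_options(uri, database, collection):
    """Build read options for the MongoDB Spark connector (10.x naming assumed)."""
    return {
        "connection.uri": uri,
        "database": database,
        "collection": collection,
    }


def read_collection(spark, options):
    """Load one MongoDB collection as a DataFrame."""
    return spark.read.format("mongodb").options(**options).load()


def run_job():
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("mongodb-batch").getOrCreate()

    # Hypothetical connection details.
    events = read_collection(
        spark,
        mongo_read_options("mongodb://10.0.0.7:27017", "analytics", "events"),
    )

    # The connector infers a schema by sampling documents; inspect it first.
    events.printSchema()

    spark.stop()
```

Once loaded, the resulting DataFrame can be registered as a temporary view and queried with Spark SQL alongside data from other sources.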

And a connector for MongoDB and Hadoop. {4}

{1}: https://spark.apache.org/docs/0.9.0/python-programming-guide.html

{2}: https://spark.apache.org/docs/latest/sql-programming-guide.html

{3}: https://docs.mongodb.com/spark-connector/master/

{4}: https://docs.mongodb.com/ecosystem/tools/hadoop/
