
Cloud Data Fusion vs Dataproc

Cloud Data Fusion lets you create ETL jobs through its graphical pipeline UI, whereas Dataproc lets you run Spark/Hadoop/Hive jobs that you have already written.
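To make "run previously created jobs" concrete, here is a minimal sketch, assuming the job is a PySpark script already uploaded to Cloud Storage and an existing Dataproc cluster. It only builds the job request body. The field names `placement.cluster_name`, `pyspark_job.main_python_file_uri` and `pyspark_job.args` follow the Dataproc Job resource. The function name `build_pyspark_job`, the cluster name and the `gs://` path are hypothetical placeholders.

```python
from typing import Dict, List, Optional


def build_pyspark_job(
    cluster_name: str,
    main_python_file_uri: str,
    args: Optional[List[str]] = None,
) -> Dict:
    """Build a Dataproc job request body for an already-written PySpark script.

    Hypothetical helper: it only assembles the dict; it does not call any API.
    """
    job = {
        # Which existing cluster should run the job.
        "placement": {"cluster_name": cluster_name},
        # The script you wrote yourself, stored in Cloud Storage.
        "pyspark_job": {"main_python_file_uri": main_python_file_uri},
    }
    if args:
        # Command-line arguments passed to the script.
        job["pyspark_job"]["args"] = list(args)
    return job


if __name__ == "__main__":
    job = build_pyspark_job(
        "my-etl-cluster",             # hypothetical cluster name
        "gs://my-bucket/etl_job.py",  # hypothetical script location
        ["--date", "2024-01-01"],
    )
    print(job)
```

The same submission can be done from the command line with `gcloud dataproc jobs submit pyspark gs://my-bucket/etl_job.py --cluster=my-etl-cluster --region=REGION`. With the `google-cloud-dataproc` client library, a dict like this is typically passed as the `job` field of a `JobControllerClient.submit_job_as_operation` request.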

With my limited experience with both services, I have found Cloud Data Fusion to be the easier of the two to use and manage. I would like to know the use cases in which creating and running jobs in Dataproc is preferable to Cloud Data Fusion.

You asked for an opinion, so your question should be closed...

Anyway, it mainly depends on what you prefer. If you are a developer and you want to handle, manage, and customize/tweak every step of your pipeline for performance, observability, or security reasons, write code, and Dataproc is better for you. The same applies if all your developers already know the Hadoop ecosystem.

If you prefer to focus on data transformation/wrangling with a low-code/no-code solution, Data Fusion is for you, especially if you have little or no development skill (business users).

In the end, all the pipelines run on Dataproc anyway: Data Fusion executes the pipelines you design on Dataproc clusters.
