
Databricks + ADF + ADLS2 + Hive = Azure Synapse

I have no experience with Azure Synapse, but my understanding is that it is the same as Databricks, ADF, ADLS2 and Hive in SQL DWH, all together in one workspace under a different name.

Am I wrong?

Yes, in many contexts Azure Synapse and Databricks provide the same big data analytics approach, but there are also a few differences between these services.

With the new functionality in Synapse, we now see some features similar to those in Databricks (e.g. Spark, Delta), which raises the question of how Synapse compares to Databricks and when to use which.

  • Yes, both have Spark but…

    • Databricks

      • has a proprietary data processing engine (Databricks Runtime) built on a highly optimized version of Apache Spark, which Databricks claims offers up to 50x better performance
      • already has support for Spark 3.0
      • allows users to opt for GPU enabled clusters and choose between standard and high-concurrency cluster mode
    • Synapse

      • has open-source Apache Spark (thus not including all features of Databricks Runtime)
      • has built-in support for .NET for Spark applications
  • Yes, both have notebooks

    • Synapse

      • Nteract Notebooks

      • has co-authoring of Notebooks, but one person needs to save the Notebook before another person sees the change

      • doesn't have automated versioning

    • Databricks

      • Databricks Notebooks

      • Has real-time co-authoring (both authors see the changes in real time)

      • Has automated versioning

  • Yes, both can access data from a data lake

    • Synapse

      • When creating Synapse, you can select a data lake which will be your primary data lake (can query it directly from the scripts and notebooks)
    • Databricks

      • You typically need to mount a data lake (or configure credentials for it) before using it
  • Yes, both leverage Delta

    • Synapse

      • Uses the open-source Delta Lake
    • Databricks

      • Has Databricks Delta, which is built on the open-source version but offers some extra optimizations
  • No, they are not the same

    • Synapse

      • Has both a traditional SQL engine (to fit the traditional BI developers) as well as a Spark engine (to fit data scientists, analysts & engineers)

      • Is a data warehouse (ie Synapse Analytics) + an interface tool (ie Synapse Studio)

    • Databricks

      • Is not a data warehouse tool but rather a Spark-based notebook tool

      • Has a focus on Spark, Delta Engine, MLflow and MLR
  • No, they don't offer the same developer experience

    • Synapse

      • For Spark development, currently offers a developer experience only through Synapse Studio (not through local IDEs)

      • Doesn't yet have Git integrated within Synapse Studio Notebooks

    • Databricks

      • Offers a developer experience within the Databricks UI, through Databricks Connect (i.e. remote connect from Visual Studio Code, PyCharm, etc.) and soon through Jupyter & RStudio UI within Databricks
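
To make the data lake point above more concrete, here is a minimal Python sketch of the two access styles. It only builds the ADLS Gen2 URI and the OAuth settings usually passed to a Databricks mount; the actual notebook calls are left as comments because `spark` and `dbutils` only exist inside a Synapse or Databricks notebook. The helper names (`abfss_uri`, `oauth_mount_configs`), the container `raw`, the account `mydatalake` and the mount point `/mnt/raw` are all hypothetical and chosen for illustration, and service-principal authentication is an assumption, not something the answer specifies.

```python
def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def oauth_mount_configs(client_id: str, client_secret: str, tenant_id: str) -> dict:
    """Hadoop ABFS settings for service-principal (OAuth) access (assumed auth method)."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


# Hypothetical container, account and folder, for illustration only.
source = abfss_uri("raw", "mydatalake", "/sales/2020")

# Synapse notebook (assumes the predefined `spark` session and that
# `mydatalake` is reachable from the workspace, e.g. as its primary lake):
#   df = spark.read.parquet(source)
#
# Databricks notebook (assumes the predefined `dbutils` and `spark` objects):
#   dbutils.fs.mount(
#       source=abfss_uri("raw", "mydatalake"),
#       mount_point="/mnt/raw",
#       extra_configs=oauth_mount_configs(client_id, client_secret, tenant_id),
#   )
#   df = spark.read.parquet("/mnt/raw/sales/2020")
print(source)
```

In this sketch the Synapse side reads the `abfss://` path directly, while the Databricks side first maps the container to a `/mnt/...` path and reads from there afterwards.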

See also: When to use Synapse and when Databricks?

