
Filesystem SDK vs Azure Data Factory

I'm very new to Azure Data Lake Storage and am currently training on Data Factory. I have a developer background, so right away I'm not a fan of the 'tools' approach for development. I really don't like how there are all these settings to set and objects you have to create everywhere. I much prefer a code approach, which lets us detach the logic from the service (I don't like having to publish in order to save), see everything by scrolling or navigating to different objects in a project, see differences more easily in source control, and so on. So I found Microsoft's Filesystem SDK, which seems to be an alternative to Data Factory: https://azure.microsoft.com/en-us/blog/filesystem-sdks-for-azure-data-lake-storage-gen2-now-generally-available/

What has been your experience using this approach? Is it a good alternative? Is there a way to run SDK code in Data Factory, so that we can leverage scheduling and triggers? I guess I'm looking for pros and cons.

Thank you.

Well, the docs refer to several SDKs, one of them being the .NET SDK, and the title is:

Use .NET (or Python or Java etc.) to manage directories, files, and ACLs in Azure Data Lake Storage Gen2

So, the SDK lets you manage the filesystem only. There is no support for triggers, pipelines, data flows and the like. You will have to stick with Azure Data Factory for that.
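To make "manage the filesystem only" concrete, here is a minimal sketch of the kind of thing the Python flavour of the SDK (`azure-storage-file-datalake`) does. The helper names (`make_service_client`, `upload_text`, `list_paths`) and the account, container and directory names are my own placeholders, not anything from the docs. Note there is nothing here about scheduling or triggers; you would have to host and schedule this code yourself.

```python
from typing import List


def make_service_client(account_name: str, credential):
    """Build a real client. Needs `pip install azure-storage-file-datalake`."""
    # Imported lazily so the helpers below can be read and tried without the SDK.
    from azure.storage.filedatalake import DataLakeServiceClient

    return DataLakeServiceClient(
        account_url=f"https://{account_name}.dfs.core.windows.net",
        credential=credential,  # e.g. an account key or a token credential
    )


def upload_text(service_client, file_system: str, directory: str,
                file_name: str, text: str) -> None:
    """Create a directory and write one small text file into it."""
    fs = service_client.get_file_system_client(file_system)
    dir_client = fs.get_directory_client(directory)
    # Check the SDK docs for the exact behaviour when the directory already exists.
    dir_client.create_directory()
    file_client = dir_client.get_file_client(file_name)
    file_client.upload_data(text.encode("utf-8"), overwrite=True)


def list_paths(service_client, file_system: str, directory: str) -> List[str]:
    """Return the names of the paths found under a directory."""
    fs = service_client.get_file_system_client(file_system)
    return [p.name for p in fs.get_paths(path=directory)]
```

Usage would look something like `svc = make_service_client("myaccount", my_credential)` followed by `upload_text(svc, "raw", "landing", "hello.txt", "hi")`, where `myaccount`, `raw` and `landing` are made-up names.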

Regarding this:

I'm not a fan of the 'tools' approach for development

I hate to tell you, but the world is moving that way whether you like it or not. Take Logic Apps, for example. Azure Data Factory isn't aimed at the hardcore developer; it fulfils a need for people working with large sets of data, like data engineers. I'm already glad it integrates with git very well. Yes, there is some overhead in defining sinks and sources, but they are reusable across pipelines.

If you really want to use code, try Azure Databricks. Take a look at this Q&A as well.

TL;DR: The Filesystem SDK is not an alternative.

The code-centric alternative to Azure Data Factory for building and managing your Azure Data Lake is Spark, typically either Azure Databricks or Azure Synapse Spark.
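As a rough idea of what the Spark route looks like in code, here is a small PySpark-style sketch that reads a CSV from the lake and writes it back as Parquet. The function names and the container, account and path values are placeholders I made up, and it assumes a `spark` session that is already authorised against the storage account (as you would normally have in a Databricks or Synapse notebook).

```python
def abfss_uri(container: str, account: str, path: str) -> str:
    """Build an abfss:// URI for a path in an ADLS Gen2 container."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def csv_to_parquet(spark, source_uri: str, target_uri: str) -> None:
    """Read a CSV with a header row and rewrite it as Parquet."""
    df = spark.read.csv(source_uri, header=True, inferSchema=True)
    # "overwrite" replaces whatever is already at the target location.
    df.write.mode("overwrite").parquet(target_uri)
```

In a notebook this would be called along the lines of `csv_to_parquet(spark, abfss_uri("raw", "myaccount", "landing/sales.csv"), abfss_uri("curated", "myaccount", "sales"))`. Everything lives in plain source files, so it diffs cleanly in source control, which is what the question was after.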
