How do you automate the creation of a dataset within the .NET SDK for Azure Data Factory?

I am using the Microsoft Azure Data Factory .NET SDK to automate dataset creation for a large number of tables.

A method in my .NET console application lets me create input and output datasets based on a specified table name:

static void createInputDataSet(string table_Name, DataFactoryManagementClient client) {
    // resourceGroupName and dataFactoryName are defined elsewhere in the application
    client.Datasets.CreateOrUpdate(resourceGroupName, dataFactoryName,
        new DatasetCreateOrUpdateParameters()
        {
            Dataset = new Dataset()
            {
                Properties = new DatasetProperties()
                {
                    // Columns are hard-coded here; this is the part that needs automating
                    Structure = new List<DataElement>()
                    {
                        //TODO: Autogenerate columns and types
                        new DataElement() { Name = "name", Type = "String" },
                        new DataElement() { Name = "date", Type = "Datetime" }
                    }
                }...

Currently, the dataset definitions are generated by a stored procedure on either the source SQL Server or the target SQL Data Warehouse. The stored procedure takes a table name and queries INFORMATION_SCHEMA to generate valid columns and types for each ADF dataset. We then manually copy the result into portal.azure.com.

We have over 600 datasets, so we need to use the .NET SDK to automate the copy to ADF.

How does one create datasets automatically, given that each dataset's structure (i.e. columns and types) will differ?

The only way I've been able to accomplish this is by writing a stored procedure that generates column names and types on both source and target. The stored procedure should query INFORMATION_SCHEMA (specifically INFORMATION_SCHEMA.COLUMNS) to produce each column name and type for the given table.
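
One thing the procedure (or the calling code) has to get right is the type name: INFORMATION_SCHEMA.COLUMNS reports SQL Server types such as int or datetime2, while the dataset structure expects ADF type names such as Int32 or Datetime. Below is a minimal sketch of such a mapping done on the C# side. The class AdfTypeMapper, the method ToAdfType and the mapping table itself are my own assumptions for illustration, not part of the SDK, so check the type names against the ADF documentation for your service version.

```csharp
using System;
using System.Collections.Generic;

public static class AdfTypeMapper
{
    // Assumed mapping from SQL Server DATA_TYPE values to ADF structure type names.
    // Verify these names against the ADF documentation before relying on them.
    private static readonly Dictionary<string, string> SqlToAdf =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "smallint", "Int16" },
            { "int", "Int32" },
            { "bigint", "Int64" },
            { "real", "Single" },
            { "float", "Double" },
            { "decimal", "Decimal" },
            { "numeric", "Decimal" },
            { "money", "Decimal" },
            { "bit", "Boolean" },
            { "uniqueidentifier", "Guid" },
            { "date", "Datetime" },
            { "datetime", "Datetime" },
            { "datetime2", "Datetime" },
            { "smalldatetime", "Datetime" },
            { "datetimeoffset", "Datetimeoffset" },
            { "time", "Timespan" },
            { "binary", "Byte[]" },
            { "varbinary", "Byte[]" },
        };

    // Returns the mapped type name, falling back to "String" for anything
    // not listed above (char, varchar, nvarchar, text, xml, ...).
    public static string ToAdfType(string sqlDataType)
    {
        if (sqlDataType == null) throw new ArgumentNullException(nameof(sqlDataType));
        return SqlToAdf.TryGetValue(sqlDataType.Trim(), out var adfType) ? adfType : "String";
    }
}
```

With something like this, the stored procedure can return the raw DATA_TYPE column and the console application can call AdfTypeMapper.ToAdfType on it before building each DataElement.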

Once the procedure outputs two columns (name, type), call it programmatically and save the result as follows:

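using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// Column definitions (name, type) returned by the stored procedure.
List<DataElement> InputParams = new List<DataElement>();

using (SqlConnection connect = new SqlConnection(<connection_string>))
using (SqlCommand cmd = new SqlCommand("pUtil_GenDFAutomate", connect))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add(new SqlParameter("@TableName", <table_name>));

    // The connection must be open before the command is executed.
    connect.Open();

    using (var reader = cmd.ExecuteReader())
    {
        // Read() returns false when there are no rows, so no HasRows check is needed.
        while (reader.Read())
        {
            InputParams.Add(new DataElement
            {
                Name = reader.GetString(0),
                Type = reader.GetString(1)
            });
        }
    } // disposing the reader also closes it
}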

Then, when creating your input/output dataset, simply use the variable InputParams as follows:

new DatasetCreateOrUpdateParameters()
{
    Dataset = new Dataset()
    {
        Properties = new DatasetProperties()
        {
            Structure = InputParams 
//Etc.
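
With 600+ tables you will probably want to run this in a loop and not let one bad table stop the whole run. Here is a rough sketch of that idea. DatasetBatch and CreateAll are hypothetical names of my own, and the per-table work is passed in as a delegate so the helper does not depend on the ADF SDK at all.

```csharp
using System;
using System.Collections.Generic;

public static class DatasetBatch
{
    // Calls createDataset once per table name and collects any exceptions,
    // keyed by table name, instead of aborting the batch on the first failure.
    public static Dictionary<string, Exception> CreateAll(
        IEnumerable<string> tableNames,
        Action<string> createDataset)
    {
        if (tableNames == null) throw new ArgumentNullException(nameof(tableNames));
        if (createDataset == null) throw new ArgumentNullException(nameof(createDataset));

        var failures = new Dictionary<string, Exception>();
        foreach (var table in tableNames)
        {
            try
            {
                createDataset(table);
            }
            catch (Exception ex)
            {
                // Remember the failure and move on to the next table.
                failures[table] = ex;
            }
        }
        return failures;
    }
}
```

Usage would be something like DatasetBatch.CreateAll(tableNames, t => createInputDataSet(t, client)), where tableNames is whatever list of tables you are migrating. Any entries in the returned dictionary can then be logged and retried.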
