How do you automate the creation of a dataset within the .NET SDK for Azure Data Factory?
I am using the Microsoft Azure Data Factory .NET SDK to automate dataset creation for a large number of tables.
A method in my .NET console application lets me create input and output datasets based on a specified table name:
createInputDataSet(string table_Name, DataFactoryManagementClient client) {
    client.Datasets.CreateOrUpdate(resourceGroupName, dataFactoryName,
        new DatasetCreateOrUpdateParameters()
        {
            Dataset = new Dataset()
            {
                Properties = new DatasetProperties()
                {
                    Structure = new List<DataElement>()
                    {
                        //TODO: Autogenerate columns and types
                        new DataElement() {Name = "name", Type = "String" },
                        new DataElement() {Name = "date", Type = "Datetime" }
                    }
                }...
Currently, dataset creation is accomplished through a stored procedure on either the source SQL Server or the target SQL Data Warehouse. The stored procedure takes a table name and then looks into INFORMATION_SCHEMA in order to generate valid columns and types for each ADF dataset. We then manually copy the result into portal.azure.com.
We have over 600 datasets, so we need to use the .NET SDK to automate copying them into ADF.
How does one create datasets automatically, given that each dataset's structure (i.e., columns and types) will differ?
The only way I've been able to accomplish this is by writing a stored procedure that generates column names and types on both source and target. The stored procedure should query INFORMATION_SCHEMA and INFORMATION_SCHEMA.COLUMNS in order to generate each column and type for the given table.
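The stored procedure itself is not shown here, so the following is only a sketch of the kind of logic it needs: a query against INFORMATION_SCHEMA.COLUMNS plus a translation from SQL Server DATA_TYPE values to ADF structure type names. The class name AdfTypeMapper, the query text, and the mapping table are all assumptions for illustration; the table is deliberately partial, and the ADF type names should be checked against the ADF documentation for your SDK version.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the ADF SDK.
public static class AdfTypeMapper
{
    // Assumed query: one row per column, in table order. @TableName is a parameter.
    public const string ColumnQuery =
        "SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS " +
        "WHERE TABLE_NAME = @TableName ORDER BY ORDINAL_POSITION";

    // Assumed, partial mapping from SQL Server DATA_TYPE to ADF structure types.
    private static readonly Dictionary<string, string> Map =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "smallint", "Int16" },
            { "int", "Int32" },
            { "bigint", "Int64" },
            { "real", "Single" },
            { "float", "Double" },
            { "decimal", "Decimal" },
            { "numeric", "Decimal" },
            { "money", "Decimal" },
            { "uniqueidentifier", "Guid" },
            { "date", "Datetime" },
            { "datetime", "Datetime" },
            { "datetime2", "Datetime" },
            { "datetimeoffset", "Datetimeoffset" },
            { "time", "Timespan" },
            { "varbinary", "Byte[]" }
        };

    // Returns the mapped ADF type, or "String" for anything not in the table.
    public static string ToAdfType(string sqlDataType)
    {
        string adfType;
        return Map.TryGetValue(sqlDataType ?? "", out adfType) ? adfType : "String";
    }
}
```

Falling back to "String" keeps the sketch simple, but it can hide types that deserve their own mapping, so it may be worth logging every fallback when running this across hundreds of tables.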
Once the procedure outputs two columns (name, type), call it programmatically and save the results as follows:
// Requires: using System.Collections.Generic; using System.Data; using System.Data.SqlClient;
List<DataElement> InputParams = new List<DataElement>();
using (SqlConnection connect = new SqlConnection(<connection_string>))
using (SqlCommand cmd = new SqlCommand("pUtil_GenDFAutomate", connect))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add(new SqlParameter("@TableName", <table_name>));
    // The connection must be open before ExecuteReader is called
    connect.Open();
    using (var reader = cmd.ExecuteReader())
    {
        // Each row holds a column name (index 0) and its type (index 1)
        while (reader.Read())
        {
            InputParams.Add(new DataElement
            {
                Name = reader.GetString(0),
                Type = reader.GetString(1)
            });
        }
    } // the using blocks dispose the reader, command, and connection
}
Then, when creating your input/output dataset, simply use the variable InputParams as follows:
new DatasetCreateOrUpdateParameters()
{
    Dataset = new Dataset()
    {
        Properties = new DatasetProperties()
        {
            Structure = InputParams
            //Etc.
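To cover all 600+ tables, the two steps above can be driven from a loop. The sketch below is one possible shape for that loop, not part of the SDK: DatasetBatch, CreateAll, loadColumns, and createDataset are hypothetical names, and the two delegates stand in for the stored procedure call and the client.Datasets.CreateOrUpdate call so that the example stays self-contained.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical batch driver; the delegates are placeholders for real SQL and ADF calls.
public static class DatasetBatch
{
    // Processes every table and returns the failures instead of stopping at the first one.
    public static Dictionary<string, Exception> CreateAll(
        IEnumerable<string> tableNames,
        Func<string, IList<KeyValuePair<string, string>>> loadColumns,
        Action<string, IList<KeyValuePair<string, string>>> createDataset)
    {
        var failures = new Dictionary<string, Exception>();
        foreach (var table in tableNames)
        {
            try
            {
                // Key = column name, Value = column type
                var columns = loadColumns(table);
                if (columns.Count == 0)
                    throw new InvalidOperationException("No columns found for " + table);
                createDataset(table, columns);
            }
            catch (Exception ex)
            {
                // Record the error and continue with the remaining tables
                failures[table] = ex;
            }
        }
        return failures;
    }
}
```

In practice, loadColumns would wrap the stored procedure call and createDataset would build the DataElement list and pass it to CreateOrUpdate. Collecting failures per table makes it easier to rerun only the tables that did not go through.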