
Dynamic selection of storage table in Azure Data Factory

I've got an existing set of Azure storage tables, one per client, that hold events in a multi-tenant cloud system.

For example, there might be three tables holding sign-in information:

ClientASignins
ClientBSignins
ClientCSignins

Is there a way to loop through these dynamically, either as part of a copy operation or in something like a Pig script?

Or is there another way to achieve this result?

Many thanks!

If you keep track of these tables in another location, such as Azure Storage, you could use PowerShell to loop through them and create a Hive table over each one. For example:

foreach ($t in $tableList) {
    # Build the Hive DDL that maps an external Hive table onto the Azure table.
    # Doubled quotes ("") produce literal double quotes inside the string.
    $hiveQuery = "CREATE EXTERNAL TABLE $($t.tableName)(IntValue int)
 STORED BY 'com.microsoft.hadoop.azure.hive.AzureTableHiveStorageHandler'
 TBLPROPERTIES(
  ""azure.table.name""=""$($t.tableName)"",
  ""azure.table.account.uri""=""http://${storageAccount}.table.core.windows.net"",
  ""azure.table.storage.key""=""$((Get-AzureStorageKey $storageAccount).Primary)"");"

    # Write the query to a local file and upload it to the cluster's container.
    Out-File -FilePath .\HiveCreateTable.q -InputObject $hiveQuery -Encoding ascii
    $hiveQueryBlob = Set-AzureStorageBlobContent -File .\HiveCreateTable.q -Blob "queries/HiveCreateTable.q" `
      -Container $clusterContainer.Name -Force

    # Run the uploaded query as a Hive job and wait for it to finish.
    $createTableJobDefinition = New-AzureHDInsightHiveJobDefinition -QueryFile /queries/HiveCreateTable.q
    $job = Start-AzureHDInsightJob -JobDefinition $createTableJobDefinition -Cluster $cluster.Name
    Wait-AzureHDInsightJob -Job $job

    # INSERT YOUR OPERATIONS FOR EACH TABLE HERE
}
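The loop above assumes that `$tableList` already exists and that each entry exposes a `tableName` property. As a minimal sketch of how that list and the per-table query could be produced, the following builds both from a list of client names. The function name `New-HiveTableQuery`, the `Signins` suffix convention, the account name `examplestorage` and the placeholder key are all assumptions made for illustration; nothing here talks to Azure.

```powershell
# Hypothetical helper (not from the original answer): returns the Hive DDL
# for one Azure table as a string.
function New-HiveTableQuery {
    param(
        [Parameter(Mandatory)] [string] $TableName,
        [Parameter(Mandatory)] [string] $StorageAccount,
        [Parameter(Mandatory)] [string] $StorageKey
    )
    # An expandable here-string avoids the doubled-quote escaping used above.
    @"
CREATE EXTERNAL TABLE ${TableName}(IntValue int)
STORED BY 'com.microsoft.hadoop.azure.hive.AzureTableHiveStorageHandler'
TBLPROPERTIES(
  "azure.table.name"="${TableName}",
  "azure.table.account.uri"="http://${StorageAccount}.table.core.windows.net",
  "azure.table.storage.key"="${StorageKey}");
"@
}

# Assumed naming convention: <client name> + 'Signins'.
$clients = 'ClientA', 'ClientB', 'ClientC'
$tableList = foreach ($c in $clients) {
    [pscustomobject]@{ tableName = "${c}Signins" }
}

# One query string per table; the account name and key are placeholders.
$queries = foreach ($t in $tableList) {
    New-HiveTableQuery -TableName $t.tableName `
        -StorageAccount 'examplestorage' -StorageKey '<storage-key>'
}
```

Keeping the query generation in a function that only returns a string makes it easy to inspect the generated DDL before submitting any HDInsight job.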

Research: http://blogs.msdn.com/b/mostlytrue/archive/2014/04/04/analyzing-azure-table-storage-data-with-hdinsight.aspx

How can manage Azure Table with Powershell?

In the end I opted for a couple of Azure Data Factory Custom Activities written in C#, and my workflow is now:

  1. Custom activity: aggregate the data for the current slice into a single blob file for analysis in Pig.
  2. HDInsight: analyse with Pig.
  3. Custom activity: disperse the data from blob storage to the array of target tables in table storage.

I did this to keep the pipelines as simple as possible and to remove the need to duplicate pipelines or scripts.
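The custom activities themselves were written in C# and are not shown. As a rough PowerShell sketch of the routing logic behind step 3 only, the following groups analysed rows by their per-client target table. The row shape (`Client`, `User`, `Count`), the function name `Get-TargetTableName` and the `Signins` suffix are assumptions for illustration, and the actual write to table storage is left out.

```powershell
# Hypothetical analysed rows, as they might look after the Pig step.
$rows = @(
    [pscustomobject]@{ Client = 'ClientA'; User = 'alice'; Count = 3 }
    [pscustomobject]@{ Client = 'ClientB'; User = 'bob';   Count = 1 }
    [pscustomobject]@{ Client = 'ClientA'; User = 'carol'; Count = 2 }
)

# Assumed convention: the target table is <client name> + suffix.
function Get-TargetTableName {
    param(
        [Parameter(Mandatory)] [string] $Client,
        [string] $Suffix = 'Signins'
    )
    "$Client$Suffix"
}

# Collect the rows for each target table so they can be written in batches.
$batches = @{}
foreach ($row in $rows) {
    $table = Get-TargetTableName -Client $row.Client
    if (-not $batches.ContainsKey($table)) {
        $batches[$table] = New-Object 'System.Collections.Generic.List[object]'
    }
    $batches[$table].Add($row)
}

# Report what would be written; a real activity would insert into each table here.
foreach ($table in $batches.Keys) {
    "{0}: {1} row(s)" -f $table, $batches[$table].Count
}
```

Because the table name is derived from the data, a single pipeline can serve every client without a per-client copy of the pipeline or script.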

References:

Use Custom Activities In Azure Data Factory pipeline

HttpDataDownloader Sample
