Dynamic selection of storage table in Azure Data Factory
I've got an existing set of Azure Storage tables, one per client, that hold events in a multi-tenant cloud system.
For example, there might be three tables holding sign-in information:
ClientASignins
ClientBSignins
ClientCSignins
Is there a way to dynamically loop through these as part of a copy operation, or in something like a Pig script?
Or is there another way to achieve this result?
Many thanks!
If you keep track of these tables in another location, such as Azure Storage, you could use PowerShell to loop through each of them and create a Hive table over each. For example:
# Assumes $tableList (a list of table-name strings), $storageAccount,
# $clusterContainer and $cluster have already been populated.
foreach($t in $tableList) {
    # Generate the Hive DDL for this client's table.
    $hiveQuery = "CREATE EXTERNAL TABLE $t(IntValue int)
      STORED BY 'com.microsoft.hadoop.azure.hive.AzureTableHiveStorageHandler'
      TBLPROPERTIES(
        ""azure.table.name""=""$t"",
        ""azure.table.account.uri""=""http://$storageAccount.table.core.windows.net"",
        ""azure.table.storage.key""=""$((Get-AzureStorageKey $storageAccount).Primary)"");"

    # Write the query to a local file and upload it to the cluster's blob container.
    Out-File -FilePath .\HiveCreateTable.q -InputObject $hiveQuery -Encoding ascii
    $hiveQueryBlob = Set-AzureStorageBlobContent -File .\HiveCreateTable.q -Blob "queries/HiveCreateTable.q" `
        -Container $clusterContainer.Name -Force

    # Run the DDL as a Hive job on the HDInsight cluster and wait for it to finish.
    $createTableJobDefinition = New-AzureHDInsightHiveJobDefinition -QueryFile /queries/HiveCreateTable.q
    $job = Start-AzureHDInsightJob -JobDefinition $createTableJobDefinition -Cluster $cluster.Name
    Wait-AzureHDInsightJob -Job $job

    # INSERT YOUR OPERATIONS FOR EACH TABLE HERE
}
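The core of the approach is plain string templating: generate one CREATE EXTERNAL TABLE statement per client table, then submit each as a Hive job. The query-generation step can be sketched language-neutrally in Python (the account name and key below are illustrative placeholders, not values from the original script):

```python
# Sketch of the per-table Hive DDL generation loop (illustrative only).
HIVE_DDL_TEMPLATE = (
    "CREATE EXTERNAL TABLE {table}(IntValue int)\n"
    "STORED BY 'com.microsoft.hadoop.azure.hive.AzureTableHiveStorageHandler'\n"
    "TBLPROPERTIES(\n"
    '  "azure.table.name"="{table}",\n'
    '  "azure.table.account.uri"="http://{account}.table.core.windows.net",\n'
    '  "azure.table.storage.key"="{key}");'
)

def hive_ddl_for_tables(tables, account, key):
    """Return one Hive DDL statement per Azure storage table name."""
    return [HIVE_DDL_TEMPLATE.format(table=t, account=account, key=key)
            for t in tables]

# One statement per client table; each would be submitted as its own Hive job.
queries = hive_ddl_for_tables(
    ["ClientASignins", "ClientBSignins", "ClientCSignins"],
    account="mystorageaccount", key="<storage-key>")
print(queries[0].splitlines()[0])
# CREATE EXTERNAL TABLE ClientASignins(IntValue int)
```

Everything else in the PowerShell above is delivery plumbing: writing each generated query to a file, uploading it, and running it on the cluster.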
Research: http://blogs.msdn.com/b/mostlytrue/archive/2014/04/04/analyzing-azure-table-storage-data-with-hdinsight.aspx
How can manage Azure Table with Powershell?
In the end I opted for a couple of Azure Data Factory custom activities written in C#, and built my workflow around them.
I did this to keep the pipelines as simple as possible and to remove the need to duplicate pipelines or scripts per client.
References:
Use Custom Activities in an Azure Data Factory pipeline