I've got an existing set of Azure Storage tables, one per client, that hold events in a multi-tenant cloud system.
For example, there might be three tables holding sign-in information:
ClientASignins, ClientBSignins, ClientCSignins
Is there a way to loop through these dynamically, either as part of a copy operation or in something like a Pig script?
Or is there another way to achieve this result?
Many thanks!
If you keep track of these table names in another location, such as a blob in Azure Storage, you can use PowerShell to loop through each of them and create a Hive table over each. For example:
foreach ($t in $tableList) {
    # Generate the Hive DDL for this client table ($t is the table name as a string)
    $hiveQuery = "CREATE EXTERNAL TABLE $t(IntValue int)
    STORED BY 'com.microsoft.hadoop.azure.hive.AzureTableHiveStorageHandler'
    TBLPROPERTIES(
        ""azure.table.name""=""$t"",
        ""azure.table.account.uri""=""http://$storageAccount.table.core.windows.net"",
        ""azure.table.storage.key""=""$((Get-AzureStorageKey $storageAccount).Primary)"");"

    # Write the query to a local file and upload it to the cluster's storage container
    Out-File -FilePath .\HiveCreateTable.q -InputObject $hiveQuery -Encoding ascii
    $hiveQueryBlob = Set-AzureStorageBlobContent -File .\HiveCreateTable.q -Blob "queries/HiveCreateTable.q" `
        -Container $clusterContainer.Name -Force

    # Run the query as an HDInsight Hive job and wait for it to complete
    $createTableJobDefinition = New-AzureHDInsightHiveJobDefinition -QueryFile /queries/HiveCreateTable.q
    $job = Start-AzureHDInsightJob -JobDefinition $createTableJobDefinition -Cluster $cluster.Name
    Wait-AzureHDInsightJob -Job $job

    # INSERT YOUR OPERATIONS FOR EACH TABLE HERE
}
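If the client tables all follow a naming convention, the `$tableList` above could itself be built dynamically by enumerating the tables in the storage account, rather than being tracked separately. A rough sketch using the classic Azure PowerShell cmdlets of the same era as the loop above (the `*Signins` suffix filter and the `$storageAccount` variable are illustrative assumptions):

```powershell
# Assumes $storageAccount holds the storage account name, as in the loop above
$storageKey = (Get-AzureStorageKey $storageAccount).Primary
$context = New-AzureStorageContext -StorageAccountName $storageAccount -StorageAccountKey $storageKey

# List every table in the account and keep only the per-client sign-in tables
$tableList = Get-AzureStorageTable -Context $context |
    Where-Object { $_.Name -like "*Signins" } |
    ForEach-Object { $_.Name }
```

This way a newly onboarded client's table is picked up on the next run without touching the script.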
In the end I opted for a couple of Azure Data Factory custom activities written in C#, and now my workflow is:
I did this to keep the pipelines as simple as possible and to remove any need to duplicate pipelines/scripts.