Dynamic selection of storage table in Azure Data Factory

I've got an existing set of Azure Storage tables, one per client, that hold events in a multi-tenant cloud system.

For example, there might be three tables holding sign-in information:

ClientASignins, ClientBSignins, ClientCSignins

Is there a way to dynamically loop through these as part of either a copy operation or in something like a Pig script?

Or is there another way to achieve this result?

Many thanks!

If you keep track of these tables in another location, like Azure Storage, you could use PowerShell to loop through each of them and create a Hive table over each one. For example:

# Assumes $tableList, $storageAccount, $clusterContainer and $cluster are already defined,
# and that each entry in $tableList exposes a tableName property.
$storageKey = (Get-AzureStorageKey $storageAccount).Primary
foreach ($t in $tableList) {
    # Build a Hive external table over the Azure table ("" is an escaped double quote).
    $hiveQuery = "CREATE EXTERNAL TABLE $($t.tableName)(IntValue int)
 STORED BY 'com.microsoft.hadoop.azure.hive.AzureTableHiveStorageHandler'
 TBLPROPERTIES(
  ""azure.table.name""=""$($t.tableName)"",
  ""azure.table.account.uri""=""http://$storageAccount.table.core.windows.net"",
  ""azure.table.storage.key""=""$storageKey"");"
    # Write the query to a local file and upload it to the cluster's container.
    Out-File -FilePath .\HiveCreateTable.q -InputObject $hiveQuery -Encoding ascii
    $hiveQueryBlob = Set-AzureStorageBlobContent -File .\HiveCreateTable.q -Blob "queries/HiveCreateTable.q" `
      -Container $clusterContainer.Name -Force
    # Run the uploaded query as a Hive job and wait for it to finish.
    $createTableJobDefinition = New-AzureHDInsightHiveJobDefinition -QueryFile /queries/HiveCreateTable.q
    $job = Start-AzureHDInsightJob -JobDefinition $createTableJobDefinition -Cluster $cluster.Name
    Wait-AzureHDInsightJob -Job $job
    # INSERT YOUR OPERATIONS FOR EACH TABLE HERE
}
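
If the tables are not tracked anywhere else, the list could instead be derived from the table names themselves. The following is a minimal Python sketch of that selection step only, assuming the naming convention implied by the question ("Client" + client id + "Signins"). The function name `select_event_tables` and the extra names `ClientAOrders` and `Audit` are hypothetical, and the list of names is supplied by hand rather than fetched from a storage account.

```python
import re

def select_event_tables(all_table_names, event="Signins"):
    """Return {client_id: table_name} for tables named Client<id><event>.

    The naming pattern is an assumption inferred from the question's examples.
    """
    pattern = re.compile(r"^Client(?P<client>[A-Za-z0-9]+)" + re.escape(event) + r"$")
    selected = {}
    for name in all_table_names:
        match = pattern.match(name)
        if match:
            selected[match.group("client")] = name
    return selected

# Hypothetical account contents: three sign-in tables plus two unrelated tables.
names = ["ClientASignins", "ClientBSignins", "ClientCSignins", "ClientAOrders", "Audit"]
signin_tables = select_event_tables(names)
for client, table in sorted(signin_tables.items()):
    print(client, table)
```

The resulting mapping can then drive whatever per-table loop is used, so adding a client does not require editing a hard-coded list.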

Research: http://blogs.msdn.com/b/mostlytrue/archive/2014/04/04/analyzing-azure-table-storage-data-with-hdinsight.aspx

How can manage Azure Table with Powershell?

In the end I opted for a couple of Azure Data Factory custom activities written in C#, and my workflow is now:

  1. Custom activity: aggregate the data for the current slice into a single blob file for analysis in Pig.
  2. HDInsight: analyse the data with Pig.
  3. Custom activity: disperse the data from blob storage to the array of target tables in table storage.
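
The aggregate and disperse steps above can be sketched roughly as follows. This is not the C# code of the actual custom activities; it is a small in-memory Python illustration in which plain dictionaries stand in for table storage and the blob file. The function names `aggregate` and `disperse`, the `client` tag field, and the sample rows are all assumptions made for the example.

```python
from collections import defaultdict

def aggregate(tables):
    """Step 1: flatten per-client tables into one row list, tagging each row with its client."""
    combined = []
    for client, rows in tables.items():
        for row in rows:
            combined.append({"client": client, **row})
    return combined

def disperse(rows, table_suffix="Signins"):
    """Step 3: split rows back out, keyed by the assumed target table name."""
    targets = defaultdict(list)
    for row in rows:
        # Drop the routing tag before writing the row to its target table.
        payload = {k: v for k, v in row.items() if k != "client"}
        targets[f"Client{row['client']}{table_suffix}"].append(payload)
    return dict(targets)

# Hypothetical sample data: one dict entry per client table.
source = {"A": [{"user": "ann"}], "B": [{"user": "bob"}, {"user": "bea"}]}
combined = aggregate(source)   # single data set, as it would be handed to Pig
result = disperse(combined)    # here the analysis step (2) is skipped entirely
print(result)
```

Tagging each row with its client during aggregation is what lets a single pipeline serve every tenant: the analysis runs once over the combined data, and the tag routes each output row back to the right table.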

I did this to keep the pipelines as simple as possible and remove the need for any duplication of pipelines/scripts.

References:

Use Custom Activities In Azure Data Factory pipeline

HttpDataDownloader Sample
