
Copying into 80 Postgres databases in Azure Data Factory

I'm using Azure Data Factory with SQL Server as the source and Postgres as the target. The goal is to copy 30 tables from SQL Server, with transformation, to 30 tables in Postgres. The hard part is that I have 80 databases on both the source and target sides, all with the exact same layout but different data. It's one database per customer, so 80 customers, each with their own database.

The Linked Service doesn't allow parameters for Postgres. I have one dataset per source and target, using parameters for schema and table names, and one pipeline per table with a SQL Server source and a Postgres target. I can parameterize the SQL Server source in the linked service, but not the Postgres one. The problem is: how can I copy 80 source databases to 80 target databases without adding 80 target linked services and 80 target datasets? On top of that, I'd have to repeat all 30 pipelines per target database.

BTW, I'm only familiar with the UI, but anything else that does the job is acceptable. Any help would be appreciated.

There is a simple way to implement this. Essentially, you need a single Linked Service that reads its connection string out of Key Vault. You can then parameterize source and target as Key Vault secret names, and switch between data sources by just changing the secret name. This relies on all connection-related information being enclosed within a single connection string.

I will provide a simple overview for PostgreSQL, but the same logic applies to SQL Server as the source.

  1. Implement a Linked Service for Azure Key Vault.
  2. Add a Linked Service for Azure PostgreSQL that uses Key Vault to store the connection string, in the format: Server=your_server_name.postgres.database.azure.com;Database=your_database_name;Port=5432;UID=your_user_name;Password=your_password;SSL Mode=Require;Keepalive=600; (I advise using the server name as the secret name).
  3. Pass this parameter, which is essentially the correct secret name, into the pipeline (you can also implement a loop that accepts an array of x elements and processes n at a time in separate pipeline runs; see the ForEach sketch after the pipeline definition below).

Linked Service Definition for Key Vault:

{
    "name": "your_keyvault_name",
    "properties": {
        "description": "KeyVault",
        "annotations": [],
        "type": "AzureKeyVault",
        "typeProperties": {
            "baseUrl": "https://your_keyvault_name.vault.azure.net/"
        }
    }
}

Linked Service Definition for PostgreSQL:

{   "name": "generic_postgres_service".
    "properties": {
        "type": "AzurePostgreSql",
        "parameters": {
            "pg_database": {
                "type": "string",
                "defaultValue": "your_database_name"
            }
        },
        "annotations": [],
        "typeProperties": {
            "connectionString": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "KeyVaultName",
                    "type": "LinkedServiceReference"
                },
                "secretName": "@linkedService().secret_name_for_server"
            }
        },
        "connectVia": {
            "referenceName": "AutoResolveIntegrationRuntime",
            "type": "IntegrationRuntimeReference"
        }
    }
}
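
The source side can be parameterized the same way. Below is a minimal sketch of the analogous SQL Server linked service; the name generic_sqlserver_service is illustrative, and an on-premises server would typically point connectVia at a self-hosted integration runtime instead:

Linked Service Definition for SQL Server (sketch):

{
    "name": "generic_sqlserver_service",
    "properties": {
        "type": "SqlServer",
        "parameters": {
            "secret_name_for_server": {
                "type": "string"
            }
        },
        "annotations": [],
        "typeProperties": {
            "connectionString": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "your_keyvault_name",
                    "type": "LinkedServiceReference"
                },
                "secretName": "@linkedService().secret_name_for_server"
            }
        },
        "connectVia": {
            "referenceName": "AutoResolveIntegrationRuntime",
            "type": "IntegrationRuntimeReference"
        }
    }
}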

Dataset Definition for PostgreSQL:

{
    "name": "your_postgresql_dataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "generic_postgres_service",
            "type": "LinkedServiceReference",
            "parameters": {
                "secret_name_for_server": {
                    "value": "@dataset().secret_name_for_server",
                    "type": "Expression"
                }
            }
        },
        "parameters": {
            "secret_name_for_server": {
                "type": "string"
            }
        },
        "annotations": [],
        "type": "AzurePostgreSqlTable",
        "schema": [],
        "typeProperties": {
            "schema": {
                "value": "@dataset().schema_name",
                "type": "Expression"
            },
            "table": {
                "value": "@dataset().table_name",
                "type": "Expression"
            }
        }
    }
}
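
A matching source dataset on top of that linked service might look like the following sketch (your_sqlserver_dataset and its parameter names are illustrative):

Dataset Definition for SQL Server (sketch):

{
    "name": "your_sqlserver_dataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "generic_sqlserver_service",
            "type": "LinkedServiceReference",
            "parameters": {
                "secret_name_for_server": {
                    "value": "@dataset().secret_name_for_server",
                    "type": "Expression"
                }
            }
        },
        "parameters": {
            "secret_name_for_server": {
                "type": "string"
            },
            "schema_name": {
                "type": "string"
            },
            "table_name": {
                "type": "string"
            }
        },
        "annotations": [],
        "type": "SqlServerTable",
        "schema": [],
        "typeProperties": {
            "schema": {
                "value": "@dataset().schema_name",
                "type": "Expression"
            },
            "table": {
                "value": "@dataset().table_name",
                "type": "Expression"
            }
        }
    }
}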

Pipeline Definition for PostgreSQL:

{
    "name": "your_postgres_pipeline",
    "properties": {
        "activities": [
            {
                "name": "Copy_Activity_1",
                "type": "Copy",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                ... (source, sink and translator definitions skipped)
                "inputs": [
                    {
                        "referenceName": "your_postgresql_dataset",
                        "type": "DatasetReference",
                        "parameters": {
                            "secret_name_for_server": "secret_name"
                        }
                    }
                ]
            }
        ],
        "annotations": []
    }
}
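
Step 3 can then be implemented as a small wrapper pipeline that fans out over an array of secret names, one per customer database. A minimal sketch, assuming your_postgres_pipeline exposes the secret_name_for_server parameter as defined above (the pipeline and parameter names here are illustrative; batchCount controls how many databases are processed in parallel):

Pipeline Definition for the customer loop (sketch):

{
    "name": "loop_over_customer_databases",
    "properties": {
        "parameters": {
            "target_secret_names": {
                "type": "array",
                "defaultValue": [ "customer1_server", "customer2_server" ]
            }
        },
        "activities": [
            {
                "name": "ForEach_Customer_Database",
                "type": "ForEach",
                "dependsOn": [],
                "typeProperties": {
                    "items": {
                        "value": "@pipeline().parameters.target_secret_names",
                        "type": "Expression"
                    },
                    "isSequential": false,
                    "batchCount": 10,
                    "activities": [
                        {
                            "name": "Run_Copy_Pipeline",
                            "type": "ExecutePipeline",
                            "typeProperties": {
                                "pipeline": {
                                    "referenceName": "your_postgres_pipeline",
                                    "type": "PipelineReference"
                                },
                                "waitOnCompletion": true,
                                "parameters": {
                                    "secret_name_for_server": {
                                        "value": "@item()",
                                        "type": "Expression"
                                    }
                                }
                            }
                        }
                    ]
                }
            }
        ],
        "annotations": []
    }
}

With 80 databases and 30 tables each, you could either keep one such loop per table pipeline, or extend the array elements into objects that also carry schema and table names.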
