
Ways to import data from SQL Server 2016 to Azure Data Lake Gen 2

I am looking for the safest way to import several dimension and fact tables from SQL Server to Azure Data Lake Gen 2. These are the options I have found:

Option 1: Azure Data Factory. This involves a cost and is therefore not a preferable solution for me at the moment.

Option 2 : Python from Azure Databricks

2a) Apache Spark Connector

# Read a SQL Server table into a Spark DataFrame using the
# Apache Spark Connector for SQL Server (com.microsoft.sqlserver.jdbc.spark)
jdbcDF = spark.read \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", url) \
    .option("dbtable", table_name) \
    .option("user", username) \
    .option("password", password) \
    .load()

2b) Built-in JDBC Spark SQL Connector
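
A minimal sketch of the built-in JDBC source, assuming the Microsoft JDBC driver is on the cluster classpath; the server and database names are placeholders:

jdbc_url = "jdbc:sqlserver://<server>:1433;databaseName=<db>"

# Read the table through Spark's generic JDBC data source
jdbcDF = spark.read \
    .format("jdbc") \
    .option("url", jdbc_url) \
    .option("dbtable", table_name) \
    .option("user", username) \
    .option("password", password) \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .load()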

2c) ODBC driver and pyodbc package
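
A minimal sketch with pyodbc, assuming "ODBC Driver 17 for SQL Server" is installed on the cluster nodes; the connection details and table name are placeholders. Note that this reads through pandas on the driver node, so it only suits tables that fit in memory:

import pyodbc
import pandas as pd

# Open an ODBC connection to SQL Server
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<server>;DATABASE=<db>;UID=<user>;PWD=<password>"
)
# Pull the table into pandas, then convert to a Spark DataFrame
pdf = pd.read_sql("SELECT * FROM dbo.DimCustomer", conn)
jdbcDF = spark.createDataFrame(pdf)
conn.close()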

2d) pymssql package
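
A minimal sketch with pymssql; the same single-node memory caveat applies, and all names are placeholders:

import pymssql
import pandas as pd

# pymssql talks to SQL Server directly over TDS, no ODBC driver needed
conn = pymssql.connect(server="<server>", user="<user>",
                       password="<password>", database="<db>")
pdf = pd.read_sql("SELECT * FROM dbo.FactSales", conn)
jdbcDF = spark.createDataFrame(pdf)
conn.close()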

2e) JayDeBeApi
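
A minimal sketch with JayDeBeApi, assuming the Microsoft JDBC driver jar is available at the (placeholder) path shown:

import jaydebeapi

# JayDeBeApi bridges Python to the JDBC driver through the JVM
conn = jaydebeapi.connect(
    "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "jdbc:sqlserver://<server>:1433;databaseName=<db>",
    ["<user>", "<password>"],
    "/path/to/mssql-jdbc.jar",
)
cur = conn.cursor()
cur.execute("SELECT * FROM dbo.DimDate")
rows = cur.fetchall()  # rows come back as a list of tuples
cur.close()
conn.close()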

Option 3: SSIS package

I am not sure which of these I should use. What are the pros and cons of the above approaches?

Once I have read the data into a DataFrame using one of the above approaches, how do I save it to Data Lake Gen2 storage?

  1. We can use the JDBC driver to read the data into a DataFrame. For safety, we can store the connection credentials as secrets in Azure Key Vault or as Databricks secrets created via the CLI (see the sketch after this list).

  2. We can save the DataFrame as Parquet files and write it to ADLS Gen2 via jdbcDF.write.parquet('dbfs:/path', mode='overwrite').
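
A minimal sketch putting both steps together; the secret scope/key names, the table, and the storage account details below are placeholders:

# Pull credentials from a Databricks secret scope (backed by Key Vault
# or created via the Databricks CLI)
username = dbutils.secrets.get(scope="sql-scope", key="sql-user")
password = dbutils.secrets.get(scope="sql-scope", key="sql-password")

jdbcDF = spark.read \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", "jdbc:sqlserver://<server>:1433;databaseName=<db>") \
    .option("dbtable", "dbo.DimCustomer") \
    .option("user", username) \
    .option("password", password) \
    .load()

# Write Parquet straight to ADLS Gen2 (assumes the account is accessible
# via abfss with credentials already configured on the cluster)
jdbcDF.write.parquet(
    "abfss://<container>@<account>.dfs.core.windows.net/dim_customer",
    mode="overwrite",
)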

