I am trying to find the safest way to import several dimension and fact tables from SQL Server into Azure Data Lake Storage Gen2. These are the options I have found:
Option 1: Azure Data Factory. This involves a cost, so it is not my preferred solution at the moment.
Option 2: Python from Azure Databricks
2a) Apache Spark Connector
jdbcDF = spark.read \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", url) \
    .option("dbtable", table_name) \
    .option("user", username) \
    .option("password", password) \
    .load()
2b) Built-in JDBC Spark SQL Connector
2c) ODBC driver and pyodbc package
2d) pymssql package
2e) JayDeBeApi
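As a concrete illustration of option 2b, here is a minimal sketch of a read through Spark's built-in JDBC source. The host, database, and table names are placeholder assumptions, not values from the question:

```python
# Sketch of option 2b: Spark's built-in JDBC data source.
# The host, database, and table names below are placeholders.

def jdbc_url(host: str, database: str, port: int = 1433) -> str:
    """Build a SQL Server JDBC connection URL."""
    return f"jdbc:sqlserver://{host}:{port};databaseName={database}"

url = jdbc_url("myserver.database.windows.net", "SalesDW")

# On a Databricks cluster the read itself would look like:
# jdbcDF = (spark.read.format("jdbc")
#           .option("url", url)
#           .option("dbtable", "dbo.DimCustomer")
#           .option("user", username)
#           .option("password", password)
#           .load())
```

The built-in source needs no extra library on the cluster, whereas option 2a requires installing the Microsoft Spark connector.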
Option 3: SSIS package
I am not sure which of these I should use. What are the pros and cons of the above approaches?
Once I read the data into a DataFrame using one of the above approaches, how do I save it to Data Lake Gen2 storage?
We can use the JDBC driver to read the data into a DataFrame. For safety, store the connection details as secrets in Azure Key Vault, or in a Databricks secret scope created via the CLI.
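A sketch of pulling credentials from a Databricks secret scope at read time; the scope and key names ("kv-scope", "sql-user", "sql-password") are hypothetical placeholders:

```python
# Sketch: read SQL credentials from a Databricks secret scope (which can be
# backed by Azure Key Vault). Scope and key names are hypothetical.

def connection_options(url: str, table: str, user: str, password: str) -> dict:
    """Assemble the option map passed to spark.read.format("jdbc")."""
    return {"url": url, "dbtable": table, "user": user, "password": password}

# On Databricks, the secrets are fetched at runtime, never hard-coded:
# user = dbutils.secrets.get(scope="kv-scope", key="sql-user")
# password = dbutils.secrets.get(scope="kv-scope", key="sql-password")
# jdbcDF = (spark.read.format("jdbc")
#           .options(**connection_options(url, "dbo.FactSales", user, password))
#           .load())
```

Secret values are redacted in notebook output, so they never appear in logs or cell results.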
We can save the DataFrame as Parquet files in ADLS Gen2 via jdbcDF.write.parquet('dbfs:/path', mode='overwrite').
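If the storage account is not mounted under dbfs:/, the DataFrame can be written to an abfss:// URI instead. A sketch, where the container name ("raw") and storage account name ("mylake") are hypothetical placeholders:

```python
# Sketch: write Parquet directly to ADLS Gen2 via an abfss:// URI rather than
# a dbfs:/ mount. Container and account names below are hypothetical.

def abfss_path(container: str, account: str, relative_path: str) -> str:
    """Build an ADLS Gen2 abfss:// URI for the given container and account."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{relative_path}"

target = abfss_path("raw", "mylake", "dw/dim_customer")

# On a cluster configured with access to the storage account:
# jdbcDF.write.mode("overwrite").parquet(target)
```

This assumes the cluster is already configured to authenticate to the storage account (for example with a service principal), which is a separate setup step.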