
Data ingestion to Snowflake from Azure Data Factory

Question: Can anyone help me find a solution for ingesting data into a Snowflake table from Azure Data Factory without using Azure Blob Storage?

Requirements: We currently have a set of customer IDs stored in a Snowflake table. We want to iterate through each customer ID, fetch the full customer details from Amazon S3 via a web API, and write them back to a Snowflake table. The current system uses Azure Databricks (PySpark) to POST each customer ID, GET the related JSON data from S3 through the web API, parse the JSON to extract the required info, and write it back to Snowflake. But this process takes at least 3 seconds per record, and we cannot afford that much time for data ingestion given our large data volume; running the ADB cluster for a long time also costs more. A simplified sketch of the current per-record flow is shown below.

Our idea is that, instead of calling the web API from Python, we could use Azure Data Factory to get the data from the S3 bucket and ingest it into the Snowflake table. Since this is customer data, privacy rules do not allow us to store it in Azure Blob Storage before writing it to Snowflake. Is there any other method that can write it to the Snowflake table directly from S3, or through ADF, without using blob storage?
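A minimal sketch of the current loop, assuming the requests and snowflake-connector-python packages are installed; the API endpoint, table and column names, and credentials are placeholders, not the actual system:

# Current per-record flow: one HTTP round trip and one INSERT per
# customer ID. The serial loop is the ~3-seconds-per-record bottleneck.
import requests
import snowflake.connector

conn = snowflake.connector.connect(
    account="myaccount", user="svc_user", password="********",
    warehouse="LOAD_WH", database="CUSTOMER_DB", schema="PUBLIC",
)
cur = conn.cursor()

# Fetch the customer IDs already stored in Snowflake.
cur.execute("SELECT customer_id FROM customer_ids")
ids = [row[0] for row in cur.fetchall()]

for cid in ids:
    # POST the customer ID and GET the related JSON via the web API
    # (hypothetical endpoint).
    resp = requests.post("https://api.example.com/customer", json={"id": cid})
    detail = resp.json()
    # Parse out the required fields and write back to Snowflake.
    cur.execute(
        "INSERT INTO customer_details (customer_id, name, email) "
        "VALUES (%s, %s, %s)",
        (cid, detail.get("name"), detail.get("email")),
    )

cur.close()
conn.close()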

You can create a Databricks notebook and read all the data from S3; for temporary purposes, store the data on DBFS, which will be destroyed as soon as the cluster terminates.

ADF -> Databricks Notebook

Databricks
Read from S3 -> create a PySpark DataFrame -> filter the data based on your condition -> write to Snowflake
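A minimal PySpark sketch of that notebook, assuming the Snowflake Spark connector is installed on the Databricks cluster; the bucket, table names, column names, and credentials are placeholders:

# Read customer JSON directly from S3, filter against the IDs stored
# in Snowflake, and write the result back to Snowflake.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical connection options for the Snowflake Spark connector.
sf_options = {
    "sfURL": "myaccount.snowflakecomputing.com",
    "sfUser": "svc_user",
    "sfPassword": "********",
    "sfDatabase": "CUSTOMER_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "LOAD_WH",
}

# Read all customer JSON straight from S3 into a DataFrame; any
# temporary spill stays on the cluster's storage, not Azure Blob Storage.
details = spark.read.json("s3a://my-customer-bucket/customer-details/")

# Pull the customer IDs of interest from Snowflake and keep only
# matching detail records (hypothetical column names).
ids = (spark.read.format("snowflake")
            .options(**sf_options)
            .option("dbtable", "CUSTOMER_IDS")
            .load())
filtered = (details.join(ids, details["customer_id"] == ids["CUSTOMER_ID"])
                   .select(details["*"]))

# Write the filtered result back to Snowflake.
(filtered.write.format("snowflake")
         .options(**sf_options)
         .option("dbtable", "CUSTOMER_DETAILS")
         .mode("append")
         .save())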

Well, if your data is already in S3, you can just use the COPY INTO command: https://docs.snowflake.com/en/user-guide/data-load-s3.html
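A minimal sketch of that approach driven from Python with the snowflake-connector-python package; the stage, bucket, table names, and credentials are placeholders. Snowflake reads from S3 directly, so nothing is staged in Azure Blob Storage:

# Define an external stage over the S3 bucket, then bulk-load with
# COPY INTO. The target table uses a single VARIANT column for raw JSON.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myaccount", user="svc_user", password="********",
    warehouse="LOAD_WH", database="CUSTOMER_DB", schema="PUBLIC",
)
cur = conn.cursor()

# External stage pointing at the S3 bucket (hypothetical credentials).
cur.execute("""
    CREATE STAGE IF NOT EXISTS customer_s3_stage
      URL = 's3://my-customer-bucket/customer-details/'
      CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...')
      FILE_FORMAT = (TYPE = 'JSON')
""")

# Landing table for the raw JSON payloads.
cur.execute(
    "CREATE TABLE IF NOT EXISTS customer_details_raw (payload VARIANT)"
)

# Bulk-load every file on the stage into the table in one statement.
cur.execute("COPY INTO customer_details_raw FROM @customer_s3_stage")

cur.close()
conn.close()

If ADF is orchestrating, the same statements could be issued from whatever activity in your pipeline can run SQL against Snowflake, so no intermediate blob staging is needed.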
