
Build a pipeline in Azure Data Factory to load Excel files, format content, convert to CSV and send to Azure SQL DB

I'm new to the Azure environment and have been watching tutorials and reading documentation, but I'm trying to figure out how to set up a flow for the process described below. The starting point is a set of reports in .xlsx format produced monthly by the Marketing department; the requirement is to load them into an Azure SQL DB so that the data can be stored and analysed. So far I have managed to put those files (previously converted manually to .csv format) in Blob storage and build an ADF pipeline that copies each file into a table in the SQL DB. The problem is that, as far as I understand, ADF cannot handle .xlsx files directly, so I'm wondering how to set up an automated procedure that converts the files from .xlsx to .csv and saves them to Blob storage. I was thinking of adding a Python script or Databricks notebook to the pipeline to do the conversion, but I'm not sure this is the best solution. Any hint or reference to an existing tutorial or resource would be much appreciated.

I found a tutorial which uses Logic Apps to do the conversion.

Datanovice indirectly suggested using a Custom activity to run either a C# or Python application to do the conversion for you.

The least expensive solution would be to do the conversion before uploading to blob, like Datanovice said.
