Build a pipeline in Azure Data Factory to load Excel files, format the content, convert to CSV and send to Azure SQL DB
I'm new to the Azure environment and have been watching tutorials and reading documentation, but I'm still trying to figure out how to set up a flow for the process described below. The starting point is a set of reports in .xlsx format produced monthly by the Marketing department; the requirement is to bring them into Azure SQL DB so that the data can be stored and analysed. So far I have managed to put those files (manually converted to .csv beforehand) in Blob storage and build an ADF pipeline that copies each file into a table in the SQL DB. The problem is that, as far as I understand, ADF cannot handle .xlsx files directly, so I'm wondering how to set up an automated procedure that converts the files from .xlsx to .csv and saves them to Blob storage. I was thinking about adding a Python script or Databricks notebook to the pipeline to do the conversion, but I'm not sure that would be the best solution. Any hint or reference to an existing tutorial or resource would be much appreciated.
I found a tutorial that uses Logic Apps to do the conversion.
Datanovice indirectly suggested using a Custom activity to run either a C# or Python application to do the conversion for you.
The least expensive solution would be to do the conversion before uploading to Blob storage, as Datanovice said.
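Whichever option is chosen, the conversion step itself is small. Below is a minimal sketch of it in Python, using only the standard library so it can run in a Custom activity or a local pre-upload script without extra packages. The function names `xlsx_to_csv` and `column_index` are made up for illustration. The default sheet path `xl/worksheets/sheet1.xml` is an assumption about how the workbook is laid out. Dates, formulas, styles and multi-sheet workbooks are not handled; a library such as pandas or openpyxl would likely be more robust for real reports.

```python
import csv
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML namespace used by the XML parts inside an .xlsx archive.
MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN}
T_TAG = "{%s}t" % MAIN  # fully qualified name of the <t> (text) element


def column_index(cell_ref: str) -> int:
    """Turn a cell reference such as 'C7' into a zero-based column index (2)."""
    letters = re.match(r"[A-Z]+", cell_ref).group(0)
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def xlsx_to_csv(xlsx_bytes: bytes,
                sheet_path: str = "xl/worksheets/sheet1.xml") -> str:
    """Convert one worksheet of an .xlsx file (given as bytes) to CSV text.

    Assumption: the worksheet lives at `sheet_path` inside the archive.
    """
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as archive:
        shared = []  # table of strings that cells refer to by position
        if "xl/sharedStrings.xml" in archive.namelist():
            root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
            for si in root.findall("m:si", NS):
                # A shared string may be split into several <t> runs.
                shared.append("".join(t.text or "" for t in si.iter(T_TAG)))
        sheet = ET.fromstring(archive.read(sheet_path))

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in sheet.iterfind("m:sheetData/m:row", NS):
        values = []
        for cell in row.findall("m:c", NS):
            ref = cell.get("r")
            col = column_index(ref) if ref else len(values)
            # Empty cells are omitted from the XML, so pad the gap.
            while len(values) < col:
                values.append("")
            kind = cell.get("t")
            v = cell.find("m:v", NS)
            if kind == "s" and v is not None:
                text = shared[int(v.text)]      # index into shared strings
            elif kind == "inlineStr":
                text = "".join(t.text or "" for t in cell.iter(T_TAG))
            else:
                text = v.text if v is not None and v.text else ""
            values.append(text)
        writer.writerow(values)
    return out.getvalue()
```

A script built around this would read the .xlsx bytes (from a local folder or from a Blob container), call `xlsx_to_csv`, and write the returned text to the container that the existing ADF copy pipeline already reads from.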