Python Custom Activity for Azure Data Factory

I am trying to create a Data Factory pipeline that, once a week, copies and processes large blob files (the source) into a SQL database (the sink) using Python. The processing reads the input dataset line by line, extracts an ID, uses that ID to look up an additional piece of data in CosmosDB, recomposes the output dataset, and writes it to the sink. I have a Python script that does this as a one-off (i.e. it reads the entire blob every time) without ADF, but I now want to use ADF's scheduling features to automate it.
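For readers following along, the per-line logic described above can be sketched roughly as below. This is only an illustration: the comma-separated layout, the ID sitting in the first column, and the `lookup` callable (a stand-in for the CosmosDB query) are all assumptions, not details from the question.

```python
import csv
import io
from typing import Callable, Iterable, Iterator, List


def enrich_rows(lines: Iterable[str], lookup: Callable[[str], str]) -> Iterator[List[str]]:
    """Yield each input row with one extra column obtained from lookup(id)."""
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        record_id = row[0]  # assumption: the ID is the first column
        yield row + [lookup(record_id)]


if __name__ == "__main__":
    # In-memory stand-ins for the blob (source) and the CosmosDB lookup.
    sample_blob = io.StringIO("42,alice\n7,bob\n")
    fake_store = {"42": "gold", "7": "silver"}
    for out_row in enrich_rows(sample_blob, lambda i: fake_store.get(i, "")):
        print(out_row)  # a real script would write this row to the SQL sink
```

Because the rows are produced by a generator, the blob is processed one line at a time rather than loaded into memory as a whole.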

Is there a way to create a custom copy activity in Python that I can inject my current code logic into? Azure currently only documents .NET custom activities ( https://docs.microsoft.com/en-us/azure/data-factory/transform-data-using-dotnet-custom-activity ), which do not fit into my stack.

The Python Azure SDK doesn't currently have any documentation on creating a custom activity.

If you look at the example in that documentation, you can see that a custom activity can run an executable on the node:

    "typeProperties": {
      "command": "helloworld.exe",
      "folderPath": "customactv2/helloworld",
      "resourceLinkedService": {
        "referenceName": "StorageLinkedService",
        "type": "LinkedServiceReference"
      }
    }

Further down, in the section on the differences between v1 and v2, they show simply running "cmd":

cmd /c echo hello world
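Since the command is just a command line, the same "typeProperties" shape could in principle point at a Python script instead of an executable. The sketch below only builds that JSON fragment; the script name `main.py`, the folder `customactv2/pythonjob`, and the assumption that a `python` interpreter is on the PATH of the Batch nodes are all hypothetical and not confirmed by the documentation quoted above.

```python
import json


def python_type_properties(script: str, folder_path: str, linked_service: str) -> dict:
    """Build a typeProperties fragment whose command runs a Python script."""
    return {
        # assumption: "python" resolves to an interpreter on the Batch node
        "command": f"python {script}",
        "folderPath": folder_path,
        "resourceLinkedService": {
            "referenceName": linked_service,
            "type": "LinkedServiceReference",
        },
    }


if __name__ == "__main__":
    # Hypothetical names; only StorageLinkedService comes from the example above.
    fragment = python_type_properties("main.py", "customactv2/pythonjob", "StorageLinkedService")
    print(json.dumps({"typeProperties": fragment}, indent=2))
```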

So if you can create an executable that kicks off your Python code, it might just work. You can also pass parameters. However, the code will run on Azure Batch, which provisions a VM for you. This VM might not have all the dependencies that you need, so you'll have to create a "portable" package for this to work. Maybe this post can help you with that.
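As a rough sketch of what such an entry script might look like: as far as I understand the v2 custom activity documentation, the activity definition is written to an `activity.json` file in the task's working directory, with any custom values under `typeProperties.extendedProperties`. That file layout and the helper name `load_extended_properties` should be treated as assumptions to verify against your own Batch task, not as a confirmed API.

```python
import json
import sys
from pathlib import Path


def load_extended_properties(workdir: str = ".") -> dict:
    """Return typeProperties.extendedProperties from activity.json, or {} if absent."""
    path = Path(workdir) / "activity.json"
    if not path.exists():
        return {}
    # utf-8-sig tolerates a byte-order mark if one is present
    with path.open(encoding="utf-8-sig") as fh:
        activity = json.load(fh)
    return activity.get("typeProperties", {}).get("extendedProperties", {})


def main(argv: list) -> int:
    props = load_extended_properties()
    print("command-line arguments:", argv[1:])
    print("extended property names:", sorted(props))
    # Hypothetical: call the existing blob -> CosmosDB -> SQL logic here.
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Run locally without an `activity.json`, the script simply reports an empty set of properties, which makes it easy to test before uploading it with its dependencies.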

A somewhat more elegant option would be to trigger Azure Functions with a web activity. But that seems to be quite beta at the moment: https://ourwayoflyf.com/running-python-code-on-azure-functions-app/
