Azure Data Factory Pipeline + ML

I am trying to build a pipeline in Azure Data Factory V1 that runs an Azure ML Batch Execution activity on a file. I implemented it using blob storage as input and output and it worked. However, I am now trying to change the input and output to a folder in my Data Lake Store. When I try to deploy it, it gives me the following error:

Entity provisioning failed: AzureML Activity 'MLActivity' specifies 'DatalakeInput' in a property that requires an Azure Blob Dataset reference.  

How can I have the input and output be a Data Lake Store dataset instead of a blob?
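
For context, an Azure Blob dataset of the kind this activity accepts (and which worked in the original blob-based setup) looks roughly like the sketch below; the dataset name, linked service name, container and paths are placeholder assumptions, not taken from the actual setup:

    {
        "name": "BlobInput",
        "properties": {
            "type": "AzureBlob",
            "linkedServiceName": "AzureStorageLinkedService",
            "typeProperties": {
                "folderPath": "mycontainer/input/",
                "fileName": "data.csv",
                "format": {
                    "type": "TextFormat",
                    "columnDelimiter": ","
                }
            },
            "availability": {
                "frequency": "Hour",
                "interval": 1
            }
        }
    }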

Pipeline:

{
        "name": "MLPipeline",
        "properties": {
            "description": "use AzureML model",
            "activities": [
                {
                    "type": "AzureMLBatchExecution",
                    "typeProperties": {
                        "webServiceInput": "DatalakeInput",
                        "webServiceOutputs": {
                            "output1": "DatalakeOutput"
                        },
                        "webServiceInputs": {},
                        "globalParameters": {}
                    },
                    "inputs": [
                        {
                            "name": "DatalakeInput"
                        }
                    ],
                    "outputs": [
                        {
                            "name": "DatalakeOutput"
                        }
                    ],
                    "policy": {
                        "timeout": "02:00:00",
                        "concurrency": 3,
                        "executionPriorityOrder": "NewestFirst",
                        "retry": 1
                    },
                    "scheduler": {
                        "frequency": "Hour",
                        "interval": 1
                    },
                    "name": "MLActivity",
                    "description": "description",
                    "linkedServiceName": "MyAzureMLLinkedService"
                }
            ],
            "start": "2016-02-08T00:00:00Z",
            "end": "2016-02-08T00:00:00Z",
            "isPaused": false,
            "hubName": "hubname",
            "pipelineMode": "Scheduled"
        }
    }

Output dataset:

  {
        "name": "DatalakeOutput",
        "properties": {
            "published": false,
            "type": "AzureDataLakeStore",
            "linkedServiceName": "AzureDataLakeStoreLinkedService",
            "typeProperties": {
                "folderPath": "/DATA_MANAGEMENT/"
            },
            "availability": {
                "frequency": "Hour",
                "interval": 1
            }
        }
    }

Input dataset:

 {
        "name": "DatalakeInput",
        "properties": {
            "published": false,
            "type": "AzureDataLakeStore",
            "linkedServiceName": "AzureDataLakeStoreLinkedService",
            "typeProperties": {
                "fileName": "data.csv",
                "folderPath": "/RAW/",
                "format": {
                    "type": "TextFormat",
                    "columnDelimiter": ","
                }
            },
            "availability": {
                "frequency": "Hour",
                "interval": 1
            }
        }
    }

AzureDataLakeStoreLinkedService:

{
    "name": "AzureDataLakeStoreLinkedService",
    "properties": {
        "description": "",
        "hubName": "xyzdatafactoryv1_hub",
        "type": "AzureDataLakeStore",
        "typeProperties": {
            "dataLakeStoreUri": "https://xyzdatastore.azuredatalakestore.net/webhdfs/v1",
            "authorization": "**********",
            "sessionId": "**********",
            "subscriptionId": "*****",
            "resourceGroupName": "xyzresourcegroup"
        }
    }
}

The linked service was created by following this tutorial, which is based on Data Factory V1.

I assume there is some issue with AzureDataLakeStoreLinkedService. Please verify.

Depending on the authentication used to access the data store, your AzureDataLakeStoreLinkedService JSON must look like one of the examples below.

Using service principal authentication

{
    "name": "AzureDataLakeStoreLinkedService",
    "properties": {
        "type": "AzureDataLakeStore",
        "typeProperties": {
            "dataLakeStoreUri": "https://<accountname>.azuredatalakestore.net/webhdfs/v1",
            "servicePrincipalId": "<service principal id>",
            "servicePrincipalKey": {
                "type": "SecureString",
                "value": "<service principal key>"
            },
            "tenant": "<tenant info, e.g. microsoft.onmicrosoft.com>",
            "subscriptionId": "<subscription of ADLS>",
            "resourceGroupName": "<resource group of ADLS>"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

Using managed service identity authentication

{
    "name": "AzureDataLakeStoreLinkedService",
    "properties": {
        "type": "AzureDataLakeStore",
        "typeProperties": {
            "dataLakeStoreUri": "https://<accountname>.azuredatalakestore.net/webhdfs/v1",
            "tenant": "<tenant info, e.g. microsoft.onmicrosoft.com>",
            "subscriptionId": "<subscription of ADLS>",
            "resourceGroupName": "<resource group of ADLS>"
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}

For reference, see the Microsoft documentation: Copy data to or from Azure Data Lake Store by using Azure Data Factory.
