[英]Azure Data Factory Pipeline + ML
I am trying to do a pipeline in Azure Data factory V1 which will do an Azure Batch Execution on a file. 我正在尝试在Azure数据工厂V1中执行管道,该管道将对文件执行Azure批处理执行。 I implemented it using a blob storage as input and output and it worked.
我使用blob存储作为输入和输出来实现它并且它工作。 However, I am not trying to change the input and output to a folder in my data lake store.
但是,我不是要将输入和输出更改为我的数据湖存储中的文件夹。 When I try to deploy it, it gives me the following error:
当我尝试部署它时,它给我以下错误:
Entity provisioning failed: AzureML Activity 'MLActivity' specifies 'DatalakeInput' in a property that requires an Azure Blob Dataset reference.
How can I have the input and output as a datalakestore instead of a blob? 如何将输入和输出作为datalakestore而不是blob?
Pipeline: 管道:
{
"name": "MLPipeline",
"properties": {
"description": "use AzureML model",
"activities": [
{
"type": "AzureMLBatchExecution",
"typeProperties": {
"webServiceInput": "DatalakeInput",
"webServiceOutputs": {
"output1": "DatalakeOutput"
},
"webServiceInputs": {},
"globalParameters": {}
},
"inputs": [
{
"name": "DatalakeInput"
}
],
"outputs": [
{
"name": "DatalakeOutput"
}
],
"policy": {
"timeout": "02:00:00",
"concurrency": 3,
"executionPriorityOrder": "NewestFirst",
"retry": 1
},
"scheduler": {
"frequency": "Hour",
"interval": 1
},
"name": "MLActivity",
"description": "description",
"linkedServiceName": "MyAzureMLLinkedService"
}
],
"start": "2016-02-08T00:00:00Z",
"end": "2016-02-08T00:00:00Z",
"isPaused": false,
"hubName": "hubname",
"pipelineMode": "Scheduled"
}
}
Output dataset: 输出数据集:
{
"name": "DatalakeOutput",
"properties": {
"published": false,
"type": "AzureDataLakeStore",
"linkedServiceName": "AzureDataLakeStoreLinkedService",
"typeProperties": {
"folderPath": "/DATA_MANAGEMENT/"
},
"availability": {
"frequency": "Hour",
"interval": 1
}
}
}
Input dataset: 输入数据集:
{
"name": "DatalakeInput",
"properties": {
"published": false,
"type": "AzureDataLakeStore",
"linkedServiceName": "AzureDataLakeStoreLinkedService",
"typeProperties": {
"fileName": "data.csv",
"folderPath": "/RAW/",
"format": {
"type": "TextFormat",
"columnDelimiter": ","
}
},
"availability": {
"frequency": "Hour",
"interval": 1
}
}
}
AzureDatalakeStoreLinkedService: AzureDatalakeStoreLinkedService:
{
"name": "AzureDataLakeStoreLinkedService",
"properties": {
"description": "",
"hubName": "xyzdatafactoryv1_hub",
"type": "AzureDataLakeStore",
"typeProperties": {
"dataLakeStoreUri": "https://xyzdatastore.azuredatalakestore.net/webhdfs/v1",
"authorization": "**********",
"sessionId": "**********",
"subscriptionId": "*****",
"resourceGroupName": "xyzresourcegroup"
}
}
}
The linked service was done following this tutorial based on data factory V1. 链接服务是在本教程基于数据工厂V1完成的。
I assume there is some issue with AzureDataLakeStoreLinkedService. 我假设AzureDataLakeStoreLinkedService存在一些问题。 Please verify.
请验证。
Depending on the authentication used for access data store, your AzureDataLakeStoreLinkedService json must look like below - 根据用于访问数据存储的身份验证,AzureDataLakeStoreLinkedService json必须如下所示 -
Using service principal authentication 使用服务主体认证
{
"name": "AzureDataLakeStoreLinkedService",
"properties": {
"type": "AzureDataLakeStore",
"typeProperties": {
"dataLakeStoreUri": "https://<accountname>.azuredatalakestore.net/webhdfs/v1",
"servicePrincipalId": "<service principal id>",
"servicePrincipalKey": {
"type": "SecureString",
"value": "<service principal key>"
},
"tenant": "<tenant info, e.g. microsoft.onmicrosoft.com>",
"subscriptionId": "<subscription of ADLS>",
"resourceGroupName": "<resource group of ADLS>"
},
"connectVia": {
"referenceName": "<name of Integration Runtime>",
"type": "IntegrationRuntimeReference"
}
}
}
Using managed service identity authentication 使用托管服务身份验证
{
"name": "AzureDataLakeStoreLinkedService",
"properties": {
"type": "AzureDataLakeStore",
"typeProperties": {
"dataLakeStoreUri": "https://<accountname>.azuredatalakestore.net/webhdfs/v1",
"tenant": "<tenant info, e.g. microsoft.onmicrosoft.com>",
"subscriptionId": "<subscription of ADLS>",
"resourceGroupName": "<resource group of ADLS>"
},
"connectVia": {
"referenceName": "<name of Integration Runtime>",
"type": "IntegrationRuntimeReference"
}
}
}
This is Microsoft Document for reference - Copy data to or from Azure Data Lake Store by using Azure Data Factory 这是Microsoft Document供参考 - 使用Azure Data Factory将数据复制到Azure Data Lake Store或从Azure Data Store复制数据
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.