简体   繁体   中英

How can I consume this Rest API in Azure Data Factory

I have a REST API I need to call from Azure Data Factory and insert the data into a SQL table.

The format of the JSON returned from the API is in the following format:

{
    "serviceResponse": {
        "supportOffice": "EUKO",
        "totalPages": 5,
        "pageNo": 1,
        "recordsPerPage": 1000,
        "projects": [
            { "projectID":1 ...} , { "projectID":2 ...} ,...
        ]
    }
}

the URL is in the format http://server.com/api/Projects?pageNo=1

I have managed to set up a RestService to call the API and return the JSON and a SQL Sink that will take the JSON and pass it to a stored procedure that then stores the data.

However, what I am struggling with is how to handle the pagination.

I have tried:

  1. Pagination options on the RestService: I don't think this will work as it only allows for an XPATH that returns the full next URL. I can't see that it will allow the URL to be computed from the totalPages and pageNo. (or I couldn't get it to work)

  2. I tried to add a Web call to the API before the processing to then calculate the number of pages. While not ideal it did work, until I hit the 1mb/1min limit as some responses are quite big. This is not going to work.

  3. I've tried to see if the API could change, but that is not possible.

I was wondering if anyone has any ideas on how I could get this working, or has succesfully consumed a similar API?

The following explanation will walk through creating a pipeline that looks like the following. Notice it uses Stored Procedure activities, Web Activities, and For Each activities. 管道截图

每个活动截图

First provision an Azure SQL DB, setup the AAD administrator, then grant the ADF MSI permissions in the database as described here . Then create the following table and two stored procedures:

CREATE TABLE [dbo].[People](
    [id] [int] NULL,
    [email] [varchar](255) NULL,
    [first_name] [varchar](100) NULL,
    [last_name] [varchar](100) NULL,
    [avatar] [nvarchar](1000) NULL
)

GO
/*
sample call:
exec uspInsertPeople @json = '{"page":1,"per_page":3,"total":12,"total_pages":4,"data":[{"id":1,"email":"george.bluth@reqres.in","first_name":"George","last_name":"Bluth","avatar":"https://s3.amazonaws.com/uifaces/faces/twitter/calebogden/128.jpg"},{"id":2,"email":"janet.weaver@reqres.in","first_name":"Janet","last_name":"Weaver","avatar":"https://s3.amazonaws.com/uifaces/faces/twitter/josephstein/128.jpg"},{"id":3,"email":"emma.wong@reqres.in","first_name":"Emma","last_name":"Wong","avatar":"https://s3.amazonaws.com/uifaces/faces/twitter/olegpogodaev/128.jpg"}]}'
*/
create proc uspInsertPeople @json nvarchar(max)
as
begin
insert into People (id, email, first_name, last_name, avatar)
select d.*
from OPENJSON(@json)
WITH (
        [data] nvarchar(max) '$.data' as JSON
)
CROSS APPLY OPENJSON([data], '$')
    WITH (
        id int '$.id',
        email varchar(255) '$.email',
        first_name varchar(100) '$.first_name',
        last_name varchar(100) '$.last_name',
        avatar nvarchar(1000) '$.avatar'
    ) d;
end

GO

create proc uspTruncatePeople
as
truncate table People


Next, in Azure Data Factory v2 create a new pipeline, rename it to ForEachPage then go to the Code view and paste in the following JSON:

{
    "name": "ForEachPage",
    "properties": {
        "activities": [
            {
                "name": "GetTotalPages",
                "type": "WebActivity",
                "dependsOn": [
                    {
                        "activity": "Truncate SQL Table",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "url": {
                        "value": "https://reqres.in/api/users?page=1",
                        "type": "Expression"
                    },
                    "method": "GET"
                }
            },
            {
                "name": "ForEachPage",
                "type": "ForEach",
                "dependsOn": [
                    {
                        "activity": "GetTotalPages",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "userProperties": [],
                "typeProperties": {
                    "items": {
                        "value": "@range(1,activity('GetTotalPages').output.total_pages)",
                        "type": "Expression"
                    },
                    "activities": [
                        {
                            "name": "GetPage",
                            "type": "WebActivity",
                            "dependsOn": [],
                            "policy": {
                                "timeout": "7.00:00:00",
                                "retry": 0,
                                "retryIntervalInSeconds": 30,
                                "secureOutput": false,
                                "secureInput": false
                            },
                            "userProperties": [],
                            "typeProperties": {
                                "url": {
                                    "value": "@concat('https://reqres.in/api/users?page=',item())",
                                    "type": "Expression"
                                },
                                "method": "GET"
                            }
                        },
                        {
                            "name": "uspInsertPeople stored procedure",
                            "type": "SqlServerStoredProcedure",
                            "dependsOn": [
                                {
                                    "activity": "GetPage",
                                    "dependencyConditions": [
                                        "Succeeded"
                                    ]
                                }
                            ],
                            "policy": {
                                "timeout": "7.00:00:00",
                                "retry": 0,
                                "retryIntervalInSeconds": 30,
                                "secureOutput": false,
                                "secureInput": false
                            },
                            "userProperties": [],
                            "typeProperties": {
                                "storedProcedureName": "[dbo].[uspInsertPeople]",
                                "storedProcedureParameters": {
                                    "json": {
                                        "value": {
                                            "value": "@string(activity('GetPage').output)",
                                            "type": "Expression"
                                        },
                                        "type": "String"
                                    }
                                }
                            },
                            "linkedServiceName": {
                                "referenceName": "lsAzureDB",
                                "type": "LinkedServiceReference"
                            }
                        }
                    ]
                }
            },
            {
                "name": "Truncate SQL Table",
                "type": "SqlServerStoredProcedure",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "storedProcedureName": "[dbo].[uspTruncatePeople]"
                },
                "linkedServiceName": {
                    "referenceName": "lsAzureDB",
                    "type": "LinkedServiceReference"
                }
            }
        ],
        "annotations": []
    }
}

Create a lsAzureDB linked service to Azure SQL DB setting it to use the MSI for authentication.

This pipeline calls a sample paged API (which works at the moment but it not an API I manage so may stop working at some point) to demonstrate how to loop and how to take the results of the Web Activities and insert them to a SQL table via a stored procedure call and JSON parsing in the stored procedure. The loop will run with parallelism but certainly you could change settings on the ForEachPage activity to make it run in serial.

This approach doesn't work for several reasons however the main issue is that Pipeline "Copy Data" Activity is unable to index into the deeply nested arrays.

I can wild card the first level of array but anything deeper requires and actual integer index value. As long as there is only one item in the array it's great after that however we would be missing data.

 {
   "source": {
      "path": "$['myObject']['element'][*]['externalUID'][0]['provider']"
    },
    sink": {
       name": "EXTERNALUID_PROVIDER"
    }
},

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM