
Azure Data Factory Get Metadata to get blob filenames and transfer them to Azure SQL database table

I am trying to use the Get Metadata activity in Azure Data Factory to get blob filenames and copy them to an Azure SQL database table. I am following this tutorial: https://www.mssqltips.com/sqlservertip/6246/azure-data-factory-get-metadata-example/

Here is my pipeline. Under Copy Data > Source, the source is the location of the blob files in my Blob storage. I need to specify the source dataset as binary because the files are *.jpeg files.

[Screenshot: Get Metadata 0]

For Copy Data > Sink, I use the Azure SQL database, and I enabled the option "Auto Create table".

[Screenshot: Get Metadata 1]

In my sink dataset configuration, I had to choose a table because validation won't pass unless I select one in my SQL database, even though this table is not related at all to the blob filenames that I want to get.

[Screenshot: Get Metadata 2]

Question 1: Am I supposed to create a new table in the SQL DB beforehand, with columns matching the blob filenames that I want to extract?

Then I tried to validate the pipeline and got this error:

Copy_Data_1
Sink must be binary when source is binary dataset.

[Screenshot: Get Metadata 3]

Question 2: How can I resolve this error? I had to select binary as the file type of the source because that is one of the steps when creating the source dataset. However, when I chose the sink dataset, an Azure SQL table, I didn't have to select a dataset type, so the two don't seem to match.

Thank you very much in advance.

Here is a new screenshot of the new pipeline. I can now get the itemName of the filenames in the JSON output files.

[Screenshot: Get File Name 1]

Now I have added a Copy Data activity just after the Get_File_Name2 activity and connected them, to try to use the JSON output files as the source dataset.

[Screenshot: Get File Name 2]

However, I need to choose the source dataset location first, before specifying the type as JSON. But as far as I understand, these JSON output files are the output of the Get_File_Name2 activity and are not yet stored in Blob storage. How do I make the Copy Data activity read these JSON output files as the source dataset?

[Screenshot: Get File Name 3]

Update 10/14/2020: Here is my new Stored Procedure activity. I added the parameter as suggested; however, I changed its name to JsonData because my stored procedure requires this parameter.

[Screenshot: Stored Procedure 1]

This is my stored procedure.

[Screenshot: Stored Procedure 2]

I get this error at the stored procedure:

{
    "errorCode": "2402",
    "message": "Execution fail against sql server. Sql error number: 13609. Error Message: JSON text is not properly formatted. Unexpected character 'S' is found at position 0.",
    "failureType": "UserError",
    "target": "Stored procedure1",
    "details": []
}

[Screenshot: Stored Procedure 3]

But when I check the input, it seems the activity has already successfully read the JSON string itemName.

[Screenshot: Stored Procedure 4]

But when I check the output, it's not there.

[Screenshot: Stored Procedure 5]

Actually, you can pass the Get Metadata output JSON as a parameter and call the stored procedure directly: Get Metadata --> Stored Procedure.

You just need to focus on the code of the stored procedure.

Get Metadata output childItems:

{
   "childItems": [
        {
            "name": "DeploymentFiles.zip",
            "type": "File"
        },
        {
            "name": "geodatalake.pdf",
            "type": "File"
        },
        {
            "name": "test2.xlsx",
            "type": "File"
        },
        {
            "name": "word.csv",
            "type": "File"
        }
   ]
}
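
To make the stored procedure's job concrete, here is a minimal Python sketch of the logic it would need: take the childItems array as JSON text and pull out the file names. This only illustrates the parsing step and is not T-SQL; the helper name `extract_file_names` is hypothetical, and the sample payload is copied from the output above.

```python
import json

# Sample payload shaped like the Get Metadata output shown above.
payload = """
{
  "childItems": [
    {"name": "DeploymentFiles.zip", "type": "File"},
    {"name": "geodatalake.pdf", "type": "File"},
    {"name": "test2.xlsx", "type": "File"},
    {"name": "word.csv", "type": "File"}
  ]
}
"""

def extract_file_names(child_items_json):
    """Hypothetical helper: return the names of all entries of type 'File'.

    Expects the childItems array serialized as JSON text, which is what
    the stored procedure parameter is assumed to receive.
    """
    items = json.loads(child_items_json)
    return [item["name"] for item in items if item.get("type") == "File"]

# Serialize only the array, mirroring the output.childItems expression.
child_items_json = json.dumps(json.loads(payload)["childItems"])
file_names = extract_file_names(child_items_json)
print(file_names)
```

Each name in the resulting list corresponds to one row the stored procedure would insert into the target table.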

Stored procedure parameter value:

@activity('Get Metadata1').output.childItems
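
The error in the question ("Unexpected character 'S' is found at position 0") means the value that reached the procedure did not begin like JSON text, which would start with `[` or `{`. The Python sketch below shows that kind of check under stated assumptions: `is_json_text` is a hypothetical helper, and the failing sample string is made up, since the question does not show the actual value. If the array is not being passed as text, wrapping the expression in the ADF `string()` function is one thing to try, though the source does not confirm that it resolves this case.

```python
import json

def is_json_text(value):
    """Hypothetical helper: True if the string parses as JSON, else False."""
    try:
        json.loads(value)
        return True
    except json.JSONDecodeError:
        return False

# A value shaped like the serialized childItems array.
good = '[{"name": "word.csv", "type": "File"}]'

# Made-up non-JSON value starting with 'S', standing in for whatever
# actually reached the stored procedure in the question.
bad = "Some non-JSON text"

print(is_json_text(good))
print(is_json_text(bad))
```

Checking the activity's input in the monitoring view against this rule (does the parameter value start with `[`?) is a quick way to tell whether the problem is in the pipeline expression or in the procedure itself.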


For how to write the stored procedure (reading data from a JSON object), you can refer to this blog post: Retrieve JSON Data from SQL Server using a Stored Procedure.

