Copying file contents from Azure Storage to Azure SQL Db using Azure Data Factory

First time poster, long time reader.

A third-party provider uploads CSV files once a day to a shared Azure Blob Storage account. The files share a prefix, carry a timestamp in the filename, and sit in the same directory, for instance "dw_palkkatekijat_20170320T021". Every file contains all the data the previous one had, plus the data added during the previous day. I would like to import all the rows from all the files into a SQL table in an Azure SQL DB. This I can do.

The problem is that I don't know how to add the filename as a separate column in the table, so that I can tell which file each row came from and use only the newest rows. I need to import the contents of every file and keep all "versions" of the files. Is there a way to send the filename as a parameter to a SQL stored procedure? Or is there an alternative way to handle this?

Thank you for your help.

In the situation you've described you won't be able to get the exact file name. ADF isn't a data transformation service, so it doesn't give you this level of functionality... I wish it did!

However, there are a couple of options for getting the file name, or something similar, to use. None of them is perfect, I accept!

Option 1 (Best option, I think!)

As you asked: pass a parameter to the SQL DB stored procedure. This is certainly possible using the stored procedure parameters attribute of the ADF activity.

What to pass as a param?...

Well, your source files in blob storage have a nicely defined date and time in the file name, which is presumably what you already use in the input dataset definition. Pass that to the proc and store it in the SQL DB table. You can then work out which period each file was for and how the files overlap. Maybe?

You can access the time slice start for the dataset in the activity. Example JSON...

    "activities": [
        {
            "name": "StoredProcedureActivityTemplate",
            "type": "SqlServerStoredProcedure",
            "inputs": [
                {
                    "name": "BlobFile"
                }
            ],
            "outputs": [
                {
                    "name": "RelationalTable"
                }
            ],
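            "typeProperties": {
                "storedProcedureName": "[dbo].[usp_LoadMyBlobs]",
                "storedProcedureParameters": {
                    // the key must match the stored procedure's parameter name, including casing
                    // Time.AddMinutes(SliceStart, 0) is a no-op offset; tweak it and the date format as needed
                    "ExactParamName": "$$Text.Format('{0:yyyyMMdd}', Time.AddMinutes(SliceStart, 0))"
                }
            }, //etc ....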
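
To make the parameter value concrete, here is a minimal Python sketch (not ADF code) of what that expression is meant to produce, and how the stored value could later be used to pick the newest rows. The helper names, the FileDate column name and the assumption that an 8-digit date directly follows the "dw_palkkatekijat_" prefix are mine, based only on the single example file name in the question.

```python
from datetime import datetime

PREFIX = "dw_palkkatekijat_"  # prefix taken from the example file name in the question


def slice_token(slice_start):
    """Roughly what Text.Format('{0:yyyyMMdd}', SliceStart) yields: the date, no separators."""
    return slice_start.strftime("%Y%m%d")


def file_date_token(blob_name):
    """Return the 8-digit date that follows the prefix (assumed file name layout)."""
    if not blob_name.startswith(PREFIX):
        raise ValueError("unexpected blob name: %r" % blob_name)
    token = blob_name[len(PREFIX):len(PREFIX) + 8]
    datetime.strptime(token, "%Y%m%d")  # raises ValueError if this is not a valid date
    return token


def latest_version(rows, column="FileDate"):
    """Keep only the rows tagged with the newest date token.

    yyyyMMdd strings sort in date order, so max() on the raw strings is enough.
    """
    if not rows:
        return []
    newest = max(row[column] for row in rows)
    return [row for row in rows if row[column] == newest]
```

If the slice date and the date in the file name line up, the value passed to the proc identifies the file well enough to separate the "versions", even though it is not the literal file name.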

Option 2 (Loads of effort)

Create yourself a middleman ADF custom activity that reads the file and its file name, and adds the file name to every row as an extra column.

Custom activities in ADF basically give you the extensibility to do anything, because you craft the data transformation behaviour yourself in C#.

I would recommend learning what's involved in using custom activities if you want to go down this route. Lots more effort and an Azure Batch Service will be required.
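
The transformation itself is small. As a rough illustration only, here it is in Python; a real ADF custom activity would be written in C# and run on Azure Batch, and the function name, the SourceFile column name and the assumption that the first non-blank line is a header row are all hypothetical.

```python
import csv
import io


def add_source_column(csv_text, file_name, column="SourceFile"):
    """Return csv_text with one extra trailing column holding file_name on every data row.

    Assumes the first non-blank line is a header row; it gets the column name instead.
    """
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    header_done = False
    for row in reader:
        if not row:
            continue  # skip blank lines
        if not header_done:
            writer.writerow(row + [column])
            header_done = True
        else:
            writer.writerow(row + [file_name])
    return out.getvalue()
```

The output can then be loaded by the normal copy step, with the new column mapped to a column in the SQL table.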

Option 3 (Total overkill)

Use an Azure Data Lake Analytics service! This takes the same approach as option 2: use U-SQL in Data Lake to parse the file and include the file name in the output dataset. In U-SQL you can pass a wildcard for the file name as part of the extractor and use it within the output dataset.

I brand this option as overkill because bolting on a complete Data Lake service just to read a file name is excessive. In reality, Data Lake could probably replace your SQL DB layer and give you the file name transformation for free.

By the way, you won't need to use Azure Data Lake storage to store your source files. You could give the analytics service access to the existing shared blob storage account. You would still need Data Lake storage, but only to support the analytics service.

Option 4

Have a rethink and use Azure Data Lake instead of Azure SQL DB?????

Hope this helps
