
How to find the latest file in a folder using Azure Data Factory V2 (ADF)

I am trying to read the latest blob file (CSV) using Azure Data Factory V2. The file name also contains the date (YYYY-MM-DD mm:ss-abcd.csv). I need to read the data from the latest file present and load it into Table storage. Could you please help me with how to read the latest file using ADF?

Hello Faiz Rahman, and thank you for your question. The date format you chose has a useful property: lexicographic order matches chronological order. This means that once you have a list of blobs, all you need to do is extract the date from each name and compare.

If you have a very large list of blobs, this might not be practical. In that case, whenever you write a new blob, keep track of its name somewhere, say in "maxBlobName.txt", and have the pipeline read that file to get the name of the most recent blob.
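As a rough sketch of the tracking-file idea, a Lookup activity could read the single line stored in "maxBlobName.txt". The activity name and the dataset name `MaxBlobNameFile` are hypothetical, and I am assuming the dataset is a headerless delimited-text dataset that points at that file.

```json
{
    "name": "Read max blob name",
    "description": "Assumption: MaxBlobNameFile is a hypothetical headerless DelimitedText dataset pointing at maxBlobName.txt",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "DelimitedTextSource"
        },
        "dataset": {
            "referenceName": "MaxBlobNameFile",
            "type": "DatasetReference"
        },
        "firstRowOnly": true
    }
}
```

A later activity could then pick up the name with an expression along the lines of `@activity('Read max blob name').output.firstRow.Prop_0`. `Prop_0` is the column name I would expect for a headerless file, so please check it against the actual Lookup output before relying on it.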

Here is some example code for comparing the date portion of your blob names. To adapt it for your purposes, you will need to use a GetMetadata activity to fetch the blob names, and some string functions to extract only the date portion of each name for comparison.

{
"name": "pipeline9",
"properties": {
    "activities": [
        {
            "name": "ForEach1",
            "type": "ForEach",
            "dependsOn": [
                {
                    "activity": "init array",
                    "dependencyConditions": [
                        "Succeeded"
                    ]
                }
            ],
            "typeProperties": {
                "items": {
                    "value": "@variables('list')",
                    "type": "Expression"
                },
                "isSequential": true,
                "activities": [
                    {
                        "name": "If Condition1",
                        "type": "IfCondition",
                        "typeProperties": {
                            "expression": {
                                "value": "@greater(item(),variables('max'))",
                                "type": "Expression"
                            },
                            "ifTrueActivities": [
                                {
                                    "name": "write new max",
                                    "type": "SetVariable",
                                    "typeProperties": {
                                        "variableName": "max",
                                        "value": {
                                            "value": "@item()",
                                            "type": "Expression"
                                        }
                                    }
                                }
                            ]
                        }
                    }
                ]
            }
        },
        {
            "name": "init array",
            "type": "SetVariable",
            "typeProperties": {
                "variableName": "list",
                "value": {
                    "value": "@split(pipeline().parameters.input,',')",
                    "type": "Expression"
                }
            }
        }
    ],
    "parameters": {
        "input": {
            "type": "string",
            "defaultValue": "2019-07-25,2018-06-13,2019-06-24,2019-08-08,2019-06-23"
        }
    },
    "variables": {
        "max": {
            "type": "String",
            "defaultValue": "0001-01-01"
        },
        "list": {
            "type": "Array"
        }
    }
}

}
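To adapt the pipeline above to real blob names, the "init array" step could be swapped for a GetMetadata activity that lists the folder, roughly as sketched below. The dataset name `SourceBlobFolder` and the activity names are hypothetical. The sketch also assumes the timestamp is the leading part of every file name, so it compares whole names rather than extracting the date with a string function such as `substring`. It reuses the `max` variable from the pipeline above, which ends up holding the full name of the newest file.

```json
[
    {
        "name": "Get blob list",
        "description": "Assumption: SourceBlobFolder is a hypothetical dataset pointing at the blob folder, not at a single file",
        "type": "GetMetadata",
        "typeProperties": {
            "dataset": {
                "referenceName": "SourceBlobFolder",
                "type": "DatasetReference"
            },
            "fieldList": ["childItems"]
        }
    },
    {
        "name": "ForEach blob",
        "description": "Runs sequentially so updates to the max variable do not race",
        "type": "ForEach",
        "dependsOn": [
            {
                "activity": "Get blob list",
                "dependencyConditions": ["Succeeded"]
            }
        ],
        "typeProperties": {
            "items": {
                "value": "@activity('Get blob list').output.childItems",
                "type": "Expression"
            },
            "isSequential": true,
            "activities": [
                {
                    "name": "If newer",
                    "description": "Assumption: names start with the timestamp, so comparing full names orders them by date",
                    "type": "IfCondition",
                    "typeProperties": {
                        "expression": {
                            "value": "@greater(item().name, variables('max'))",
                            "type": "Expression"
                        },
                        "ifTrueActivities": [
                            {
                                "name": "write new max",
                                "type": "SetVariable",
                                "typeProperties": {
                                    "variableName": "max",
                                    "value": {
                                        "value": "@item().name",
                                        "type": "Expression"
                                    }
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }
]
```

Each entry in `childItems` carries a `name` and a `type`, so if the folder also contains subfolders you may want to filter on `item().type` first.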

