How to find the latest file in a folder using Azure Data Factory v2 (ADF)
I am trying to read the latest blob file (CSV) using Azure Data Factory v2. The file name also contains the date (YYYY-MM-DD mm:ss-abcd.csv). I need to read the data from the latest file present and load it into table storage. Could you please help me with how to read the latest file using ADF?
Hello Faiz Rahman, and thank you for your question. The date format you chose has a useful property: lexicographic order matches chronological order. This means that once you have a list of blobs, all you need to do is extract the date from each name and compare the dates as strings.
If you have a very large list of blobs, this might not be practical. In that case, whenever you write a new blob, keep track of its name somewhere, say in "maxBlobName.txt", and have the pipeline read that file to get the name of the most recent blob.
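As a rough sketch of the tracking-file idea, a Lookup activity could read the single line stored in "maxBlobName.txt". The dataset name MaxBlobNameFile is hypothetical, and I am assuming it is a delimited-text dataset with no header row that points at that file; adjust the source type if your dataset differs.

```json
{
    "name": "Lookup latest blob name",
    "description": "Sketch only. 'MaxBlobNameFile' is a hypothetical headerless delimited-text dataset pointing at maxBlobName.txt.",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "DelimitedTextSource"
        },
        "dataset": {
            "referenceName": "MaxBlobNameFile",
            "type": "DatasetReference"
        },
        "firstRowOnly": true
    }
}
```

Under those assumptions, a later activity should be able to pick up the name with an expression such as @activity('Lookup latest blob name').output.firstRow.Prop_0, since columns of a headerless file are normally exposed as Prop_0, Prop_1, and so on. Whatever process writes the blobs is responsible for keeping the file up to date.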
Here is some example code for comparing the date portion of your blob names. To adapt it for your purposes, you will need to use a GetMetadata activity to fetch the blob names, and some string functions to extract only the date portion of each name for comparison.
{
    "name": "pipeline9",
    "properties": {
        "activities": [
            {
                "name": "ForEach1",
                "type": "ForEach",
                "dependsOn": [
                    {
                        "activity": "init array",
                        "dependencyConditions": [
                            "Succeeded"
                        ]
                    }
                ],
                "typeProperties": {
                    "items": {
                        "value": "@variables('list')",
                        "type": "Expression"
                    },
                    "isSequential": true,
                    "activities": [
                        {
                            "name": "If Condition1",
                            "type": "IfCondition",
                            "typeProperties": {
                                "expression": {
                                    "value": "@greater(item(),variables('max'))",
                                    "type": "Expression"
                                },
                                "ifTrueActivities": [
                                    {
                                        "name": "write new max",
                                        "type": "SetVariable",
                                        "typeProperties": {
                                            "variableName": "max",
                                            "value": {
                                                "value": "@item()",
                                                "type": "Expression"
                                            }
                                        }
                                    }
                                ]
                            }
                        }
                    ]
                }
            },
            {
                "name": "init array",
                "type": "SetVariable",
                "typeProperties": {
                    "variableName": "list",
                    "value": {
                        "value": "@split(pipeline().parameters.input,',')",
                        "type": "Expression"
                    }
                }
            }
        ],
        "parameters": {
            "input": {
                "type": "string",
                "defaultValue": "2019-07-25,2018-06-13,2019-06-24,2019-08-08,2019-06-23"
            }
        },
        "variables": {
            "max": {
                "type": "String",
                "defaultValue": "0001-01-01"
            },
            "list": {
                "type": "Array"
            }
        }
    }
}
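To adapt the comparison to real blob names, the activities could look roughly like the fragment below, which would go inside the pipeline's "activities" array. This is only a sketch under several assumptions: SourceBlobFolder is a hypothetical dataset pointing at the folder, the pipeline declares two String variables named max and latestFile, and the sortable timestamp is the first 16 characters of each name (the "YYYY-MM-DD mm:ss" part of the format in the question).

```json
[
    {
        "name": "Get blob list",
        "description": "Sketch only. 'SourceBlobFolder' is a hypothetical dataset pointing at the blob folder.",
        "type": "GetMetadata",
        "typeProperties": {
            "dataset": {
                "referenceName": "SourceBlobFolder",
                "type": "DatasetReference"
            },
            "fieldList": [
                "childItems"
            ]
        }
    },
    {
        "name": "ForEach blob",
        "description": "Assumes String variables 'max' and 'latestFile' exist and every name has at least 16 characters.",
        "type": "ForEach",
        "dependsOn": [
            {
                "activity": "Get blob list",
                "dependencyConditions": [
                    "Succeeded"
                ]
            }
        ],
        "typeProperties": {
            "items": {
                "value": "@activity('Get blob list').output.childItems",
                "type": "Expression"
            },
            "isSequential": true,
            "activities": [
                {
                    "name": "If newer",
                    "type": "IfCondition",
                    "typeProperties": {
                        "expression": {
                            "value": "@greater(substring(item().name, 0, 16), variables('max'))",
                            "type": "Expression"
                        },
                        "ifTrueActivities": [
                            {
                                "name": "write new max",
                                "type": "SetVariable",
                                "typeProperties": {
                                    "variableName": "max",
                                    "value": {
                                        "value": "@substring(item().name, 0, 16)",
                                        "type": "Expression"
                                    }
                                }
                            },
                            {
                                "name": "remember file name",
                                "type": "SetVariable",
                                "typeProperties": {
                                    "variableName": "latestFile",
                                    "value": {
                                        "value": "@item().name",
                                        "type": "Expression"
                                    }
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }
]
```

After the loop finishes, variables('latestFile') should hold the name of the newest blob, which a Copy activity can then use as its source file name. The loop is kept sequential because the iterations share the max variable. Note that childItems can also list subfolders, and substring fails on names shorter than 16 characters, so you may want to filter the list first.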