Handle >5000 rows in Lookup @ Azure Data Factory

I have a Copy Activity which copies a table from MySQL to Azure Table Storage. This works great. But when I do a Lookup on the Azure Table I get an error (too much data).

This is by design, according to the documentation: the Lookup activity has a maximum of 5,000 rows and a maximum size of 2 MB.

The documentation also mentions a workaround: design a two-level pipeline where the outer pipeline iterates over an inner pipeline, which retrieves data that doesn't exceed the maximum rows or size.

How can I do this? Is there a way to define an offset (e.g. only read 1,000 rows)?

azure-data-factory

Do you really need 5000 iterations of your foreach? What kind of processing are you doing in the foreach? Isn't there a more efficient way of doing that?

Otherwise, the following solution might be possible.

Create a new pipeline with two counter variables, iterations and count, both with 0 as the default.
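As a sketch only, the variables section of the outer pipeline's JSON could look like the snippet below. It assumes String variables (with int() conversions in the expressions that compare them later), and it already declares a helper variable countTemp that the increment step further down relies on; all names are placeholders.

```json
"variables": {
    "iterations": { "type": "String", "defaultValue": "0" },
    "count":      { "type": "String", "defaultValue": "0" },
    "countTemp":  { "type": "String", "defaultValue": "0" }
}
```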

First, determine the needed number of iterations. Do a Lookup to determine the total number of rows. In your query, divide this by 5,000, add one and round it upwards. Set the value of the iterations variable to this result using a Set Variable activity.
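A minimal sketch of those two activities in pipeline JSON, assuming a MySQL dataset called MySqlSourceDataset, a source table my_table and the MySqlSource source type; the activity, dataset and table names are placeholders:

```json
[
    {
        "name": "GetIterationCount",
        "type": "Lookup",
        "typeProperties": {
            "source": {
                "type": "MySqlSource",
                "query": "SELECT FLOOR(COUNT(*) / 5000) + 1 AS iterations FROM my_table"
            },
            "dataset": {
                "referenceName": "MySqlSourceDataset",
                "type": "DatasetReference"
            },
            "firstRowOnly": true
        }
    },
    {
        "name": "SetIterations",
        "type": "SetVariable",
        "dependsOn": [
            { "activity": "GetIterationCount", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
            "variableName": "iterations",
            "value": {
                "value": "@string(activity('GetIterationCount').output.firstRow.iterations)",
                "type": "Expression"
            }
        }
    }
]
```

FLOOR(COUNT(*) / 5000) + 1 is one way to express the divide-and-add-one step; when the row count is an exact multiple of 5,000 it produces one extra iteration whose page is simply empty, which is harmless. If your dataset uses a different MySQL connector (for example AzureMySqlSource), the source type changes accordingly.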

Next, add a loop that keeps running while @less(variables('count'), variables('iterations')) is true. In this loop, call your current pipeline with an Execute Pipeline activity and pass the count variable as a parameter. After the Execute Pipeline activity, increment the count variable by 1.
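The loop could be sketched as below. Two assumptions worth flagging: Data Factory's loop construct here is the Until activity, which exits when its expression becomes true, so the while-style condition above is negated; and because a Set Variable activity cannot reference the variable it is setting, the increment goes through the helper variable countTemp. The inner pipeline name CopyMySqlPage and its parameter pageIndex are placeholders.

```json
{
    "name": "LoopOverPages",
    "type": "Until",
    "typeProperties": {
        "expression": {
            "value": "@not(less(int(variables('count')), int(variables('iterations'))))",
            "type": "Expression"
        },
        "activities": [
            {
                "name": "RunInnerCopyPipeline",
                "type": "ExecutePipeline",
                "typeProperties": {
                    "pipeline": {
                        "referenceName": "CopyMySqlPage",
                        "type": "PipelineReference"
                    },
                    "parameters": {
                        "pageIndex": {
                            "value": "@variables('count')",
                            "type": "Expression"
                        }
                    },
                    "waitOnCompletion": true
                }
            },
            {
                "name": "SetCountTemp",
                "type": "SetVariable",
                "dependsOn": [
                    { "activity": "RunInnerCopyPipeline", "dependencyConditions": [ "Succeeded" ] }
                ],
                "typeProperties": {
                    "variableName": "countTemp",
                    "value": {
                        "value": "@string(add(int(variables('count')), 1))",
                        "type": "Expression"
                    }
                }
            },
            {
                "name": "SetCount",
                "type": "SetVariable",
                "dependsOn": [
                    { "activity": "SetCountTemp", "dependencyConditions": [ "Succeeded" ] }
                ],
                "typeProperties": {
                    "variableName": "count",
                    "value": {
                        "value": "@variables('countTemp')",
                        "type": "Expression"
                    }
                }
            }
        ]
    }
}
```

waitOnCompletion ensures one page finishes copying before the next one starts.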

In your current pipeline you can use the LIMIT/OFFSET clause, in combination with the passed parameter, in the MySQL query to get rows 0-5000 on your first iteration, 5000-10000 on your second iteration, and so on.
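A sketch of the Copy activity's source in the inner pipeline, assuming a String parameter pageIndex (the count value passed in from the outer pipeline), the placeholder table my_table and an id column to order by:

```json
"source": {
    "type": "MySqlSource",
    "query": "SELECT * FROM my_table ORDER BY id LIMIT 5000 OFFSET @{mul(int(pipeline().parameters.pageIndex), 5000)}"
}
```

A stable ORDER BY is what makes the pages disjoint and repeatable; with pageIndex starting at 0, the interpolated OFFSET becomes 0, 5000, 10000 and so on.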

If you really need to iterate over the table storage itself, the only solution I see is to build pagination over the result set yourself; you could use a Logic App for this purpose and call it using a webhook.
