
How to validate hundreds of columns in an Azure Data Factory data flow

I have a data flow in Azure Data Factory that needs to validate 200+ columns. My source file is Excel, and I am using an Assert transformation to validate the columns. When I give Assert 5 columns to validate, I am able to preview data in debug mode, but when I use all 200+ column validations, it just shows 'Fetching Data' and then times out. Can someone please help with how to achieve this quickly? I have published the flow and tried to execute it on a 16-core integration runtime, but that did not help either; I terminated the flow after waiting for 30 minutes. My data set is quite small, around 20 rows.

I found an official MS doc documenting a similar scenario here; see if this helps.

In the Excel source dataset, use a range (for example, A1:G100) together with firstRowAsHeader=false; it can then load data from all Excel files even when the column names and counts differ.
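As a rough illustration (not from the original post; the linked service, container, file, and sheet names below are placeholders), a dataset that sets these two properties looks roughly like this in ADF's JSON authoring view:

```json
{
    "name": "ExcelRangeDataset",
    "properties": {
        "type": "Excel",
        "linkedServiceName": {
            "referenceName": "MyBlobStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "input",
                "fileName": "report.xlsx"
            },
            "sheetName": "Sheet1",
            "range": "A1:G100",
            "firstRowAsHeader": false
        }
    }
}
```

Note that with firstRowAsHeader set to false, the first spreadsheet row is treated as data and the columns arrive with service-generated names, so downstream transformations should rename columns or reference them by position rather than rely on the original headers.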

Timeout or slow performance when parsing large Excel file

  • Symptoms:

    • When you create an Excel dataset and import schema from connection/store, preview data, or list or refresh worksheets, you may hit a timeout error if the Excel file is large.

    • When you use the copy activity to copy data from a large Excel file (>= 100 MB) into another data store, you may experience slow performance or an out-of-memory (OOM) issue.

  • Cause:

    • For operations such as importing schema, previewing data, and listing worksheets on an Excel dataset, the timeout is a static 100 seconds. For a large Excel file, these operations may not finish within that timeout.

    • The copy activity reads the whole Excel file into memory and then locates the specified worksheet and cells to read data. This behavior is due to the underlying SDK the service uses.

  • Resolution:

    • For importing schema, you can generate a smaller sample file that is a subset of the original file, and choose "import schema from sample file" instead of "import schema from connection/store" (see the sketch after this list).

    • For listing worksheets, you can click "Edit" in the worksheet dropdown and enter the sheet name/index instead.

    • To copy a large Excel file (>= 100 MB) into another store, you can use the Data Flow Excel source, which supports streaming reads and performs better.
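As a minimal sketch of the sample-file approach mentioned in the resolution above, assuming pandas with an Excel engine such as openpyxl is installed (the file and sheet names are placeholders):

```python
import pandas as pd

# Parse only the first 50 rows of the large workbook; nrows limits
# how many rows pandas materializes from the sheet.
sample = pd.read_excel("original_large.xlsx", sheet_name="Sheet1", nrows=50)

# Write the subset to a much smaller workbook. In the Excel dataset
# editor, choose "import schema from sample file" and point it here.
sample.to_excel("sample_for_schema.xlsx", sheet_name="Sheet1", index=False)
```

The sample only needs the header row plus a few representative rows, since it is used purely for schema inference.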
