
Data loading is very slow from on-prem Data Lake to Azure Data Lake Storage through Azure Data Factory

I want to load data from on-prem (Data Lake) storage into Azure Data Lake Storage Gen2.

For this, I set up an on-prem Windows server, installed a self-hosted integration runtime on it, and connected Azure Data Factory to the on-prem Data Lake (Hive).

In Azure Data Factory I created a pipeline with a copy activity, set the on-prem Data Lake (Hive) as its source, and supplied a SQL query to pull the data. I will need to add more copy activities in the same way for the other tables.

So far I have tested the pipeline with only a single copy activity.

Here is my problem: the pipeline takes a very long time to load data into the Data Lake.

The Windows server that hosts the integration runtime has 10 Gbps of bandwidth, but the load is still very slow.

I tried pulling just 20,000 records, and it took around 20 minutes to load them. The throughput I was getting was around 15 kbps, which is very low.
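As a rough sanity check on these numbers, the arithmetic can be sketched in a few lines of Python. The only inputs are the figures quoted above; "kbps" is ambiguous (kilobits or kilobytes per second), so both readings are computed, and the variable names are purely illustrative.

```python
# Figures reported in the question.
records = 20_000
elapsed_s = 20 * 60      # 20 minutes, in seconds
reported_rate = 15       # "15 kbps" as reported; the unit is ambiguous

# Row rate achieved by the copy activity.
rows_per_second = records / elapsed_s

# Total volume implied by the reported rate under each reading of "kbps".
volume_mb_if_kilobytes = reported_rate * 1_000 * elapsed_s / 1_000_000
volume_mb_if_kilobits = reported_rate * 1_000 / 8 * elapsed_s / 1_000_000

# Fraction of the 10 Gbps link in use, assuming the kilobit reading.
link_bps = 10 * 1_000_000_000
link_utilisation = reported_rate * 1_000 / link_bps

print(f"{rows_per_second:.1f} rows/s")
print(f"{volume_mb_if_kilobytes:.2f} MB if the rate is in kilobytes/s")
print(f"{volume_mb_if_kilobits:.2f} MB if the rate is in kilobits/s")
print(f"{link_utilisation:.7%} of the 10 Gbps link")
```

Under either reading, only a few megabytes moved in 20 minutes, which is a tiny fraction of what the link can carry. That suggests the network bandwidth of the server is probably not the limiting factor here.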

How can I improve the performance of the copy activity so that it runs faster?

Can you check the configuration of the integration runtime? How much RAM does the machine have, and how many nodes have you configured?

Also, are you using ExpressRoute or a site-to-site VPN? ExpressRoute is the faster option.

The recommended minimum configuration for the self-hosted integration runtime machine is a 2-GHz processor with 4 cores, 8 GB of RAM, and 80 GB of available hard drive space.
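To compare a machine against those minimums, something like the following Python sketch could be run on the integration runtime server. The thresholds come from the recommendation above; the function name `check_minimums` is hypothetical, clock speed is not checked, and because the standard library has no portable way to read installed RAM, that value is passed in by hand here.

```python
import os
import shutil

# Recommended minimums quoted above (clock speed is not checked here).
MIN_CORES = 4
MIN_RAM_GB = 8
MIN_FREE_DISK_GB = 80


def check_minimums(cores, ram_gb, free_disk_gb):
    """Return a list of shortfalls; an empty list means all minimums are met."""
    problems = []
    if cores < MIN_CORES:
        problems.append(f"cores: {cores} < {MIN_CORES}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb} GB < {MIN_RAM_GB} GB")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(f"free disk: {free_disk_gb:.0f} GB < {MIN_FREE_DISK_GB} GB")
    return problems


if __name__ == "__main__":
    # Logical core count as seen by the operating system.
    cores = os.cpu_count() or 0
    # Free space on the drive that holds the current working directory.
    root = os.path.abspath(os.sep)
    free_disk_gb = shutil.disk_usage(root).free / 1e9
    # Assumption: installed RAM is entered manually (for example from Task Manager).
    ram_gb = 8

    shortfalls = check_minimums(cores, ram_gb, free_disk_gb)
    print("OK" if not shortfalls else "\n".join(shortfalls))
```

Meeting these minimums does not by itself guarantee good copy throughput; it only rules out an undersized machine as the cause.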

