
Data loading very slow from on-prem Data Lake to Azure Data Lake Storage through Azure Data Factory

I want to load data from on-prem (Data Lake) storage into Azure Data Lake Storage Gen2.

For this, I created an on-prem Windows server and installed a self-hosted integration runtime on it. I then connected to the on-prem Data Lake (Hive) from Azure Data Factory.

In Azure Data Factory I created a pipeline with a copy activity, set my on-prem Data Lake (Hive) as the source, and supplied a SQL query to pull the data. I will need to add more copy activities like this one for the other tables.
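As a rough sketch of the "one copy activity per table" setup described above, the snippet below generates the activity definitions from a list of tables instead of writing each one by hand. The dataset names, the table list, and the Parquet sink are assumptions made for illustration; none of them come from the original pipeline. Only the general shape of a Copy activity with a `HiveSource` query is assumed to match Data Factory's pipeline JSON.

```python
import json

# Hypothetical names: these dataset references and tables are placeholders,
# not values taken from the original pipeline.
SOURCE_DATASET = "OnPremHiveDataset"
SINK_DATASET = "AdlsGen2ParquetDataset"
TABLES = ["sales", "customers", "orders"]


def copy_activity(table: str) -> dict:
    """Build one Copy activity definition that pulls a single Hive table."""
    return {
        "name": f"Copy_{table}",
        "type": "Copy",
        "inputs": [{"referenceName": SOURCE_DATASET, "type": "DatasetReference"}],
        "outputs": [{"referenceName": SINK_DATASET, "type": "DatasetReference"}],
        "typeProperties": {
            # The query is the per-table part; everything else is shared.
            "source": {"type": "HiveSource", "query": f"SELECT * FROM {table}"},
            # Assumed sink format; the question does not say which one is used.
            "sink": {"type": "ParquetSink"},
        },
    }


activities = [copy_activity(table) for table in TABLES]
print(json.dumps({"activities": activities}, indent=2))
```

Since none of the generated activities declares a dependency on another, they would be eligible to run side by side rather than one after the other, which matters once more than one table is being loaded.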

So far I have tried the pipeline with only a single copy activity.

Here is my problem: the pipeline takes a very long time to load data into the Data Lake.

The Windows server that hosts the integration runtime has a bandwidth of 10 Gbps, but the load is still very slow.

I tried pulling just 20,000 records, and it took around 20 minutes to load. The throughput I was getting was around 15kbps, which is very low.
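To put those numbers in perspective, here is a small back-of-the-envelope calculation. The unit "15kbps" is ambiguous, so the sketch works through both readings (kilobytes per second and kilobits per second); which one applies is an assumption either way.

```python
RECORDS = 20_000
DURATION_S = 20 * 60             # 20 minutes, as reported
LINK_BITS_PER_S = 10 * 10**9     # 10 Gbps server bandwidth, as reported


def summarize(throughput_bytes_per_s: float) -> dict:
    """Derive totals implied by a given throughput over the reported run."""
    total_bytes = throughput_bytes_per_s * DURATION_S
    return {
        "records_per_s": RECORDS / DURATION_S,
        "total_mb": total_bytes / 1e6,
        "bytes_per_record": total_bytes / RECORDS,
        "link_utilization": throughput_bytes_per_s * 8 / LINK_BITS_PER_S,
    }


as_kilobytes = summarize(15 * 1000)       # assumption: 15 kilobytes per second
as_kilobits = summarize(15 * 1000 / 8)    # assumption: 15 kilobits per second

for label, stats in (("15 KB/s", as_kilobytes), ("15 kbit/s", as_kilobits)):
    print(label, stats)
```

Under either reading the copy moves only a few megabytes in 20 minutes and uses a tiny fraction of the 10 Gbps link, so raw network bandwidth is unlikely to be the limiting factor here.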

How can I improve the performance of the copy activity so that it runs faster?

Can you check the configuration of the integration runtime? How much RAM and how many nodes have you configured?

Also, are you using ExpressRoute or a side-by-side VPN? ExpressRoute is the faster option.

The recommended minimum configuration for the self-hosted integration runtime machine is a 2-GHz processor with 4 cores, 8 GB of RAM, and 80 GB of available hard drive space.
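A quick way to compare a machine against those minimums is sketched below. It uses only the Python standard library, so core count and free disk space are read from the machine, while the RAM figure has to be filled in by hand (the `ram_gb` value here is a placeholder, not a measurement). Processor clock speed is not checked.

```python
import os
import shutil

# Thresholds taken from the recommendation quoted above.
MIN_CORES = 4
MIN_RAM_GB = 8
MIN_FREE_DISK_GB = 80


def meets_minimum(cores: int, ram_gb: float, free_disk_gb: float) -> list:
    """Return a list of shortfalls; an empty list means every minimum is met."""
    problems = []
    if cores < MIN_CORES:
        problems.append(f"cores: {cores} < {MIN_CORES}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb} GB < {MIN_RAM_GB} GB")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(f"free disk: {free_disk_gb:.0f} GB < {MIN_FREE_DISK_GB} GB")
    return problems


cores = os.cpu_count() or 0
# Free space on the drive that holds the current working directory.
free_disk_gb = shutil.disk_usage(os.path.abspath(os.sep)).free / 1024**3
ram_gb = 8  # placeholder: enter the machine's installed RAM manually

print(meets_minimum(cores, ram_gb, free_disk_gb) or "meets the recommended minimum")
```

Meeting the minimum only rules out an undersized machine; it does not by itself explain or fix a slow copy.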
