Azure Data Factory - Oracle Source Terrible Performance

Working within Azure Data Factory, using the built-in Oracle Connector...

Given a very simple query, such as:

SELECT Col001, Col002, Col003 FROM APPS.WHATEVER_TABLE;

This type of query, with around 30 columns, can stream 1,000,000 rows to Toad on a tiny VM in less than 60 seconds. Against the exact same Oracle server, run from Azure Data Factory's Self-Hosted Integration Runtime, the same query takes over 8 minutes, with frequent pauses/hangs.

The CPU on the IR box runs at around 30% during this time, and free memory on the IR box stays at or above 5GB. Performance is the same regardless of the DTU level of the Azure SQL Database sink. Today I tried this at levels between 800 DTU and 3,000 DTU and got the exact same performance, with Log I/O on the Azure SQL Database staying at or under 10%.

The documentation for the ADF Oracle Connector does not help with this at all: it gives no guidance on how to tweak connection string parameters, or on whether you can even do so.

Thoughts?

Resolution:

We began to suspect that something was amiss with data types, because the problem disappeared if we cast all of our high-precision Oracle NUMBER columns to a lower precision, or to something like integer.

It got so bad that we opened a case with Microsoft about it, and our worst fears were confirmed.

The Azure Data Factory runtime decimal type has a maximum precision of 28. If a decimal/numeric value from the source has a higher precision, ADF will first cast it to a string. The performance of the string casting code is abysmal.
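As a rough illustration of the workaround (casting high-precision NUMBER columns down in the source query), here is a minimal Python sketch that builds such a SELECT. The helper name `select_with_casts`, the tuple layout, and the example precisions are all made up for illustration; only the limit of 28 comes from the explanation above. Note that a cast like this fails in Oracle if real values do not fit the reduced precision, so check your data first.

```python
# Hypothetical helper, not part of ADF or any Oracle driver.
ADF_MAX_PRECISION = 28  # runtime decimal limit quoted above


def select_with_casts(table, columns, max_precision=ADF_MAX_PRECISION):
    """Build a SELECT that casts over-precise NUMBER columns down.

    columns: list of (name, precision, scale); precision is None for
    columns that are not declared with an explicit numeric precision.
    """
    parts = []
    for name, precision, scale in columns:
        if precision is not None and precision > max_precision:
            # Keep the declared scale, but never above the new precision.
            target_scale = min(scale or 0, max_precision)
            parts.append(
                f"CAST({name} AS NUMBER({max_precision},{target_scale})) AS {name}"
            )
        else:
            # Column already fits; pass it through unchanged.
            parts.append(name)
    return f"SELECT {', '.join(parts)} FROM {table}"


if __name__ == "__main__":
    # Assumed column definitions, purely for demonstration.
    print(select_with_casts(
        "APPS.WHATEVER_TABLE",
        [("Col001", 38, 10), ("Col002", 10, 0), ("Col003", None, None)],
    ))
```

The generated text can be pasted into the query field of the copy activity source; whether reducing precision is acceptable depends on the actual values in each column.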

Check whether your source has any high-precision numeric data. If you have not explicitly defined a schema, also check whether you are accidentally using string.
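One way to run that check is against Oracle's `ALL_TAB_COLUMNS` dictionary view, which exposes `DATA_TYPE`, `DATA_PRECISION` and `DATA_SCALE` per column. The sketch below only classifies rows you have already fetched (with Toad or any client); the function name and sample rows are hypothetical. Treating a NUMBER with no declared precision as suspect is my assumption, not something Microsoft confirmed in the case.

```python
# Query to fetch the metadata; owner and table name are the ones from
# the question and should be replaced with your own.
METADATA_QUERY = """
SELECT column_name, data_type, data_precision, data_scale
FROM all_tab_columns
WHERE owner = 'APPS' AND table_name = 'WHATEVER_TABLE'
"""


def risky_numeric_columns(rows, max_precision=28):
    """Return (column_name, reason) for NUMBER columns above the limit.

    rows: iterable of (column_name, data_type, data_precision, data_scale)
    tuples, as returned by METADATA_QUERY.
    """
    risky = []
    for column_name, data_type, data_precision, data_scale in rows:
        if data_type != "NUMBER":
            continue
        if data_precision is None:
            # Assumption: an unconstrained NUMBER may hold up to 38 digits.
            risky.append((column_name, "unconstrained NUMBER"))
        elif data_precision > max_precision:
            risky.append((column_name, f"NUMBER({data_precision},{data_scale})"))
    return risky


if __name__ == "__main__":
    # Made-up sample rows, for demonstration only.
    sample = [
        ("COL001", "NUMBER", 38, 10),
        ("COL002", "NUMBER", None, None),
        ("COL003", "VARCHAR2", None, None),
        ("COL004", "NUMBER", 10, 2),
    ]
    for name, reason in risky_numeric_columns(sample):
        print(name, reason)
```

Columns flagged this way are candidates for an explicit cast in the source query or an explicit type in the copy activity's schema mapping.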

