
Azure Data Factory pipeline fails when Self-Hosted Integration Runtime loads ORC file: OutOfMemory exception, heap size

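I am currently running into a problem when trying to load an ORC file from Azure Data Factory. When the file is too large, the ADF pipeline complains that our Self-Hosted Integration Runtime fails with an OutOfMemory exception, because the Java maximum heap size is too small to complete the load.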

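I have already tried different solutions, such as increasing the heap size through environment variables or even through keys in the registry (somewhat of a hack). The VM that hosts the Self-Hosted Integration Runtime has more than 100 GB of RAM.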

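It still fails, though, because these values seem to keep being overridden by the "default" values whenever the integration runtime is queried from ADF. Any ideas?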

'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.nio.BufferOverflowException:Unable to retrieve Java exception..,Source=Microsoft.DataTransfer.Richfile.OrcTransferPlugin,StackTrace= at Microsoft.DataTransfer.ClientLibrary.OrcDeserializer.<GetRows>d__42.MoveNext()
at Microsoft.DataTransfer.Common.Shared.DeserializeControllerBase.GetEstimatedRowSize()
at Microsoft.DataTransfer.ClientLibrary.OrcDeserializeController..ctor(DataTable targetSchema, IEnumerable`1 streams, OrcFormatSetting settings, IErrorRowOutput errorRowOutput)
at Microsoft.DataTransfer.ClientLibrary.OrcSerializer.Deserialize(TransferStream stream)
at Microsoft.DataTransfer.Runtime.DeserializationStageProcessor.<Deserialize>d__14.MoveNext()
at Microsoft.DataTransfer.Runtime.TypeConversionStageProcessor.<CreateDataReader>d__5.MoveNext()
at Microsoft.DataTransfer.Runtime.SerializationStageProcessor.<Serialize>d__11.MoveNext()
at Microsoft.DataTransfer.Runtime.BinarySinkStageProcessor.<PopulateStreamName>d__10.MoveNext()
at Microsoft.DataTransfer.ClientLibrary.MultipartWriteSink.ConsumeStreams(IEnumerable`1 streams),''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,StackTrace= at Microsoft.DataTransfer.Richfile.Bridge.BaseObjectBridge.CallObject[TEnum](TEnum methodEnum, jValue[] args)
at Microsoft.DataTransfer.Richfile.Bridge.Orc.OrcBatchReaderBridge.MoveNext()
at Microsoft.DataTransfer.ClientLibrary.OrcDeserializer.<GetRows>d__42.MoveNext(),'
Job ID: daee1a1d-b880-ecb2-e56c-a59397547668
Log ID: Warning        
TraceComponentId: TransferClientLibrary
TraceMessageId: TasksCoordinatorFatalErrorCallback
@logId: Warning
jobId: daee1a1d-b880-ecb2-e56c-a59397547668
activityId: c643b611-8356-4f49-b6d6-e87ea50670e5
eventId: TasksCoordinatorFatalErrorCallback
message: 'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.nio.BufferOverflowException:Unable to retrieve Java exception..,Source=Microsoft.DataTransfer.Richfile.OrcTransferPlugin,StackTrace= at Microsoft.DataTransfer.ClientLibrary.OrcDeserializer.<GetRows>d__42.MoveNext()
at Microsoft.DataTransfer.Common.Shared.DeserializeControllerBase.GetEstimatedRowSize()
at Microsoft.DataTransfer.ClientLibrary.OrcDeserializeController..ctor(DataTable targetSchema, IEnumerable`1 streams, OrcFormatSetting settings, IErrorRowOutput errorRowOutput)
at Microsoft.DataTransfer.ClientLibrary.OrcSerializer.Deserialize(TransferStream stream)
at Microsoft.DataTransfer.Runtime.DeserializationStageProcessor.<Deserialize>d__14.MoveNext()
at Microsoft.DataTransfer.Runtime.TypeConversionStageProcessor.<CreateDataReader>d__5.MoveNext()
at Microsoft.DataTransfer.Runtime.SerializationStageProcessor.<Serialize>d__11.MoveNext()
at Microsoft.DataTransfer.Runtime.BinarySinkStageProcessor.<PopulateStreamName>d__10.MoveNext()
at Microsoft.DataTransfer.ClientLibrary.MultipartWriteSink.ConsumeStreams(IEnumerable`1 streams),''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,StackTrace= at Microsoft.DataTransfer.Richfile.Bridge.BaseObjectBridge.CallObject[TEnum](TEnum methodEnum, jValue[] args)
at Microsoft.DataTransfer.Richfile.Bridge.Orc.OrcBatchReaderBridge.MoveNext()
at Microsoft.DataTransfer.ClientLibrary.OrcDeserializer.<GetRows>d__42.MoveNext(),'

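Microsoft found a bug on their side. This kind of error can occur when loading .orc files that contain char or varchar column types. Casting all of those columns to the string type fixed the error. Microsoft has acknowledged that the bug is on the Azure Data Factory side, and it will take about 6 months from now to fix.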
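If you control the schema used to produce the ORC files (for example a Hive-style type string or DDL), the cast to string can be applied on the producing side. The following is only a minimal sketch of that idea, not anything provided by Azure Data Factory: the function name `chars_to_string` and the sample schema are hypothetical, and it assumes the schema is available as a plain-text ORC/Hive type description.

```python
import re

# Matches the ORC/Hive type names char(n) and varchar(n), case-insensitively.
# Assumption: the schema is a plain-text type string such as
# "struct<id:int,name:varchar(50)>"; this is an illustration, not an ADF API.
_CHAR_TYPE = re.compile(r"\b(?:var)?char\s*\(\s*\d+\s*\)", re.IGNORECASE)


def chars_to_string(schema: str) -> str:
    """Return the schema with every char(n)/varchar(n) type replaced by string."""
    return _CHAR_TYPE.sub("string", schema)


if __name__ == "__main__":
    # Hypothetical schema of a file that triggers the error.
    original = "struct<id:int,name:varchar(50),code:char(3),note:string>"
    print(chars_to_string(original))
    # prints: struct<id:int,name:string,code:string,note:string>
```

The rewritten schema would then be used when regenerating the .orc files, so that the Self-Hosted Integration Runtime only sees string columns.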

