
Informatica BDE ingestion job runs for 10+ hours, but completes in 3 hours when killed and rerun

Posted 2018-12-05 04:09:05. Tags: performance, hadoop, hive, informatica, trouble-tickets

About my profile - I provide L3 support for some of the BDE Informatica ingestion jobs that run on our cluster. Our goal is to help application teams meet their SLAs. We support job streams that run on top of the Hadoop layer (Hive).

Problem statement - We have observed that on some days the BDE Informatica ingestion jobs run painfully slowly (10+ hours), while on other days they complete their cycle in 3 hours. When a job takes that long, we usually kill and rerun it, which helps, but it does not fix the root cause.

Limitations of my profile - Unfortunately, I have access to neither the application code nor the Informatica tool, so I have to work with the development team and ask relevant questions to narrow down the root cause.

Next steps -

What sort of scenarios can cause this delay?
What tools can I use to check what may be causing the delay?
A few questions I may ask the development team are -
1. Are the tables analysed properly before running the job stream?
2. Is there any significant change in the volume of data (this is a bit unlikely, as the job runs quickly on rerun)?

I am aware this is a very broad question that asks for help with an approach rather than with a specific problem, but it is a starting point for fixing this issue for good, or at least for approaching it in a rational manner.

1 answer

You need to check the Informatica logs to see whether the job hangs at the same step each time.
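One rough way to see where a run stalls is to look for the longest silences between consecutive log lines and compare them across a slow run and a fast run. The Python sketch below is only an illustration: it assumes each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, which may not match the real Informatica log layout, and `largest_gaps` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Assumed line prefix, e.g. "2018-12-05 00:14:07 ..." (hypothetical; adjust to the real log format).
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")


def largest_gaps(lines, top=3):
    """Return the `top` longest silences as (seconds, line that preceded the silence)."""
    gaps = []
    previous = None  # (timestamp, text) of the last timestamped line seen
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines without a timestamp, such as stack-trace continuations
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if previous is not None:
            seconds = (current - previous[0]).total_seconds()
            gaps.append((seconds, previous[1]))
        previous = (current, line.rstrip())
    return sorted(gaps, key=lambda gap: gap[0], reverse=True)[:top]
```

Feeding it the lines of a slow run's log and of a fast run's log, then checking whether the longest gap follows the same step in both, would show whether the job hangs in one place or slows down overall.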

Assuming it's not, are you triggering the jobs at the same time each day? Say at midnight, and it usually completes by 3am, but sometimes it runs until 10am, at which point you kill and restart it?

If so, I suggest you baseline the storage medium activity under minimal load, during a 3-hour quick run, and during the 10-hour run. Is there a difference in demand?
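To make that comparison concrete, the three baselines can be summarized side by side. The Python sketch below assumes you already have utilization samples (for example, percent-busy readings from a tool such as iostat) for each window; the function names and the window labels are hypothetical, and no real measurements are implied.

```python
from statistics import mean, median


def summarize(samples):
    """Summarize utilization samples (percent busy) for one measurement window."""
    ordered = sorted(samples)
    return {"mean": mean(ordered), "median": median(ordered), "max": ordered[-1]}


def compare(windows):
    """Map each named window to its summary so the baselines can be read side by side."""
    return {name: summarize(values) for name, values in windows.items()}
```

Calling `compare({"idle": [...], "fast_run": [...], "slow_run": [...]})` with your own samples would show whether the slow run coincides with noticeably higher storage demand than the fast run.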

It sounds like resource contention that is causing a conflict. A process may be waiting forever instead of resuming when the desired resource becomes available. Speak to the DBAs.