
Memory Leak with Pentaho Kettle Looping?

I have an ETL requirement like this:

I need to fetch around 20,000 records from a table and process each record separately. (The processing of each record involves a couple of steps, such as creating a table for the record and inserting some data into it.) For a prototype, I implemented it with two jobs (with corresponding transformations). Rather than creating a table, I created a simple empty file. But even this simple case doesn't seem to work smoothly. (When I do create a table for each record, Kettle exits after 5,000 records.)
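For illustration only, the per-record work described above can be sketched outside Kettle as a plain loop. This is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the names `source_records`, `record_<id>`, `payload`, and `process_records` are hypothetical and not part of the original job.

```python
import sqlite3


def process_records(conn, ids):
    """Create one table per record id and insert a single row into it."""
    created = []
    for record_id in ids:
        # Table names cannot be bound as SQL parameters, so the id is
        # coerced to int before it is used to build the identifier.
        table = f"record_{int(record_id)}"
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (payload TEXT)")
        conn.execute(
            f"INSERT INTO {table} (payload) VALUES (?)",
            (f"data for {record_id}",),
        )
        created.append(table)
    conn.commit()
    return created


# Hypothetical source table holding the ids to process (100 here, not 20,000).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_records (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO source_records (id) VALUES (?)",
    [(i,) for i in range(1, 101)],
)
ids = [row[0] for row in conn.execute("SELECT id FROM source_records ORDER BY id")]
tables = process_records(conn, ids)
```

Each iteration is independent of the previous one, which is the property the Kettle flow is meant to have as well.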

Flow

When I run this, Kettle slows down and then hangs after 2,000-3,000 files. Processing does complete after a long time, although Kettle appears to stop at some point. Is my design approach right? When I replace the file write with the actual requirement, creating a new table (through an SQL script step) for each id and inserting data into it, Kettle exits after 5,000 records. What do I need to do so that the flow works? Increase the Java memory (Xmx is already at 2 GB)? Is there any other configuration I can change? Or is there any other way? Extra time shouldn't be a constraint, but the flow should work.

My initial guess was that, since we are not storing any data, the prototype at least should work smoothly. I am using Kettle 3.2.

I seem to remember this is a known issue/restriction, which is why job looping is deprecated these days.

Are you able to rebuild the job using the transformation and/or job executor steps? You can execute any number of rows via those steps.

These steps have their own issues, namely that you have to handle errors explicitly, but it's worth a try just to see if you can achieve what you want. It's a slightly different mindset, but a nicer way to build loops than the job approach.
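As a conceptual analogy only (this is plain Python, not Kettle's API), the executor-style pattern amounts to one flat loop that runs the same unit of work for each input row and records failures explicitly rather than letting one bad row stop the run. The names `run_per_row` and `create_file_for` are hypothetical.

```python
def run_per_row(rows, unit_of_work):
    """Run unit_of_work once per row; collect failures instead of aborting."""
    succeeded, failed = [], []
    for row in rows:
        try:
            unit_of_work(row)
            succeeded.append(row)
        except Exception as exc:  # explicit per-row error handling
            failed.append((row, str(exc)))
    return succeeded, failed


def create_file_for(row):
    # Stand-in for the real per-record step; rejects negative ids.
    if row < 0:
        raise ValueError(f"invalid id {row}")


ok, errors = run_per_row([1, 2, -3, 4], create_file_for)
```

Each failure is kept alongside its row, so it can be logged or retried after the run finishes.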
