Memory Leak with Pentaho Kettle Looping?

Original post: 2016-09-02 06:45:41 · Tags: java, pentaho, kettle, geokettle

I have an ETL requirement like this:

I need to fetch around 20,000 records from a table and process each record separately. Processing each record involves a couple of steps, such as creating a table for that record and inserting some data into it. For a prototype, I implemented this with two jobs (each with its corresponding transformation), and instead of creating a table I just created an empty file per record. But even this simple case doesn't run smoothly. (When I do create a table for each record, Kettle exits after 5,000 records.)
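To put the scale in perspective, here is a minimal plain-Java sketch (not Kettle code) of the work one loop iteration does for a single record. The table naming scheme `record_<id>` and the two-column schema are hypothetical placeholders, not the actual schema from the question.

```java
import java.util.ArrayList;
import java.util.List;

public class PerRecordSql {
    // Hypothetical naming scheme: one table per source id, e.g. "record_42".
    static String tableName(long id) {
        return "record_" + id;
    }

    // Builds the statements one loop iteration would run for a single id.
    // The column list is an illustrative assumption, not the asker's schema.
    static List<String> statementsFor(long id) {
        String table = tableName(id);
        List<String> sql = new ArrayList<>();
        sql.add("CREATE TABLE " + table + " (id BIGINT, payload VARCHAR(255))");
        sql.add("INSERT INTO " + table + " (id, payload) VALUES (" + id + ", 'example')");
        return sql;
    }

    public static void main(String[] args) {
        int records = 20000;
        long total = 0;
        for (long id = 1; id <= records; id++) {
            total += statementsFor(id).size();
        }
        // Two statements per record, so 20,000 records give 40,000 statements.
        System.out.println(total);
    }
}
```

Run over 20,000 ids this prints `40000`: one `CREATE TABLE` and one `INSERT` per record, and 20,000 separate tables in the end.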

Flow: [diagram not preserved]

When I run this, Kettle slows down and then appears to hang after 2,000-3,000 files. Processing does eventually complete after a long time, but Kettle seems to stall at some point. Is my design approach right?

When I replace the file write with the actual requirement, i.e. creating a new table for each id (through an SQL script step) and inserting data into it, Kettle exits after 5,000 records. What do I need to do to make this flow work? Should I increase the Java memory (-Xmx is already at 2 GB)? Is there any other configuration I can change, or is there another approach altogether? Extra run time isn't a constraint, but the flow has to work.

My initial guess was that, since we are not storing any data, the prototype at least should run smoothly. I am using Kettle 3.2.

1 Answer

I seem to remember this is a known issue/restriction, which is why job looping is deprecated these days.
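One way to picture how a loop can run out of memory even when no data is stored: if each iteration leaves something reachable behind (a result object, log text), usage grows with the number of iterations. The sketch below is a plain-Java analogy under that assumption. It is not Kettle's code, whether Kettle 3.2 job loops retain exactly this kind of state is an assumption rather than something confirmed here, and the 1 KB per iteration is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

public class LoopRetention {
    // Stand-in for whatever one iteration produces (result rows, log text, ...).
    // The 1 KB size is an arbitrary assumption for illustration.
    static byte[] runIteration(int i) {
        return new byte[1024];
    }

    // Loop that keeps every iteration's output reachable, the way a parent job
    // is *assumed* here to hold on to each child run's result and log.
    static long retainingLoop(int iterations) {
        List<byte[]> history = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            history.add(runIteration(i));
        }
        long bytes = 0;
        for (byte[] b : history) {
            bytes += b.length;
        }
        return bytes; // everything is still reachable at this point
    }

    // Row-at-a-time loop: each output becomes garbage once it has been handled,
    // so at most one iteration's output is reachable at any moment.
    static long streamingLoop(int iterations) {
        long peakRetained = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] out = runIteration(i);
            peakRetained = Math.max(peakRetained, out.length);
        }
        return peakRetained;
    }

    public static void main(String[] args) {
        int n = 20000;
        System.out.println("retaining loop keeps " + retainingLoop(n) + " bytes reachable");
        System.out.println("streaming loop keeps at most " + streamingLoop(n) + " bytes");
    }
}
```

With 20,000 iterations the retaining loop ends up holding 20,480,000 bytes, while the row-at-a-time loop never holds more than one iteration's 1,024 bytes. Raising -Xmx only delays the first pattern; it does not fix it.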

Are you able to rebuild the job using the Transformation Executor and/or Job Executor steps? You can execute any number of rows via those steps.

These steps have their own issues, namely that you have to handle errors explicitly, but it's worth a try just to see whether you can achieve what you want. It's a slightly different mindset, but a nicer way to build loops than the job approach.
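A minimal plain-Java sketch of the executor idea, assuming a hypothetical `subTask` that does the per-row work (for example, creating the table for one id): each row is run independently, and failures are caught and recorded per row because nothing handles them for you. This only illustrates the pattern; it is not the API of Kettle's executor steps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

public class RowExecutor {
    // Outcome of one executor pass: which ids succeeded and which failed.
    static final class Outcome {
        final List<Long> succeeded = new ArrayList<>();
        final List<String> failed = new ArrayList<>();
    }

    // Runs the sub-task once per row. Errors are caught per row and recorded
    // explicitly; nothing is propagated automatically, mirroring the point that
    // executor-style steps leave error handling to you.
    static Outcome execute(List<Long> ids, LongConsumer subTask) {
        Outcome outcome = new Outcome();
        for (long id : ids) {
            try {
                subTask.accept(id);
                outcome.succeeded.add(id);
            } catch (RuntimeException e) {
                outcome.failed.add(id + ": " + e.getMessage());
            }
        }
        return outcome;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 1; i <= 10; i++) {
            ids.add(i);
        }
        // Hypothetical sub-task: pretend every 4th id fails.
        Outcome o = execute(ids, id -> {
            if (id % 4 == 0) {
                throw new IllegalStateException("create table failed");
            }
        });
        System.out.println(o.succeeded.size() + " ok, " + o.failed.size() + " failed");
        System.out.println(o.failed);
    }
}
```

Here ids 4 and 8 fail, so it prints `8 ok, 2 failed` followed by the two error messages, while the remaining rows still complete. The design choice is that one bad row is logged and skipped instead of aborting the whole run.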