Pentaho Kettle: how to set up tests for transformations/jobs?

I've been using Pentaho Kettle for quite a while, and previously the transformations and jobs I've made (using Spoon) have been quite simple: load from a database, rename fields, and write the results to another database. But now I've been building transformations that do somewhat more complex calculations, which I would like to test somehow.

So what I would like to do is:

  1. Set up some test data
  2. Run the transformation
  3. Verify the result data

One option would probably be to make a Kettle test job that tests the transformation. But as my transformations relate to a Java project, I would prefer to run the tests from JUnit. So I've considered making a JUnit test (sketched after the list below) that would:

  1. Set up test data (using DbUnit)
  2. Run the transformation from the command line (pan.sh for transformations, kitchen.sh for jobs)
  3. Verify the result data (using DbUnit)
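To make this concrete, here is a minimal sketch of what such a JUnit test could look like, assuming DbUnit FlatXML datasets, a JDBC connection the test can reach, and a transformation file runnable via pan.sh. All paths, table names, and connection details below are placeholders, and the pan.sh flags may differ slightly between PDI versions:

```java
import java.io.File;
import java.sql.Connection;
import java.sql.DriverManager;

import org.dbunit.Assertion;
import org.dbunit.database.DatabaseConnection;
import org.dbunit.database.IDatabaseConnection;
import org.dbunit.dataset.IDataSet;
import org.dbunit.dataset.ITable;
import org.dbunit.dataset.xml.FlatXmlDataSetBuilder;
import org.dbunit.operation.DatabaseOperation;
import org.junit.Test;

import static org.junit.Assert.assertEquals;

public class TransformationIT {

    // Placeholder connection details and paths; adjust to your environment.
    private static final String JDBC_URL = "jdbc:oracle:thin:@localhost:1521:XE";
    private static final String PAN = "/opt/pentaho/data-integration/pan.sh";
    private static final String KTR = "/path/to/my_transformation.ktr";

    @Test
    public void transformationProducesExpectedRows() throws Exception {
        Connection jdbc = DriverManager.getConnection(JDBC_URL, "test", "test");
        IDatabaseConnection dbunit = new DatabaseConnection(jdbc);

        // 1. Set up test data: CLEAN_INSERT the input tables from a FlatXML dataset.
        IDataSet input = new FlatXmlDataSetBuilder()
                .build(new File("src/test/resources/input.xml"));
        DatabaseOperation.CLEAN_INSERT.execute(dbunit, input);

        // 2. Run the transformation via pan.sh and wait for it to finish.
        Process pan = new ProcessBuilder(PAN, "-file=" + KTR, "-level=Basic")
                .inheritIO()
                .start();
        assertEquals("pan.sh should exit with 0", 0, pan.waitFor());

        // 3. Verify the result data against an expected FlatXML dataset.
        ITable actual = dbunit.createDataSet().getTable("TARGET_TABLE");
        ITable expected = new FlatXmlDataSetBuilder()
                .build(new File("src/test/resources/expected.xml"))
                .getTable("TARGET_TABLE");
        Assertion.assertEquals(expected, actual);

        jdbc.close();
    }
}
```

The same pattern works for testing a whole job by swapping pan.sh for kitchen.sh.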

This approach would, however, require test databases, which are not always available (Oracle and other expensive or legacy databases). What I would prefer is to be able to mock my input steps, or pass some stub test data to them, somehow.
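One way to do that without touching a real database is to drive the transformation through the Kettle Java API and push stub rows into an Injector step with a RowProducer. A rough sketch, assuming the transformation starts with an Injector step named "Injector" and a PDI 4.x/5.x style API (the step name, fields, and .ktr path are placeholders; newer versions use classes like ValueMetaInteger instead of ValueMeta):

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.row.RowMeta;
import org.pentaho.di.core.row.RowMetaInterface;
import org.pentaho.di.core.row.ValueMeta;
import org.pentaho.di.core.row.ValueMetaInterface;
import org.pentaho.di.trans.RowProducer;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;
import org.junit.Test;

import static org.junit.Assert.assertEquals;

public class InjectorStubTest {

    @Test
    public void runsWithStubbedInputRows() throws Exception {
        KettleEnvironment.init();

        // The .ktr is assumed to start with an "Injector" step instead of a table input.
        TransMeta transMeta = new TransMeta("src/test/resources/my_transformation.ktr");
        Trans trans = new Trans(transMeta);
        trans.prepareExecution(null);

        // Attach a producer to the Injector step so the test can push stub rows.
        RowProducer producer = trans.addRowProducer("Injector", 0);
        trans.startThreads();

        // Build the row layout and push a couple of stub rows.
        RowMetaInterface rowMeta = new RowMeta();
        rowMeta.addValueMeta(new ValueMeta("customer_id", ValueMetaInterface.TYPE_INTEGER));
        rowMeta.addValueMeta(new ValueMeta("amount", ValueMetaInterface.TYPE_NUMBER));
        producer.putRow(rowMeta, new Object[] { 1L, 100.0 });
        producer.putRow(rowMeta, new Object[] { 2L, 250.5 });
        producer.finished();

        trans.waitUntilFinished();
        assertEquals("transformation should finish without errors", 0, trans.getErrors());
    }
}
```

If you also want to assert on the output rows rather than just the error count, you can register a RowListener (for example a RowAdapter) on the last step via trans.findRunThread(...) before starting the threads.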

Any other ideas on how to test Pentaho Kettle transformations?

There is a JIRA somewhere on jira.pentaho.com (I don't have it to hand) that requests exactly this, but alas it is not yet implemented.

So you do have the right solution in mind; I'd also add Jenkins and an Ant script to tie it all together. I've done a similar thing with report testing: I had a Pentaho job load the data, then execute the report, then compare the output with known output and report pass/failure.

If you separate your Kettle jobs into two phases:

  • Load data to stream
  • Process and update data

you can use a "Copy rows to result" step at the end of your "load data to stream" transformation, and a "Get rows from result" step at the start of your "process" transformation to pick the rows up again.

If you do this, then you can use any means to load the data (a Kettle transformation, or DbUnit called from an Ant script), and you can mock up any database tables you want.

I use this for testing some ETL scripts I've written, and it works just fine.
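To illustrate the second half of that split: if the process transformation begins with a "Get rows from result" step, a JUnit test can hand it stub rows through the transformation's previous Result, which (as far as I can tell) is how the job entry passes the rows along at runtime. A rough, version-dependent sketch against the PDI 4.x/5.x API, with placeholder names throughout; verify the behaviour against your PDI version before relying on it:

```java
import java.util.ArrayList;
import java.util.List;

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.Result;
import org.pentaho.di.core.RowMetaAndData;
import org.pentaho.di.core.row.RowMeta;
import org.pentaho.di.core.row.RowMetaInterface;
import org.pentaho.di.core.row.ValueMeta;
import org.pentaho.di.core.row.ValueMetaInterface;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;
import org.junit.Test;

import static org.junit.Assert.assertEquals;

public class ProcessPhaseTest {

    @Test
    public void processPhaseRunsOnStubResultRows() throws Exception {
        KettleEnvironment.init();

        // Build the stub rows that "Copy rows to result" would normally have produced.
        RowMetaInterface rowMeta = new RowMeta();
        rowMeta.addValueMeta(new ValueMeta("customer_id", ValueMetaInterface.TYPE_INTEGER));
        List<RowMetaAndData> stubRows = new ArrayList<RowMetaAndData>();
        stubRows.add(new RowMetaAndData(rowMeta, new Object[] { 1L }));
        stubRows.add(new RowMetaAndData(rowMeta, new Object[] { 2L }));

        Result previous = new Result();
        previous.setRows(stubRows);

        // The .ktr is assumed to begin with a "Get rows from result" step.
        TransMeta transMeta = new TransMeta("src/test/resources/process_phase.ktr");
        Trans trans = new Trans(transMeta);
        trans.setPreviousResult(previous);

        trans.execute(null);
        trans.waitUntilFinished();
        assertEquals(0, trans.getErrors());
    }
}
```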

You can use the Data Validator step. Of course it is not a full unit test suite, but I think it will sometimes be useful for checking data integrity in a quick way. You can run several tests at once.

For a more "serious" test I would recommend @codek's answer and execute your Kettle jobs under Jenkins.

(Screenshot: the Data Validator step)
