Pentaho: import unique records into a database

I am quite new to Pentaho Spoon and I would like to import the records of a CSV file into a database table. However, only unique records should be imported. That is why I need to compare EACH record with all records of the database table in order to determine whether it should be imported or not.

So far, I have tried the suggested CRUD pattern, which looks like this: (screenshot of the transformation)

As you can see in the picture, I merge the Excel input and the table input. (Ignore the cast steps: I needed to cast a value because the float formats differed. The database format was #.000000 and the CSV format was #.0.)

After the merge, I compare the flag given by Merge Rows (diff). If the compared records are new, I insert them into the database table; if they are changed, I update the record; and if they are deleted or identical, I simply do nothing. So far, so good.

But here is the problem: if I shuffle the records of the CSV input file and run the transformation again, all the records are imported again and consequently there are duplicates in my database table (which I wanted to avoid). To emphasize again: the right way to solve this is for each row of the CSV input file to be compared with ALL entries in the database table.

How can I achieve this? Any suggestions? Thank you so much in advance!

The Merge Rows (diff) step expects its inputs to be sorted on the key fields. Normally, a pop-up warns you about this.
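To see why the sort order matters, here is a minimal Python sketch of a single-pass diff over two key-sorted inputs. It only illustrates the general merge technique and is not Kettle's actual implementation; the function name `merge_rows_diff`, the field names `id` and `value`, and the sample rows are all assumptions made up for the example. The flag names mirror the ones mentioned in the question.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]


def merge_rows_diff(reference: Iterable[Row], compare: Iterable[Row],
                    key: str, values: List[str]) -> Iterator[Tuple[str, Row]]:
    """Single-pass diff; correct only if BOTH inputs are sorted by `key`."""
    ref_iter, cmp_iter = iter(reference), iter(compare)
    ref, cmp = next(ref_iter, None), next(cmp_iter, None)
    while ref is not None or cmp is not None:
        if cmp is None or (ref is not None and ref[key] < cmp[key]):
            # Key exists only on the reference (table) side.
            yield "deleted", ref
            ref = next(ref_iter, None)
        elif ref is None or cmp[key] < ref[key]:
            # Key exists only on the compare (file) side.
            yield "new", cmp
            cmp = next(cmp_iter, None)
        else:
            # Same key on both sides: compare the value fields.
            same = all(ref[v] == cmp[v] for v in values)
            yield ("identical" if same else "changed"), cmp
            ref, cmp = next(ref_iter, None), next(cmp_iter, None)


def flags(reference: Iterable[Row], compare: Iterable[Row]) -> List[str]:
    """Return only the flag of each diff result (hypothetical helper)."""
    return [flag for flag, _ in merge_rows_diff(reference, compare, "id", ["value"])]


# Hypothetical sample data: the table content and the same rows, shuffled.
table_rows = [{"id": 1, "value": 1.0}, {"id": 2, "value": 2.0}, {"id": 3, "value": 3.0}]
csv_shuffled = [table_rows[2], table_rows[0], table_rows[1]]

unsorted_flags = flags(table_rows, csv_shuffled)
sorted_flags = flags(table_rows, sorted(csv_shuffled, key=lambda r: r["id"]))

print(unsorted_flags)  # rows 1 and 2 are wrongly flagged "new"
print(sorted_flags)    # every row is "identical"
```

With the shuffled input, rows that already exist in the table come out flagged as new, which is how the duplicates described in the question arise. Sorting both inputs on the same key first makes the comparison independent of the file's row order.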

Put a Sort rows step on the output flow of the Excel Input, before it reaches Merge Rows (diff).

You should do the same between the Table Input and Merge Rows (diff). Of course, you may think you could do the sorting in the SQL statement of the Table Input.

However, there is a beginner trap here. You have three other steps, Output Rows, Update and Delete, which operate on the same table, and these steps may lock the table. Since all steps in Kettle run concurrently, you do not know which step will fire first, and the table may be locked so that the Table Input is never able to read even the first record. This is known in jargon as an auto-lock, and the way to solve it is to put in a Sort rows step as a buffer.

You can use the 'Dimension lookup/update' step, which provides the functionality you are trying to achieve.
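The general idea behind a lookup-based approach can be sketched in plain Python with `sqlite3`. This is only an illustration of "look up the key, insert if it is missing" and not what the Dimension lookup/update step does internally; the table name `records`, its columns, and the helper `import_unique` are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple


def import_unique(conn: sqlite3.Connection,
                  rows: Iterable[Tuple[int, float]]) -> int:
    """Insert only rows whose key is not yet in the table; return how many were inserted."""
    inserted = 0
    for rec_id, value in rows:
        # Look the key up against the whole table, regardless of input order.
        found = conn.execute(
            "SELECT 1 FROM records WHERE id = ?", (rec_id,)
        ).fetchone()
        if found is None:
            conn.execute(
                "INSERT INTO records (id, value) VALUES (?, ?)", (rec_id, value)
            )
            inserted += 1
    conn.commit()
    return inserted


# Hypothetical in-memory table standing in for the real target table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, value REAL)")

first = import_unique(conn, [(1, 1.0), (2, 2.0), (3, 3.0)])
# Second run: same rows in a different order, plus one genuinely new row.
second = import_unique(conn, [(3, 3.0), (1, 1.0), (2, 2.0), (4, 4.0)])
total = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]

print(first, second, total)
```

Because each incoming row is checked by key against the table itself, re-running the import with shuffled input does not create duplicates. The trade-off is one lookup per row, whereas the sorted merge reads each input only once.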

Thanks, Nilesh
