Make a DB INSERT based on Text File Input metadata

I'm developing an ETL and must do some routines for monitoring it.

At the beginning, I must make an INSERT on the DB to create a record with the filename and the process start datetime. This query will return the record's PK, which must be stored. When the ETL of that file finishes, I must update that record to indicate that the ETL finished successfully, along with the process end datetime.

I use Text File Input to look for files that match its regex, and add its "Additional output fields" to the stream. But I can't find a component that runs only for the first record and executes a SQL command for the INSERT.
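The monitoring routine described in the question (INSERT a control record, keep its PK, UPDATE it at the end) can be sketched outside PDI as follows. This is a minimal Python/SQLite illustration only: the table `etl_control`, its columns and the `RUNNING`/`SUCCESS` status values are hypothetical, and in a real database the way you get the generated PK back (`lastrowid` here) depends on the driver and DBMS.

```python
import sqlite3
from datetime import datetime, timezone


def now_iso() -> str:
    # UTC timestamp stored as ISO-8601 text (SQLite has no native datetime type)
    return datetime.now(timezone.utc).isoformat()


def start_file(conn: sqlite3.Connection, filename: str) -> int:
    """INSERT the control record for one file and return its primary key."""
    cur = conn.execute(
        "INSERT INTO etl_control (filename, started_at, status) VALUES (?, ?, ?)",
        (filename, now_iso(), "RUNNING"),
    )
    conn.commit()
    return cur.lastrowid  # PK generated by the INSERT, kept for the final UPDATE


def finish_file(conn: sqlite3.Connection, control_id: int) -> None:
    """UPDATE the same record once the file has been loaded successfully."""
    conn.execute(
        "UPDATE etl_control SET status = ?, finished_at = ? WHERE id = ?",
        ("SUCCESS", now_iso(), control_id),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE etl_control (
           id INTEGER PRIMARY KEY AUTOINCREMENT,
           filename TEXT NOT NULL,
           started_at TEXT NOT NULL,
           finished_at TEXT,
           status TEXT NOT NULL
       )"""
)
pk = start_file(conn, "sales_2024.csv")
# ... load the file here ...
finish_file(conn, pk)
print(conn.execute("SELECT id, filename, status FROM etl_control").fetchall())
# [(1, 'sales_2024.csv', 'SUCCESS')]
```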

You can use "Identify last row" and "Filter rows" together, so that you keep only one line from your input (filtering on just the last one). Your INSERT goes right after the Filter rows step.
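To see what these two steps do to the stream, here is a small Python model (not PDI code): rows are plain dicts, and the flag name `is_last` is a hypothetical choice, since the real step lets you name the result field yourself.

```python
from typing import Iterable, Iterator


def identify_last_row(rows: Iterable[dict]) -> Iterator[dict]:
    """Add a boolean 'is_last' flag to each row.

    Uses one row of look-ahead: a row is only emitted once we know
    whether another row follows it.
    """
    it = iter(rows)
    try:
        prev = next(it)
    except StopIteration:
        return  # empty stream: nothing to flag
    for row in it:
        yield {**prev, "is_last": False}
        prev = row
    yield {**prev, "is_last": True}


def filter_rows(rows: Iterable[dict], field: str) -> Iterator[dict]:
    """Keep only rows whose boolean field is true (the Filter rows condition)."""
    return (r for r in rows if r[field])


stream = [{"filename": "a.csv", "line": n} for n in range(1, 4)]
single = list(filter_rows(identify_last_row(stream), "is_last"))
print(single)  # [{'filename': 'a.csv', 'line': 3, 'is_last': True}]
```

Whatever the file size, exactly one row reaches the step after the filter, so the INSERT fires once per stream.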

Because this splits your flow, you'll need to join your ID column back with the original text input rows.
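Since the INSERT branch carries exactly one row (the generated ID), joining it back is effectively a cartesian product: every input row gets the ID appended. A minimal Python sketch of that idea, where `control_id` and the value `42` are hypothetical:

```python
def cross_join(left: list[dict], right: list[dict]) -> list[dict]:
    """Pair every left row with every right row (cartesian product)."""
    return [{**l, **r} for l in left for r in right]


rows = [{"filename": "a.csv", "line": n} for n in range(1, 4)]
id_branch = [{"control_id": 42}]  # hypothetical PK returned by the INSERT

joined = cross_join(rows, id_branch)
print(joined[0])  # {'filename': 'a.csv', 'line': 1, 'control_id': 42}
```

This is also why the filter matters: if the ID branch held more than one row, a cartesian join would multiply the input rows instead of just tagging them.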

There is also a Unique rows step. If you do not specify a field on which to filter unique values, it will output exactly one row.
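The behaviour described above can be modelled in Python as follows. This is an illustration of the claim, not PDI's source: like the Unique rows step, it only compares each row with the previous one (so input is assumed sorted on the key), and with an empty key list every row compares equal, leaving one row. Check the actual behaviour in your PDI version.

```python
def unique_rows(rows: list[dict], key_fields: list[str]) -> list[dict]:
    """Drop consecutive rows whose key_fields values repeat the previous row's."""
    out = []
    prev_key = object()  # sentinel that never equals a real key tuple
    for row in rows:
        key = tuple(row[f] for f in key_fields)  # empty key_fields -> () for every row
        if key != prev_key:
            out.append(row)
            prev_key = key
    return out


stream = [{"filename": "a.csv", "line": n} for n in (1, 2, 3)]
print(len(unique_rows(stream, [])))        # 1
print(len(unique_rows(stream, ["line"])))  # 3
```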

Now, unless I misunderstood your specs, I'd rather use Kettle's logging system. Click anywhere on the canvas, select Properties in the popup, then the Logging tab. It will give you the status (Started/End/Stop/...) and plenty of additional info, such as the number of errors and the lines read and written (just tell PDI which step it should take these numbers from).

You can even read, in near real time from the DB, the same information you see in the bottom panel of PDI. Just select the fields you want and press the SQL button to create the log table.

Just note that, for historical reasons, the start date is not really the start date (it's the date of the previous successful run). The actual start date is called Replay date.
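Reading the latest run back from the log table could look like the Python/SQLite sketch below. The table name `trans_log`, the column names (`ID_BATCH`, `TRANSNAME`, `STATUS`, `ERRORS`, `REPLAYDATE`, `LOGDATE`), the lowercase status values and the assumption that `ID_BATCH` increases with each run are all stand-ins; compare them with the DDL that the SQL button generates for you. Following the note above, `REPLAYDATE` is read as the real start of the run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the generated log table (subset of columns, names assumed).
conn.execute(
    """CREATE TABLE trans_log (
           ID_BATCH INTEGER, TRANSNAME TEXT, STATUS TEXT, ERRORS INTEGER,
           REPLAYDATE TEXT, LOGDATE TEXT
       )"""
)
conn.executemany(
    "INSERT INTO trans_log VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "load_sales", "end", 0, "2024-05-01 08:00:00", "2024-05-01 08:05:00"),
        (2, "load_sales", "running", 0, "2024-05-02 08:00:00", "2024-05-02 08:01:30"),
    ],
)


def latest_run(conn: sqlite3.Connection, transname: str):
    """Return (batch id, status, errors, actual start) of the most recent run."""
    return conn.execute(
        """SELECT ID_BATCH, STATUS, ERRORS, REPLAYDATE
             FROM trans_log
            WHERE TRANSNAME = ?
            ORDER BY ID_BATCH DESC
            LIMIT 1""",
        (transname,),
    ).fetchone()


run = latest_run(conn, "load_sales")
print(run)  # (2, 'running', 0, '2024-05-02 08:00:00')
```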

Also, if you need this system to monitor the load and decide whether a run has to start or not, be aware that on an abrupt ending the system sometimes doesn't have time to write "End" to the log. Therefore a check such as logdate < now - 10 minutes is more reliable.
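A minimal Python sketch of that check, assuming the log row's LOGDATE keeps being refreshed while a run is active (for instance through a periodic logging interval) and assuming `end`, `stop` and `error` are the final status values. The function name and the 10-minute threshold are illustrative.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(minutes=10)
FINAL_STATUSES = {"end", "stop", "error"}  # assumed final status values


def run_looks_alive(status: str, logdate: datetime, now: datetime) -> bool:
    """Decide whether a previous run should block a new one.

    STATUS alone is not trusted: after an abrupt end the row can stay
    'running' forever. A run only counts as alive if its status is not
    final AND its LOGDATE was written within the last STALE_AFTER.
    """
    if status in FINAL_STATUSES:
        return False
    return logdate >= now - STALE_AFTER


now = datetime(2024, 5, 2, 8, 30)
print(run_looks_alive("running", datetime(2024, 5, 2, 8, 1, 30), now))  # False
print(run_looks_alive("running", datetime(2024, 5, 2, 8, 25), now))     # True
```

The first call returns False because the last log write is more than 10 minutes old, so the "running" status is treated as the leftover of a crashed run.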

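To do something only for the first row of the stream, use an "Add sequence" step (starting at 1), followed by a "Filter rows" step with the condition "seq = 1".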
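Modelled in Python (not PDI code), the sequence-plus-filter pattern looks like this; the field name `seq` follows the condition above, and the rows are illustrative dicts.

```python
from itertools import count
from typing import Iterable, Iterator


def add_sequence(rows: Iterable[dict], field: str = "seq", start: int = 1) -> Iterator[dict]:
    """Number rows in stream order, starting at `start`."""
    return ({**row, field: n} for row, n in zip(rows, count(start)))


stream = [{"filename": "a.csv", "line": n} for n in range(10, 13)]
first = [r for r in add_sequence(stream) if r["seq"] == 1]
print(first)  # [{'filename': 'a.csv', 'line': 10, 'seq': 1}]
```

Keeping the first row rather than the last means that, in principle, the INSERT can fire as soon as the first row arrives instead of after the whole file has been read, which matches the "at the beginning" requirement in the question.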