
Make a DB INSERT based on Text File Input metadata

I'm developing an ETL and need to build some routines to monitor it.

At the beginning, I must run an INSERT on the DB to create a record holding the filename and the datetime at which processing started. This query returns the record's PK, which must be stored. When the ETL of that file finishes, I must update that record to say that the ETL finished successfully and to record the datetime at which processing ended.
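For reference, the bookkeeping described above can be sketched outside PDI as follows. This is only a minimal illustration using SQLite; the table name `etl_run`, its columns, the status strings and the helper names `start_run` and `finish_run` are all assumptions made for the example, not part of the actual ETL.

```python
import sqlite3
from datetime import datetime, timezone


def now_iso():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def start_run(conn, filename):
    """Insert the monitoring record and return its primary key."""
    cur = conn.execute(
        "INSERT INTO etl_run (filename, started_at, status) "
        "VALUES (?, ?, 'running')",
        (filename, now_iso()),
    )
    conn.commit()
    return cur.lastrowid  # PK generated by the INSERT


def finish_run(conn, run_id):
    """Mark the record as finished successfully and store the end time."""
    conn.execute(
        "UPDATE etl_run SET finished_at = ?, status = 'success' WHERE id = ?",
        (now_iso(), run_id),
    )
    conn.commit()


# Hypothetical monitoring table, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE etl_run ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "filename TEXT NOT NULL, "
    "started_at TEXT NOT NULL, "
    "finished_at TEXT, "
    "status TEXT NOT NULL)"
)

run_id = start_run(conn, "sales_2024.txt")  # keep the PK for later
# ... the ETL of the file would run here ...
finish_run(conn, run_id)
```

The point of the sketch is that the PK returned by the INSERT has to travel with the rest of the flow so that the final UPDATE can target the same record.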

I use Text File Input to look for files that match its regex and to add its "Additional output fields" to the stream. But I can't find a step that runs only for the first record and executes a SQL command for the INSERT.

You can use "Identify last row" and "Filter rows" together, so that you keep only one row from your input (filtering out everything but the last one). Your INSERT goes right after the Filter rows step.


Since this splits your flow, you will then need to join the ID column back to the original text input rows.
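The split-and-join idea can be sketched in plain Python, outside PDI. This is only an illustration of the data flow, assuming rows are dictionaries with a `filename` field; `tag_rows_with_run_id` and `fake_insert_run` are hypothetical names, and the fake insert stands in for the real SQL step.

```python
def tag_rows_with_run_id(rows, insert_run):
    """Run the INSERT once (for the last row) and attach its ID to every row."""
    rows = list(rows)
    if not rows:
        return []
    last_row = rows[-1]                        # "Identify last row" + "Filter rows"
    run_id = insert_run(last_row["filename"])  # the INSERT runs exactly once
    # Join the single ID back onto all of the original input rows.
    return [{**row, "run_id": run_id} for row in rows]


calls = []


def fake_insert_run(filename):
    """Stand-in for the real INSERT; records the call and returns a fixed PK."""
    calls.append(filename)
    return 42


rows = [
    {"filename": "a.txt", "line": "x"},
    {"filename": "a.txt", "line": "y"},
]
tagged = tag_rows_with_run_id(rows, fake_insert_run)
```

Every row in `tagged` carries the same `run_id`, while the insert function was called only once.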

You also have the Unique rows step. If you do not specify a field on which to filter unique values, it outputs exactly one row.

Now, unless I misunderstood your specs, I'd rather use Kettle's logging system. Click anywhere, select Properties in the popup, then open the Logging tab. It gives you the status (Started/End/Stop/...) and plenty of additional information, such as the number of errors and the lines read and written (just tell PDI which step it should take these numbers from).

You can even read from the DB, in almost real time, the same information you see in the bottom panel of PDI. Just tick the fields you want and press the SQL button to generate the SQL that creates the log table.

Just note that, for historical reasons, the start date is not really the start date (it is the date of the previous successful run). The actual start date is called Replay date.

Also, if you need this system to monitor the load and decide whether the run has to start or not, be aware that on an abrupt termination the system sometimes does not have time to write "End" to the log. A condition such as logdate<now-10minutes is therefore more reliable.
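One possible reading of that check, sketched in Python: treat the previous run as over either when its status says so or when its log entry has not been touched for ten minutes. The function name `previous_run_is_over`, the lowercase status string `"end"` and the ten-minute default are assumptions for illustration; adapt them to whatever your log table actually contains.

```python
from datetime import datetime, timedelta


def previous_run_is_over(status, logdate, now, grace=timedelta(minutes=10)):
    """Return True if the previous run ended or its log entry has gone stale.

    A run that was killed abruptly may never write "end", so a log entry
    older than the grace period is also treated as finished.
    """
    return status == "end" or logdate < now - grace


now = datetime(2024, 1, 1, 12, 0, 0)
clean_end = previous_run_is_over("end", now - timedelta(minutes=1), now)
crashed = previous_run_is_over("start", now - timedelta(minutes=30), now)
still_running = previous_run_is_over("start", now - timedelta(minutes=2), now)
```

Here `clean_end` and `crashed` are both true, while `still_running` is false because the log entry is recent and no end status was written.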


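To do something only for the first row of the stream, use an "Add sequence" step (starting at 1), followed by a "Filter rows" step with the condition "seq = 1".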
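The same sequence-then-filter logic can be sketched in Python to show what the two steps do to the stream. The field name `seq` follows the condition above; `add_sequence` and the sample rows are hypothetical.

```python
def add_sequence(rows, start=1):
    """Mimic "Add sequence": attach an increasing counter to each row."""
    for seq, row in enumerate(rows, start=start):
        yield {**row, "seq": seq}


rows = [
    {"filename": "a.txt", "line": "x"},
    {"filename": "a.txt", "line": "y"},
    {"filename": "a.txt", "line": "z"},
]

# Mimic "Filter rows" with the condition seq = 1: only the first row passes.
first_only = [row for row in add_sequence(rows) if row["seq"] == 1]
```

Unlike the last-row approach, this lets the first row through as soon as it arrives, without waiting for the end of the input.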
