
Pentaho Spoon - Wait for File - Wildcards

I know I've asked a couple of Pentaho-related questions lately, but I am rushing to evaluate it in a short timeframe :)

My latest obstacle is that I am building a job that will process an input file when it arrives, but I only know the format of the filename, not the exact filename itself, and the "wait for file" step does not allow wildcards. This seems like a glaring omission for such a step, so I am wondering if I've just missed something, but on forums etc. it seems I'm not the only one facing this challenge.

Ideally I need the "wait for file" step to search on a wildcard/regex and, when it finds a match, pass the resulting file's name to the next step in the job for processing.

Any suggestions?

Thanks

Tom

Again, I will try to answer your question.

Actually, you don't need a job to wait for a file. Based on my answer on the country split (Pentaho Spoon - Output to multiple files based on field content), you just need to pass the source filename through and then archive it using process file (see the pic below).

[Image: text input dialog]

From here, I think you can adapt my logic using the ktr I provided before (http://pentaho.phi-integration.com/kettle/kettle-files/split_countries.ktr).

Then you can control the repetition of the job (wait for and process files) using the job scheduler (see the pic).

Well, hope this helps, Tom!

Regards,

Dino

I had a similar requirement, and solved it by creating a directory specifically for receiving the files (from a remote host).

The "Get File Names" step reads the files in the directory and passes each name to the next step. "Get File Names" allows wildcards, btw.

(Of course, I have to clean up the input queue once I have finished processing the file.)
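To make the idea concrete outside of Spoon, here is a rough sketch in plain Python of the same pattern: list the files in a dedicated inbox directory that match a wildcard, then move each one to an archive directory once it has been processed. This is not Pentaho code; the function names `list_matching` and `archive` and the directory layout are made up for illustration.

```python
import fnmatch
import shutil
from pathlib import Path


def list_matching(inbox: Path, pattern: str) -> list[Path]:
    """Return the files in `inbox` whose names match a shell-style wildcard, sorted by name."""
    return sorted(
        p for p in inbox.iterdir()
        if p.is_file() and fnmatch.fnmatch(p.name, pattern)
    )


def archive(path: Path, archive_dir: Path) -> Path:
    """Move a processed file out of the inbox so the next scan does not pick it up again."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / path.name
    shutil.move(str(path), str(target))
    return target
```

A caller would loop over `list_matching(inbox, "sales_*.csv")`, process each file, and call `archive` on it afterwards, which plays the role of cleaning up the input queue.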

EDIT: I omitted to mention that you lose the "wake" functionality with Get File Names, so you'll have to loop and schedule regular scans of the directory.
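The loop-and-rescan logic amounts to something like the sketch below, again in plain Python rather than anything Pentaho-specific. The name `wait_for_file`, its parameters, and the default polling interval are assumptions chosen for the example. It checks the directory at a fixed interval and returns the first filename that fully matches a regex, or gives up after an optional timeout.

```python
import re
import time
from pathlib import Path
from typing import Optional


def wait_for_file(directory: Path, name_regex: str,
                  poll_seconds: float = 5.0,
                  timeout_seconds: Optional[float] = None) -> Optional[Path]:
    """Poll `directory` until a file whose name fully matches `name_regex` appears.

    Returns the first match in name order, or None if the timeout expires first.
    """
    pattern = re.compile(name_regex)
    deadline = None if timeout_seconds is None else time.monotonic() + timeout_seconds
    while True:
        matches = sorted(
            p for p in directory.iterdir()
            if p.is_file() and pattern.fullmatch(p.name)
        )
        if matches:
            return matches[0]
        if deadline is not None and time.monotonic() >= deadline:
            return None
        time.sleep(poll_seconds)
```

One caveat with any polling approach like this: a file can be detected while the sender is still writing it, so it may be worth having the sender write under a temporary name and rename the file when the transfer is complete.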


 