
Using Apache NiFi to collect files from a 3rd-party REST API - flow advice

I am trying to create a flow in Apache NiFi to collect files from a 3rd-party RESTful API, and I have set up my flow as follows:

InvokeHTTP - ExtractText - PutFile

I can collect the file that I am after, as I have specified it in my Remote URL. However, when I get all of the data from that file, the flow writes hundreds of copies of the same file to my output directory.

3 things I need help with:

1: How do I get the flow to output the file as a readable .csv rather than just a file with no extension?

2: How can I stop the processor once I have all of the data that I need?

3: The JSON file that I have been supplied with gives me the option to get files from a certain date range:

https://api.3rdParty.com/reports/v1/scheduledReports/877800/1553731200000

Or I can choose a specific file:

https://api.3rdParty.com/reports/v1/scheduledReports/download/877800/201904/CTDDaily/2019-04-02T01:50:00Z.csv

But how can I configure NiFi to automatically check for newer files? This process will run daily, and we will be downloading a new file each day.

If this is too broad, please let me know so I can edit this post.

Thanks.

Note: The 3rdParty host name has been changed for security reasons, so the links will not work directly. Thanks.

1) You can change the filename of the flow file to anything you want using the UpdateAttribute processor. If you want it to have a ".csv" extension, add a property named "filename" with a value of "${filename}.csv" (without the quotes when you enter it).
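
To make the renaming rule concrete, here is a minimal Python sketch of what the "${filename}.csv" value does to a name. The function name and the sample filename are hypothetical, and the guard against a doubled extension is an added assumption: the plain NiFi expression appends ".csv" unconditionally.

```python
def with_csv_extension(filename: str) -> str:
    """Mimic setting filename = ${filename}.csv in UpdateAttribute.

    Assumption: skip the append when the name already ends in .csv;
    the plain NiFi expression would append regardless.
    """
    if filename.lower().endswith(".csv"):
        return filename
    return f"{filename}.csv"


if __name__ == "__main__":
    # "scheduled_report" is a made-up flow file name for illustration.
    print(with_csv_extension("scheduled_report"))
```

In the flow itself, UpdateAttribute would sit between InvokeHTTP and PutFile so that PutFile writes the file under the new name.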

2) By default, most processors have a scheduling strategy of timer-driven with a run schedule of 0 seconds, which means they run as fast as possible. Open the processor's configuration, go to the Scheduling tab, and configure an appropriate schedule; it sounds like you probably want CRON-driven scheduling so that it runs once a day.
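
As a rough illustration of what a daily schedule means, the Python sketch below computes the next run time for a job that fires once a day at a fixed hour. The 02:00 run time is an assumption chosen for the example; in NiFi the corresponding CRON-driven value would be a Quartz-style expression such as 0 0 2 * * ? (seconds, minutes, hours, day of month, month, day of week).

```python
from datetime import datetime, timedelta


def next_daily_run(now: datetime, hour: int, minute: int = 0) -> datetime:
    """Return the next time a once-a-day job at hour:minute would fire."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so the job waits until tomorrow.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    # Assumed run time of 02:00; adjust to when the daily report is published.
    print(next_daily_run(datetime.now(), hour=2))
```

With a schedule like this, InvokeHTTP fires once per day instead of continuously, which is what stops the hundreds of duplicate files.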

3) You can use NiFi Expression Language statements to create dynamic time ranges. I don't fully understand the syntax of the API that you have to communicate with, but you could do something like this for the URL:

https://api.3rdParty.com/reports/v1/scheduledReports/877800/${now()}

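Where now() returns the current timestamp. If the API expects epoch milliseconds, as the 1553731200000 in the question's example URL suggests, ${now():toNumber()} gives that value explicitly.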

You can also format it as a date string if necessary:

${now():format('yyyy-MM-dd')}
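
The two expressions above can be checked outside NiFi with a small Python sketch. It assumes the last path segment of the date-range URL is an epoch timestamp in milliseconds, which matches the 1553731200000 in the question but is not confirmed by the API's documentation; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Base path taken from the question; the host name is a placeholder there too.
BASE_URL = "https://api.3rdParty.com/reports/v1/scheduledReports/877800"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_millis(moment: datetime) -> int:
    """Milliseconds since 1970-01-01 UTC for a timezone-aware datetime."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def range_url(since: datetime) -> str:
    """Build the date-range URL (assumed: trailing segment is epoch millis)."""
    return f"{BASE_URL}/{epoch_millis(since)}"


def date_string(moment: datetime) -> str:
    """Python counterpart of the NiFi pattern 'yyyy-MM-dd'."""
    return moment.strftime("%Y-%m-%d")


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(range_url(now))
    print(date_string(now))
```

Running it with 2019-03-28 00:00 UTC reproduces the date-range URL from the question, which is a quick way to confirm the timestamp unit before wiring the expression into InvokeHTTP.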

https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
