
Connecting NiFi to ElasticSearch

Original post: 2017-05-05 13:47:56. Tags: elasticsearch, apache-nifi, processors

I'm trying to solve a task and would appreciate any help: links to documentation, forums, or FAQs other than https://cwiki.apache.org/confluence/display/NIFI/FAQs , or any meaningful answer in this post =).

So, I have the following task. The initial part of my system collects data every 5-15 minutes from different DB sources. Then I remove duplicates, remove junk, combine data from the different sources according to some logic, and redirect it to the second part of the system as several streams. As far as I know, NiFi is the best fit for this task =).

Currently I can successfully get information from InfluxDB with the GetHTTP processor. However, I can't configure the same kind of processor to get information from Elasticsearch with all the necessary options. I'd like to receive data every 5-15 minutes (depending on the scheduler period) for the time period from "now minus 5-15 minutes" to "now", with several additional filters. If I understand it right, this can be achieved either by subscribing to "_index" or by sending regular requests to the DB at the desired interval.

I know that NiFi has several processors designed specifically for Elasticsearch (FetchElasticsearch5, FetchElasticsearchHttp, QueryElasticsearchHttp, ScrollElasticsearchHttp) as well as the GetHTTP and PostHTTP processors. Unfortunately, I lack information, or better yet examples, on how to configure their "Properties" for my purposes =(.

What's the difference between FetchElasticsearchHttp and QueryElasticsearchHttp? Which one fits my task better? What's the difference between GetHTTP and QueryElasticsearchHttp, besides several specific fields? Will GetHTTP perform the same way if I tune it as I need?

Any advice?

I will be grateful for any help.

1 Answer

The ElasticsearchHttp processors try to make it easier to interact with ES by generating the appropriate REST API call based on the properties you set. If you know the full URL you need, you could use GetHTTP or InvokeHTTP. However, the ESHttp processors let you put in just the stuff you're looking for, and they will generate the URL and return the results.

FetchElasticsearch (and its variants) is used to get a particular document when you know its identifier. It is sometimes used after a search/query, to return documents one at a time once you know which ones you want.
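As a rough illustration of the kind of REST call a fetch-by-identifier boils down to, here is a minimal Python sketch that builds a Get API URL. It assumes the `<index>/<type>/<id>` path layout used by Elasticsearch releases of that era. The host, index, type, and id values are hypothetical placeholders. This only shows the shape of the request and is not NiFi's actual implementation.

```python
from urllib.parse import quote


def build_get_url(base_url, index, doc_type, doc_id):
    """Build a Get API URL of the form <base>/<index>/<type>/<id>."""
    # Percent-encode each path segment so ids with spaces or slashes stay intact.
    parts = [quote(part, safe="") for part in (index, doc_type, doc_id)]
    return base_url.rstrip("/") + "/" + "/".join(parts)


if __name__ == "__main__":
    # Hypothetical host, index, type, and id, for illustration only.
    print(build_get_url("http://localhost:9200", "metrics", "event", "42"))
```

A URL like this could also be pasted into GetHTTP or InvokeHTTP directly, which is the trade-off the answer describes: the Fetch processors build it for you from the index, type, and id properties.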

QueryElasticsearchHttp is for when you want to run a Lucene-style query against the documents and don't necessarily know which documents you want. It will only return up to the value of index.max_result_window for that index. To get more records, you can use ScrollElasticsearchHttp afterwards. NOTE: QueryElasticsearchHttp expects a query that will work as the "q" parameter of the URL. This "mini-language" does not support all fields/operators (see here for more details).
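To make the "q" parameter concrete, the following Python sketch shows roughly what a URI search request looks like once a Lucene-style query string is URL-encoded. The host, index name, and field names in the query are made up for illustration, and the helper `build_search_url` is hypothetical, not part of NiFi or any Elasticsearch client.

```python
from urllib.parse import urlencode


def build_search_url(base_url, index, query, size=10):
    """Build a URI search URL: <base>/<index>/_search?q=<query>&size=<n>."""
    # urlencode percent-encodes the query string (':' becomes %3A, spaces become '+').
    params = urlencode({"q": query, "size": size})
    return f"{base_url.rstrip('/')}/{index}/_search?{params}"


if __name__ == "__main__":
    # Hypothetical index and fields, for illustration only.
    print(build_search_url("http://localhost:9200", "metrics",
                           "status:error AND host:web01", size=100))
```

Anything that cannot be expressed in this query-string form needs a request body instead, which is why a more general HTTP processor may be the better fit for complex filters.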

For your use case, you likely need InvokeHTTP in order to issue the kind of query you describe. This article describes how to issue a query for the last 15 minutes. Once your results are returned, you might need some combination of EvaluateJsonPath and/or SplitJson to work with the individual documents. See the Elasticsearch REST API documentation (and the NiFi processor documentation) for more details.
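The sketch below, in Python, shows one way the request body for a "last N minutes" search could look, using a range query with Elasticsearch date math (`now-15m`). It also shows how the individual hits could be pulled out of the response, similar in spirit to splitting on the JSON path `$.hits.hits`. The time field name `@timestamp` is an assumption about your mapping, both helper functions are hypothetical, and the sample response is invented purely to exercise the code.

```python
import json


def last_minutes_query(minutes, time_field="@timestamp"):
    """Build a search body matching documents from the last `minutes` minutes."""
    # `time_field` is an assumed field name; replace it with your own timestamp field.
    return {"query": {"range": {time_field: {"gte": f"now-{minutes}m", "lte": "now"}}}}


def split_hits(response_text):
    """Return the individual hit objects from a search response."""
    response = json.loads(response_text)
    return response["hits"]["hits"]


if __name__ == "__main__":
    # This JSON would be sent as the body of a request to <index>/_search.
    print(json.dumps(last_minutes_query(15)))

    # Invented sample response, for illustration only.
    sample = ('{"hits": {"hits": ['
              '{"_id": "1", "_source": {"value": 1}}, '
              '{"_id": "2", "_source": {"value": 2}}]}}')
    for hit in split_hits(sample):
        print(hit["_id"], hit["_source"])
```

Additional filters would typically be combined with the range clause inside a `bool` query. Scheduling the processor every 5-15 minutes with a matching `now-Nm` window then approximates the polling behaviour described in the question.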