
How to handle dynamic index creation in Elasticsearch using Apache NiFi?

I am routing data to Elasticsearch using NiFi, and the flow dynamically creates indices based on a set of attributes. I'm using Index Lifecycle Management (ILM) in Elasticsearch, which requires every index to be manually bootstrapped beforehand for the ILM settings to be applied. Since my NiFi flow ingests messages into Elasticsearch automatically, any index that gets created automatically will not have an ILM policy applied.

Currently my flow is: consume from Kafka --> UpdateAttribute --> PutElasticsearchRecord.

A solution (I think) would be to put an InvokeHTTP processor in front of the PutElasticsearchRecord processor and bootstrap each index dynamically, using the attributes extracted earlier, before ingesting into Elasticsearch. Index names are built with the syntax index_${attribute_1}_${attribute_2}. My only concern is that the InvokeHTTP processor would run for every new flowfile, which could mean thousands of calls to bootstrap an index. And if the index already exists, the call could collide with it.
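To make the bootstrap idea concrete, here is a minimal Python sketch of what the InvokeHTTP call might send and how a "collision" could be recognised as harmless. It assumes the usual ILM bootstrap pattern (create a first index with a -000001 suffix and a write alias) and that Elasticsearch answers HTTP 400 with an error type of resource_already_exists_exception when the index is already there. The function names are hypothetical, and nothing here talks to a real cluster.

```python
import json
from typing import Tuple


def index_alias(attribute_1: str, attribute_2: str) -> str:
    """Mirror the NiFi expression index_${attribute_1}_${attribute_2}."""
    return f"index_{attribute_1}_{attribute_2}"


def bootstrap_request(attribute_1: str, attribute_2: str) -> Tuple[str, str, str]:
    """Build (method, path, body) for bootstrapping the first backing index.

    Assumption: the ILM policy itself is attached through an index template
    that matches this name; this request only creates the first index and
    marks the alias as the write alias.
    """
    alias = index_alias(attribute_1, attribute_2)
    body = {"aliases": {alias: {"is_write_index": True}}}
    return "PUT", f"/{alias}-000001", json.dumps(body)


def already_exists(status_code: int, response_body: str) -> bool:
    """Return True if the response only says the index was already created."""
    if status_code != 400:
        return False
    try:
        parsed = json.loads(response_body)
    except ValueError:
        return False
    if not isinstance(parsed, dict):
        return False
    error = parsed.get("error")
    return isinstance(error, dict) and error.get("type") == "resource_already_exists_exception"
```

With a check like this, a repeated bootstrap call can be treated as a no-op instead of a failure, which addresses the collision worry but not the number of calls.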

Is this really the best way to do this? Perhaps I could run the QueryElasticsearchRecord processor to get a list of indices and somehow match it against incoming flowfiles on attribute_1 and attribute_2. But that would still require a continuous query, I think?

What you could do is have InvokeHTTP run if and only if the flowfile carries a specific value or attribute signalling that a new (not previously sent) index needs to be created in Elasticsearch. Just an idea, if you want to head down that route.
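One way to picture the "only for new indices" idea is a small cache of index names that have already been bootstrapped. The Python sketch below is an illustration of that logic, not NiFi code: the class name and the injected bootstrap callable are hypothetical. Inside NiFi, something similar might be done by routing on a duplicate-detection step keyed on the index name, but that is an assumption about your flow.

```python
from typing import Callable, Set


class OnceBootstrapper:
    """Run the bootstrap callable at most once per index name."""

    def __init__(self, bootstrap: Callable[[str], None]) -> None:
        self._bootstrap = bootstrap
        self._seen: Set[str] = set()

    def ensure(self, index_name: str) -> bool:
        """Bootstrap index_name if it has not been seen; return True if a call was made."""
        if index_name in self._seen:
            return False
        self._bootstrap(index_name)
        # Record the name only after the call succeeds, so a failure is retried.
        self._seen.add(index_name)
        return True


# Usage: three flowfiles, two distinct index names, so two bootstrap calls.
calls = []
bootstrapper = OnceBootstrapper(calls.append)
results = [bootstrapper.ensure(name) for name in ["index_a_b", "index_a_b", "index_c_d"]]
```

The cache here lives in memory and is lost on restart, so the bootstrap call itself should still tolerate an index that already exists.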
