

Do more processors in Apache NiFi lead to lower throughput?

Original post: 2022-05-09 03:30:34 · Tags: apache-nifi, implementation, throughput

In Apache NiFi, processors are linked by connections, which act as queues of FlowFiles, and by default NiFi persists FlowFile content on disk. Does that mean every connection persists its FlowFiles on disk? If so, each hand-off of a FlowFile from one processor to the next would cost one disk read and one disk write, so more processors would mean more disk I/O, which in turn would lower overall throughput. Is my understanding correct? And what is the best practice to avoid this: putting all the logic in one processor? Thanks.

1 Answer

NiFi internals work a bit differently. FlowFile attributes are held in memory (and journaled to the FlowFile Repository), while FlowFile content lives in the Content Repository on disk. A connection does not copy content from one processor to the next; it only queues references to FlowFiles, so content is read or written only when a processor actually touches it. If a processor only operates on FlowFile attributes, such as UpdateAttribute, it never needs to access the content; if it operates on the content (e.g. data enrichment, or validation as in ValidateRecord), disk I/O will be involved. Each processor shows Read/Write statistics, which tell you how much content I/O it performed. Refer to Anatomy of a Processor for more details.
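A minimal Python sketch of this behaviour, purely a toy model and not NiFi code: `ContentRepository`, `FlowFile`, `update_attribute`, `uppercase_content` and `run_pipeline` are hypothetical names. FlowFiles carry a reference (claim id) to immutable content, a queue hands those references between steps, and only steps that touch content bump the read/write counters. Assumption to note: it counts only content I/O and ignores the FlowFile Repository journaling NiFi performs when each processor commits its session, so attribute-only steps are not literally free in a real flow.

```python
from collections import deque
from dataclasses import dataclass, field
import itertools


class ContentRepository:
    """Hypothetical toy content store: immutable blobs keyed by claim id, with I/O counters."""

    def __init__(self):
        self._blobs = {}
        self._ids = itertools.count()
        self.reads = 0
        self.writes = 0

    def write(self, data: bytes) -> int:
        claim = next(self._ids)
        self._blobs[claim] = data  # existing claims are never modified in place
        self.writes += 1
        return claim

    def read(self, claim: int) -> bytes:
        self.reads += 1
        return self._blobs[claim]


@dataclass
class FlowFile:
    claim: int  # a reference to the content, not the bytes themselves
    attributes: dict = field(default_factory=dict)


def update_attribute(ff, repo, key, value):
    # Attribute-only change: keeps the same content claim, so no content I/O.
    return FlowFile(ff.claim, {**ff.attributes, key: value})


def uppercase_content(ff, repo):
    # Content change: read the old claim, write a new one (copy-on-write).
    data = repo.read(ff.claim)
    return FlowFile(repo.write(data.upper()), dict(ff.attributes))


def run_pipeline(steps, repo, ff):
    connection = deque([ff])  # a "connection" queues FlowFile objects (references) only
    for step in steps:
        current = connection.popleft()
        connection.append(step(current, repo))  # the hand-off moves no content bytes
    return connection.popleft()


repo = ContentRepository()
ingested = FlowFile(repo.write(b"hello"))  # initial ingest: one content write
steps = [
    lambda f, r: update_attribute(f, r, "source", "demo"),
    lambda f, r: update_attribute(f, r, "stage", "1"),
    uppercase_content,
    lambda f, r: update_attribute(f, r, "stage", "2"),
]
result = run_pipeline(steps, repo, ingested)
print(repo.reads, repo.writes)  # 1 2
```

Four steps ran, yet the counters show one content read and two writes (the ingest plus the single rewrite); in this model, adding more attribute-only steps would not change them.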

If you have custom logic that needs to modify both attributes and content, you can implement both operations in one custom processor!
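A second toy sketch (again hypothetical Python, not the NiFi API; `CountingRepo`, `strip_step`, `upper_step` and `combined_step` are illustrative names) of what merging logic into one custom step can save. In this model the attribute update costs no content I/O either way; the saving comes from replacing two content-rewriting steps with one pass that reads once and writes once.

```python
import itertools


class CountingRepo:
    """Hypothetical toy content store that counts reads and writes (not a NiFi API)."""

    def __init__(self):
        self._blobs = {}
        self._ids = itertools.count()
        self.reads = 0
        self.writes = 0

    def write(self, data: bytes) -> int:
        claim = next(self._ids)
        self._blobs[claim] = data
        self.writes += 1
        return claim

    def read(self, claim: int) -> bytes:
        self.reads += 1
        return self._blobs[claim]


def strip_step(repo, claim, attrs):
    # Separate step #1: one content read, one content write.
    return repo.write(repo.read(claim).strip()), attrs


def upper_step(repo, claim, attrs):
    # Separate step #2: another read and write, plus an attribute update.
    return repo.write(repo.read(claim).upper()), {**attrs, "case": "upper"}


def combined_step(repo, claim, attrs):
    # One custom step: read once, apply both transforms in memory, write once,
    # and set the attribute in the same call.
    data = repo.read(claim).strip().upper()
    return repo.write(data), {**attrs, "case": "upper"}


chained = CountingRepo()
claim, attrs = chained.write(b"  hello  "), {}
for step in (strip_step, upper_step):
    claim, attrs = step(chained, claim, attrs)
print(chained.reads, chained.writes)  # 2 3

merged = CountingRepo()
merged_claim, merged_attrs = combined_step(merged, merged.write(b"  hello  "), {})
print(merged.reads, merged.writes)  # 1 2
```

The trade-off is that a merged processor is less reusable and hides intermediate results from the flow, so merging is probably only worth it where the Read/Write stats suggest content I/O is the bottleneck.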