
Transfer multiple flowFiles from ExecuteScript in NiFi

I am trying to generate multiple flowfiles from one flowfile using an ExecuteScript processor with a Python script.

The output flowfiles depend on one configuration attribute and on the input flowfile (XML content).

I have tried many things, but I always end up with an error such as:

  • this flowfile is already marked for transfer
  • transfer relationship not specified

Below is the latest version:

from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback
import java.io
from org.python.core.util import StringUtil

class PyStreamCallback(StreamCallback):
    def __init__(self, flowFile):
        global matched
        self.parentFlowFile = flowFile
        pass

    def process(self, inputStream, outputStream):
        try:
            text_content = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
            flowfiles_list = []

            new_xml = "blabla"
            outputStream.write(bytearray(new_xml.encode('utf-8')))

            for n in range(0,5):
                flowFile = session.create(self.parentFlowFile)
                if (flowFile != None):
                    flowFile = session.write(flowFile, "Nothing")
                    flowfiles_list.append(flowFile)

            for flow in flowfiles_list:
                session.transfer(flow, REL_SUCCESS)
        except:
            print('Error inside process')
            raise

originalFlowFile = session.get()
if(originalFlowFile != None):
    try :
        originalFlowFile = session.write(originalFlowFile, PyStreamCallback(originalFlowFile))
        session.remove(originalFlowFile)

    except Exception as e:
        originalFlowFile = session.putAttribute(originalFlowFile,'python_error', str(e))
        session.transfer(originalFlowFile, REL_FAILURE)

Can someone tell me what I am doing wrong and how to achieve what I want to do?

Here are some notes on your script:

1) You are subclassing StreamCallback and writing to the original flow file, but then you remove it later. StreamCallback is for when you want to overwrite the contents of an existing flow file. If you don't need to do that, you can use InputStreamCallback as the base class; its process method doesn't take an outputStream argument, but you wouldn't need one in that case. You'd also call session.read on the original flow file rather than session.write.

2) The line flowFile = session.write(flowFile, "Nothing") isn't valid because session.write needs an OutputStreamCallback or StreamCallback as its second argument (as in the call further down, where you pass PyStreamCallback). When that throws an error, it gets raised all the way to the top level of the script, but by then you've created a flow file and never reached the statement that transfers the flowfiles_list to REL_SUCCESS. Consider adding a try/except around the session.write, so that you can remove the newly created flow file and then re-raise the exception.

3) If you want to read the entire content of the incoming flow file into memory (which you are currently doing), then remove the original flow file and create new flow files from it, consider using the version of session.read() that returns an InputStream (i.e. it doesn't require an InputStreamCallback). Then you can save the contents into a global variable and/or pass them into an OutputStreamCallback when you want to write something to the created flow files. Something like:

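from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets

# Read the whole incoming content into memory without a callback
inputStream = session.read(originalFlowFile)
text_content = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
inputStream.close()
flowfiles_list = []

for n in range(0,5):
    # Create a child flow file from the original
    flowFile = session.create(originalFlowFile)
    if flowFile is not None:
        try:
            flowFile = session.write(flowFile, PyStreamCallback(text_content))
            flowfiles_list.append(flowFile)
        except Exception as e:
            # Remove the child that could not be written, then re-raise
            session.remove(flowFile)
            raise

# Transfer every child that was written successfully
for flow in flowfiles_list:
    session.transfer(flow, REL_SUCCESS)

# The original is no longer needed once the children exist
session.remove(originalFlowFile)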

This doesn't include the refactor of PyStreamCallback into an OutputStreamCallback that takes a string argument instead of a FlowFile in its constructor.
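As a rough idea of what that refactor might look like, here is a minimal sketch. It keeps the PyStreamCallback name so the call in the snippet above stays unchanged, and it writes the string it was given as UTF-8 bytes. The ImportError fallback is not part of NiFi; it is an assumption added only so the class can be tried outside the ExecuteScript processor.

```python
# Intended for NiFi's Jython engine. The fallback base class is only an
# assumption for this sketch, so the class can also be exercised outside NiFi.
try:
    from org.apache.nifi.processor.io import OutputStreamCallback
except ImportError:
    OutputStreamCallback = object


class PyStreamCallback(OutputStreamCallback):
    """Writes a given string as the content of a flow file."""

    def __init__(self, content):
        # Keep the text to write; no FlowFile reference is needed here.
        self.content = content

    def process(self, outputStream):
        # The session passes in outputStream; encode explicitly as UTF-8.
        outputStream.write(bytearray(self.content.encode('utf-8')))
```

With a definition like this placed before the loop, session.write(flowFile, PyStreamCallback(text_content)) should receive the OutputStreamCallback it expects. Each child could be given different content by building a different string per iteration, for example from the configuration attribute mentioned in the question.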


 