Python ExecuteScript in NiFi: Transform flowfile attributes & content

I am trying to create a Python script in NiFi that:

  1. Reads some attributes from an incoming flowfile
  2. Reads the JSON content of the flowfile and extracts specific fields
  3. Writes attributes to the outgoing flowfile
  4. Overwrites the incoming flowfile with new content created in the script (e.g. an API call that returns new JSON) and sends it to the SUCCESS relationship, OR removes the old flowfile and creates a new one with the desired content

What I've done so far:

import json
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback,InputStreamCallback, OutputStreamCallback

class OutputWrite(OutputStreamCallback, obj):

    def __init__(self):
        self.obj = obj

    def process(self, outputStream):
        outputStream.write(bytearray(json.dumps(self.obj).encode('utf')))

### end class ###

flowfile = session.get()

if flowfile != None:

    # 1) Get flowfile attributes

    headers = {
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept': 'application/json, text/plain, */*',
        'Cache-Control': 'no-cache',
        'Ocp-Apim-Trace': 'true',
        'Authorization': flowfile.getAttribute('Authorization')
    }

    collection = flowfile.getAttribute('collection')
    dataset = flowfile.getAttribute('dataset')

    # 2) Get flowfile content

    stream_content = session.read(flowfile)
    text_content = IOUtils.toString(stream_content, StandardCharsets.UTF_8)
    json_content = json.loads(text_content)

    records = json_content['result']['count']
    pages = records/10000

    # 3) Write flowfile attributes

    flowfile = session.putAttribute(flowfile, 'collection', collection)
    flowfile = session.putAttribute(flowfile, 'dataset', dataset)

    # API operations: output_json with desired data

    output_json = {some data}

    # 4) Write final JSON data to output flowfile

    flowfile = session.write(flowfile, OutputWrite(output_json))

    session.transfer(flowfile, REL_SUCCESS)
    session.commit()

My problem is that I can't find a way to pass a reference to the desired output_json object as an argument to the OutputStreamCallback class. Any ideas on how to resolve this, or maybe a better approach?

Would it be easier in this case to perform all API operations inside the class's process function? If so, how do I get access to the incoming flowfile's attributes there (that requires a session or a flowfile object)?

Any help much appreciated!

You can try something like this:

import json
import sys
import traceback
from java.nio.charset import StandardCharsets
from org.apache.commons.io import IOUtils
from org.apache.nifi.processor.io import StreamCallback
from org.python.core.util import StringUtil

class TransformCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        try:
            # Read input FlowFile content
            input_text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
            input_obj = json.loads(input_text)
            # Transform content
            output_obj = input_obj   # start from the input content

            # perform data transformation on output_obj

            # Write output content (the original referenced an undefined name, outputJson)
            output_text = json.dumps(output_obj)
            outputStream.write(StringUtil.toBytes(output_text))
        except:
            traceback.print_exc(file=sys.stdout)
            raise


flowFile = session.get()
if flowFile != None:
    flowFile = session.write(flowFile, TransformCallback())

    # Finish by transferring the FlowFile to an output relationship
    session.transfer(flowFile, REL_SUCCESS)
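
If you would rather keep the write-only OutputStreamCallback from the question, the object to be written can be handed to the constructor instead of being listed among the base classes. The following is only a minimal sketch of that idea: the try/except fallback is my own addition so the class can also be tried outside NiFi, and the commented usage lines assume the `session`, `REL_SUCCESS` and `output_json` names from the question.

```python
import json

try:
    # Resolves only inside NiFi's Jython script engine.
    from org.apache.nifi.processor.io import OutputStreamCallback
except ImportError:
    # Stand-in (assumption) so the class can be exercised outside NiFi.
    OutputStreamCallback = object


class OutputWrite(OutputStreamCallback):
    def __init__(self, obj):
        # Keep a reference to the object that should become the new content.
        self.obj = obj

    def process(self, outputStream):
        # Serialize the stored object and write it as UTF-8 bytes.
        outputStream.write(bytearray(json.dumps(self.obj).encode('utf-8')))


# Inside ExecuteScript the call would then look like this:
# flowfile = session.write(flowfile, OutputWrite(output_json))
# session.transfer(flowfile, REL_SUCCESS)
```

Because the callback only holds a reference, the API calls that build output_json can stay in the main body of the script, where the flowfile attributes are directly available.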

I've included example Python code below, from Matt Burgess' blog article on the topic, that defines a custom PyStreamCallback class implementing logic to transform the JSON in the flowfile content. However, I would encourage you to consider using the native UpdateAttribute and EvaluateJsonPath processors for the relevant steps, and to use custom code only where it is specifically needed for a task that NiFi doesn't handle out of the box.

import json
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        obj = json.loads(text)
        newObj = {
            "Range": 5,
            "Rating": obj['rating']['primary']['value'],
            "SecondaryRatings": {}
        }
        for key, value in obj['rating'].iteritems():
            if key != "primary":
                newObj['SecondaryRatings'][key] = {"Id": key, "Range": 5, "Value": value['value']}

        outputStream.write(bytearray(json.dumps(newObj, indent=4).encode('utf-8')))

flowFile = session.get()
if (flowFile != None):
    flowFile = session.write(flowFile, PyStreamCallback())
    flowFile = session.putAttribute(flowFile, "filename", flowFile.getAttribute('filename').split('.')[0] + '_translated.json')
    session.transfer(flowFile, REL_SUCCESS)
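
One way to make the transformation easier to check is to pull it out of process into a plain function that can be run outside NiFi. The sketch below does that for the rating logic of PyStreamCallback; the function name `translate_ratings`, the `rating_range` parameter and the sample input are hypothetical, chosen only to match the shape the code above expects.

```python
import json


def translate_ratings(obj, rating_range=5):
    """Reshape {'rating': {...}} into the Range/Rating/SecondaryRatings layout."""
    new_obj = {
        "Range": rating_range,
        "Rating": obj["rating"]["primary"]["value"],
        "SecondaryRatings": {},
    }
    # items() works on both Jython 2.7 and Python 3; iteritems() is 2.x only.
    for key, value in obj["rating"].items():
        if key != "primary":
            new_obj["SecondaryRatings"][key] = {
                "Id": key,
                "Range": rating_range,
                "Value": value["value"],
            }
    return new_obj


# Hypothetical sample input, shaped the way the callback expects.
sample = json.loads('{"rating": {"primary": {"value": 4}, "quality": {"value": 3}}}')
result = translate_ratings(sample)
print(json.dumps(result, indent=4))
```

Inside the callback, process would then only read the stream, call the function, and write the result.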

Update:

To access the attributes of the flowfile within the callback, simply pass the flowfile as an argument to the constructor, store it as a field, and reference it within the process method. Here is a very simple example that appends the value of the attribute my_attr to the incoming flowfile content and writes it back:

import json
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class PyStreamCallback(StreamCallback):
    def __init__(self, flowfile):
        self.ff = flowfile
    def process(self, inputStream, outputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        text += self.ff.getAttribute('my_attr')
        outputStream.write(bytearray(text.encode('utf-8')))

flowFile = session.get()
if (flowFile != None):
    flowFile = session.write(flowFile,PyStreamCallback(flowFile))
    session.transfer(flowFile, REL_SUCCESS)

Incoming flowfile:

--------------------------------------------------
Standard FlowFile Attributes
Key: 'entryDate'
    Value: 'Tue Mar 13 13:10:48 PDT 2018'
Key: 'lineageStartDate'
    Value: 'Tue Mar 13 13:10:48 PDT 2018'
Key: 'fileSize'
    Value: '30'
FlowFile Attribute Map Content
Key: 'filename'
    Value: '1690494181462176'
Key: 'my_attr'
    Value: 'This is an attribute value.'
Key: 'path'
    Value: './'
Key: 'uuid'
    Value: 'dc93b715-50a0-43ce-a4db-716bd9ec3205'
--------------------------------------------------
This is some flowfile content.

Outgoing flowfile:

--------------------------------------------------
Standard FlowFile Attributes
Key: 'entryDate'
    Value: 'Tue Mar 13 13:10:48 PDT 2018'
Key: 'lineageStartDate'
    Value: 'Tue Mar 13 13:10:48 PDT 2018'
Key: 'fileSize'
    Value: '57'
FlowFile Attribute Map Content
Key: 'filename'
    Value: '1690494181462176'
Key: 'my_attr'
    Value: 'This is an attribute value.'
Key: 'path'
    Value: './'
Key: 'uuid'
    Value: 'dc93b715-50a0-43ce-a4db-716bd9ec3205'
--------------------------------------------------
This is some flowfile content.This is an attribute value.
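
The same constructor-and-field pattern should also work in the other direction: the callback can keep a value it extracted from the content in a field, and the script can copy it into an attribute once session.write has returned. The sketch below assumes content shaped like the question's json_content['result']['count'] and a page size of 10000. The class name `CountingCallback`, the helper `read_text`, the `page_count` attribute, the output layout, and the choice to round the page count up are all my own assumptions, and the import fallback exists only so the class can be tried outside NiFi.

```python
import json

try:
    # These imports resolve only inside NiFi's Jython script engine.
    from org.apache.commons.io import IOUtils
    from java.nio.charset import StandardCharsets
    from org.apache.nifi.processor.io import StreamCallback

    def read_text(stream):
        return IOUtils.toString(stream, StandardCharsets.UTF_8)
except ImportError:
    # Stand-ins (assumption) so the class can be run with io.BytesIO streams.
    StreamCallback = object

    def read_text(stream):
        return stream.read().decode("utf-8")


class CountingCallback(StreamCallback):
    def __init__(self, attributes, page_size=10000):
        self.attributes = attributes  # mapping of flowfile attributes
        self.page_size = page_size
        self.pages = None             # filled in by process()

    def process(self, inputStream, outputStream):
        content = json.loads(read_text(inputStream))
        records = content["result"]["count"]
        # Round up so a partial last page is counted (assumption).
        self.pages = (records + self.page_size - 1) // self.page_size
        # Hypothetical output: combine an attribute with the extracted value.
        output = {"collection": self.attributes.get("collection"),
                  "pages": self.pages}
        outputStream.write(bytearray(json.dumps(output).encode("utf-8")))


# Inside ExecuteScript the calls would look roughly like this:
# flowFile = session.get()
# if flowFile is not None:
#     callback = CountingCallback(flowFile.getAttributes())
#     flowFile = session.write(flowFile, callback)
#     flowFile = session.putAttribute(flowFile, 'page_count', str(callback.pages))
#     session.transfer(flowFile, REL_SUCCESS)
```

Attribute values must be strings, hence the str() around callback.pages in the usage lines.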
