How to pass parameters to a Python script in NiFi

Question

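Maybe this is a silly question, but I have to ask.

In NiFi I have a Collect_data processor that streams messages to another processor, which uses a Python script to parse them and create a JSON file. The problem is that I don't know what the input to the function in the Python script is. How do I pass these messages (16-digit numbers) from the Collect_data processor to the next processor, the one that contains the Python script? Is there a good, basic example?

I've already looked for examples online, but I didn't really get it.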

import datetime
import hashlib
from urlparse import urlparse, parse_qs
import sys
from datetime import *
import json
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback
from time import time


def parse_zap(inputStream, outputStream):
    data = inputStream
    buf = (hashlib.sha256(bytearray.fromhex(data)).hexdigest())
    buf = int(buf, 16)
    buf_check = str(buf)
    if buf_check[17] == 2:
        pass
    datetime_now = datetime.now()
    log_date = datetime_now.isoformat()
    try:
        mac = buf_check[7:14].upper()
        ams_id = buf_check[8:]
        action = buf_check[3:4]
        time_a = int(time())
        dict_test = {
        "user": {
            "guruq" : 'false'
        },
        "device" : {
            "type" : "siolbox",
            "mac": mac
        },
        "event" : {
            "origin" : "iptv",
            "timestamp": time_a,
            "type": "zap",
            "product-type" : "tv-channel",
            "channel": {
                "id" : 'channel_id',
                "ams-id": ams_id
            },
            "content": {
                "action": action
            }
        }
        }
        return dict_test
    except Exception as e:
        print('%s nod PARSE 500 \"%s\"' % (log_date, e))

I think I'm reading the input correctly, but now I can't create the output. Thanks in advance.

Answer 1

I think I understand your question, but your flow is a bit ambiguous, so I'm answering for a few different possible scenarios.

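You have a processor that fetches data from a source (e.g. FetchFTP) and is connected to an ExecuteScript processor containing a Python script that transforms those values. In this case, the Python script can operate on the flowfile attributes and content directly using the standard API. See Matt Burgess's blog for many examples of writing custom scripts that operate on data.
You have a processor that fetches data from a source and is connected to an ExecuteStreamCommand processor, which invokes an external Python script with a command such as python my_external_script.py arg1 arg2 ... In this case, the ExecuteStreamCommand processor passes the flowfile content to the script's STDIN, so your script should read it from there. This answer explains more about using ExecuteStreamCommand with Python scripts.
You have a custom processor that internally invokes a separate Python process. This is a bad idea and should be refactored into one of the other models: it breaks separation of concerns, loses the benefits of the processor lifecycle, obscures threading and timing, lacks provenance visibility, and goes against NiFi's development model.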
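For the ExecuteStreamCommand scenario, a minimal sketch of the external script might look like the following. It assumes the flowfile content arrives on STDIN and that whatever the script writes to STDOUT becomes the content of the outgoing flowfile; the --channel-id option and the build_event() and run() helpers are hypothetical, included only to show how a command-line argument reaches the script alongside the content.

```python
import argparse
import json
import sys


def build_event(message, channel_id):
    """Hypothetical helper: wrap one message and the passed-in channel id in a dict."""
    return {"event": {"type": "zap", "raw": message, "channel": {"id": channel_id}}}


def run(argv, stdin_text):
    """Parse the command-line arguments and turn the STDIN text into a JSON string."""
    # Arguments come from ExecuteStreamCommand's Command Arguments property
    parser = argparse.ArgumentParser()
    parser.add_argument("--channel-id", default="unknown")  # hypothetical option
    args = parser.parse_args(argv)
    # The flowfile content arrives on STDIN; drop any trailing newline
    message = stdin_text.strip()
    return json.dumps(build_event(message, args.channel_id))


def main():
    # Whatever is written to STDOUT becomes the content of the outgoing flowfile
    sys.stdout.write(run(sys.argv[1:], sys.stdin.read()))
```

Saved as my_external_script.py with an `if __name__ == "__main__": main()` guard at the bottom, it could be wired up with Command Path python and Command Arguments my_external_script.py;--channel-id;42, assuming the processor's default ";" argument delimiter. Keeping the logic in run() means it can be tested without NiFi by passing strings directly.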

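If your Python script is very simple, you could put it in a ScriptedRecordSetWriter and use it to process multiple "records" at once for a performance benefit. Depending on your flow and what the incoming data looks like, this may be an improvement for your use case.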
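The gain from the record-oriented approach comes from handling many messages per invocation instead of one flowfile per message. The plain-Python sketch below only illustrates that batching idea, assuming one flowfile carries newline-separated messages; it is not the ScriptedRecordSetWriter interface itself, and the zap_records() name and its output fields are hypothetical.

```python
import json


def zap_records(content):
    """Convert newline-separated messages into JSON Lines in a single pass (hypothetical record layout)."""
    out = []
    for line in content.splitlines():
        message = line.strip()
        if not message:
            continue  # ignore blank lines between records
        out.append(json.dumps({"raw": message, "digits": len(message)}))
    return "\n".join(out)
```

One call handles the whole flowfile, so the per-flowfile overhead (session calls, script invocation) is paid once for all the messages it contains.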

Update 2018-10-03 10:50

Try using this script as the body of ExecuteScript:

import json
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

# Your parse_zap() method goes here, with the signature changed to accept a single string.
# It must be defined above the session code below, because that code runs as soon as the script is evaluated.
# ...

class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        # Read the whole flowfile content as a UTF-8 string and drop the trailing newline
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        result = parse_zap(text.strip())
        # parse_zap() returns a dict, so serialize it to JSON before writing bytes
        outputStream.write(bytearray(json.dumps(result).encode('utf-8')))

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, PyStreamCallback())
    flowFile = session.putAttribute(flowFile, "parsed_zap", "true")
    session.transfer(flowFile, REL_SUCCESS)
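Because this template calls parse_zap() with one string, the function can be written and tested as ordinary Python before pasting it in. A minimal sketch, assuming the flowfile content is the 16-digit message as hex text (the question's code decodes it with bytearray.fromhex()) and keeping the question's field mapping; the question's try/except logging is left out for brevity, and to_json_bytes() is a hypothetical helper that only mirrors the serialization step the callback performs.

```python
import hashlib
import json
from time import time


def parse_zap(data):
    """Build the zap event dict from one hex-encoded message string (fields as in the question)."""
    # Hash the decoded message and turn the digest into a long decimal string
    digest = hashlib.sha256(bytearray.fromhex(data.strip())).hexdigest()
    buf_check = str(int(digest, 16))
    return {
        "user": {"guruq": "false"},
        "device": {"type": "siolbox", "mac": buf_check[7:14].upper()},
        "event": {
            "origin": "iptv",
            "timestamp": int(time()),
            "type": "zap",
            "product-type": "tv-channel",
            "channel": {"id": "channel_id", "ams-id": buf_check[8:]},
            "content": {"action": buf_check[3:4]},
        },
    }


def to_json_bytes(data):
    # NiFi's OutputStream needs bytes, so the dict is serialized instead of written directly
    return bytearray(json.dumps(parse_zap(data)).encode("utf-8"))
```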

Answer 2

Take a look at this script:

from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        # Read the content line by line, then skip the first line (e.g. a header)
        lines = list(IOUtils.readLines(inputStream, StandardCharsets.UTF_8))
        for line in lines[1:]:
            # OutputStream.write() expects bytes, so encode each line explicitly
            outputStream.write(bytearray((line + "\n").encode('utf-8')))

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, PyStreamCallback())
    # Rename the output: "name.ext" becomes "name_translated.json"
    flowFile = session.putAttribute(flowFile, "filename", flowFile.getAttribute('filename').split('.')[0] + '_translated.json')
    session.transfer(flowFile, REL_SUCCESS)

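It drops the first line of the flowfile content and writes the remaining lines back, then sets the filename attribute to the original base name plus _translated.json. It's simple, and a good example of both how to use attributes and how to work with flowfile content.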
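If the number of lines to drop should come from an attribute instead of being fixed at one, the logic can be kept as plain functions and the count handed to the callback (for example through its constructor, after reading it with flowFile.getAttribute()). A sketch under that assumption; the attribute name skip.lines and both function names are hypothetical, and a plain dict stands in for the flowfile attributes.

```python
def skip_lines(text, attributes):
    """Drop the first N lines, where N comes from a (hypothetical) 'skip.lines' attribute."""
    count = int(attributes.get("skip.lines", "1"))  # default of 1 mirrors lines[1:] in the script
    lines = text.splitlines()
    return "".join(line + "\n" for line in lines[count:])


def translated_filename(filename):
    # Same rule as the script: keep everything before the first '.' and append a suffix
    return filename.split(".")[0] + "_translated.json"
```

Note that split('.')[0] keeps only the part before the first dot, so a name like zap.2018.csv becomes zap_translated.json.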

Based on your updated code, it should look like this:

import hashlib
import json
from datetime import datetime
from time import time
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback


class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass

    def process(self, inputStream, outputStream):
        # inputStream is a Java InputStream, not a string: read the content first
        data = IOUtils.toString(inputStream, StandardCharsets.UTF_8).strip()
        buf = hashlib.sha256(bytearray.fromhex(data)).hexdigest()
        buf_check = str(int(buf, 16))
        log_date = datetime.now().isoformat()
        try:
            mac = buf_check[7:14].upper()
            ams_id = buf_check[8:]
            action = buf_check[3:4]
            time_a = int(time())
            dict_test = {
                "user": {
                    "guruq": 'false'
                },
                "device": {
                    "type": "siolbox",
                    "mac": mac
                },
                "event": {
                    "origin": "iptv",
                    "timestamp": time_a,
                    "type": "zap",
                    "product-type": "tv-channel",
                    "channel": {
                        "id": 'channel_id',
                        "ams-id": ams_id
                    },
                    "content": {
                        "action": action
                    }
                }
            }
            # process() must write the result; a returned value never reaches the flowfile
            outputStream.write(bytearray(json.dumps(dict_test).encode('utf-8')))
        except Exception as e:
            print('%s nod PARSE 500 "%s"' % (log_date, e))


flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, PyStreamCallback())
    flowFile = session.putAttribute(flowFile, "filename", flowFile.getAttribute('filename').split('.')[0] + '_translated.json')
    session.transfer(flowFile, REL_SUCCESS)
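The essential change is that process() writes to outputStream instead of returning the dict, which is why the original version produced no output. A plain-Python sketch of that difference, using io.BytesIO as a stand-in for the Java streams that session.write() hands to the callback; ZapCallback, ReturningCallback and run_callback() are illustrative names, not NiFi API.

```python
import io
import json


class ZapCallback(object):
    """Stand-in for PyStreamCallback that writes its result to the output stream."""

    def process(self, input_stream, output_stream):
        message = input_stream.read().decode("utf-8").strip()
        event = {"event": {"type": "zap", "raw": message}}
        output_stream.write(json.dumps(event).encode("utf-8"))


class ReturningCallback(object):
    """Stand-in for the original approach: the dict is returned, so nothing is written."""

    def process(self, input_stream, output_stream):
        return {"event": {"type": "zap"}}


def run_callback(callback, content):
    # Mimics session.write(flowFile, callback): old content in, new content collected out
    source = io.BytesIO(content)
    sink = io.BytesIO()
    callback.process(source, sink)
    return sink.getvalue()
```

Running both against the same input shows the returning version leaves the new content empty, while the writing version produces the JSON.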

Answer 3

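I was able to access parameters from a Python script using the approach described here.

Basically, all you have to do is:

Stop the processor that runs the Python script
Configure the processor
Add a property to the processor (for example, myProperty)
Access the property from the script like this: myProperty.evaluateAttributeExpressions().getValue()
Start the processor again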
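In ExecuteScript, each dynamic property added this way is bound into the script as a variable of the same name holding a NiFi PropertyValue. A minimal sketch of a helper that resolves it, assuming you want Expression Language in the property (e.g. ${filename}) evaluated against the current flowfile when one is available; get_property_value() and StubPropertyValue are hypothetical names, and the stub exists only so the helper can be tried outside NiFi.

```python
def get_property_value(prop, flow_file=None, default=None):
    """Resolve a dynamic property (a NiFi PropertyValue) to a string, with a fallback."""
    if flow_file is not None:
        # Expression Language in the property is evaluated against this flowfile's attributes
        value = prop.evaluateAttributeExpressions(flow_file).getValue()
    else:
        value = prop.evaluateAttributeExpressions().getValue()
    return default if value is None else value


class StubPropertyValue(object):
    """Hypothetical stand-in for PropertyValue so the helper can be exercised outside NiFi."""

    def __init__(self, value):
        self._value = value

    def evaluateAttributeExpressions(self, flow_file=None):
        return self  # the stub does no Expression Language evaluation

    def getValue(self):
        return self._value
```

Inside ExecuteScript you would drop the stub and call get_property_value(myProperty, flowFile, "unknown") after session.get(); branching on flow_file avoids passing None to the overloaded Java method.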

3 answers

Answer 1
3 votes, 2018-10-02 21:57:01

Answer 2
3 votes, accepted, 2018-10-03 06:11:15

Answer 3
0 votes, 2022-05-10 18:31:15
