
How to pass a parameter to a Python script in NiFi

Maybe this is a stupid question, but I have to ask.

I have a Collect_data processor in NiFi that streams messages into another processor, which uses a Python script to parse them and create a JSON file. The problem is that I don't know what the input to the function in the Python script is. How do I pass those messages (16-digit numbers) from the Collect_data processor into the next processor that contains the Python script? Is there a good, basic example of this?

I have already looked at some examples online, but I don't really get it.

import datetime
import hashlib
from urlparse import urlparse, parse_qs
import sys
from datetime import *
import json
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback
from time import time


def parse_zap(inputStream, outputStream):
    data = inputStream
    buf = (hashlib.sha256(bytearray.fromhex(data)).hexdigest())
    buf = int(buf, 16)
    buf_check = str(buf)
    if buf_check[17] == 2:
        pass
    datetime_now = datetime.now()
    log_date = datetime_now.isoformat()
    try:
        mac = buf_check[7:14].upper()
        ams_id = buf_check[8:]
        action = buf_check[3:4]
        time_a = int(time())
        dict_test = {
        "user": {
            "guruq" : 'false'
        },
        "device" : {
            "type" : "siolbox",
            "mac": mac
        },
        "event" : {
            "origin" : "iptv",
            "timestamp": time_a,
            "type": "zap",
            "product-type" : "tv-channel",
            "channel": {
                "id" : 'channel_id',
                "ams-id": ams_id
            },
            "content": {
                "action": action
            }
        }
        }
        return dict_test
    except Exception as e:
        print('%s nod PARSE 500 \"%s\"' % (log_date, e))

I think I'm reading it correctly, but now I can't create the output. Thanks in advance.

I think I understand your question, but it is somewhat ambiguous about your flow, so I am answering for a few different possible scenarios.

  1. You have a processor which obtains data from a source (e.g. FetchFTP) and has a connection to an ExecuteScript processor which contains a Python script to transform those values. In this case, the Python script can operate on the flowfile attributes and content directly using the standard API. See Matt Burgess' blog for many examples of writing custom scripts to operate on the data.
  2. You have a processor which obtains data from a source and has a connection to an ExecuteStreamCommand processor, which invokes an external Python script using a command like python my_external_script.py arg1 arg2 .... In this case, the flowfile content is passed to STDIN by the ExecuteStreamCommand processor, so your script should consume it that way (see the sketch after this list). This answer explains more about using ExecuteStreamCommand with Python scripts.
  3. You have a custom processor that internally calls a separate Python process. This is a bad idea and should be refactored into one of the other models: it breaks separation of concerns, loses the processor lifecycle support, obscures thread handling and timing, lacks provenance visibility, and goes against NiFi's development model.
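
For scenario 2, here is a minimal sketch of what such an external script could look like (my_external_script.py, arg1 and arg2 are just the placeholder names from the command above, and the JSON wrapping is only illustrative): ExecuteStreamCommand feeds the flowfile content to the script's STDIN and, unless you configure an output attribute, takes whatever the script writes to STDOUT as the content of the outgoing flowfile.

import sys
import json

def main():
    # Arguments configured in the processor's "Command Arguments" property
    args = sys.argv[1:]

    # The incoming flowfile content arrives on STDIN
    content = sys.stdin.read()

    # Do the actual parsing here; this just echoes the input and arguments as JSON
    result = {"input": content.strip(), "args": args}

    # Whatever goes to STDOUT becomes the content of the outgoing flowfile
    sys.stdout.write(json.dumps(result))

if __name__ == "__main__":
    main()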

If your Python script is very simple, you could put it in a ScriptedRecordWriter and use that to handle multiple "records" simultaneously to gain performance benefits. This might be advanced for your use case, depending on what your flow and incoming data look like.

Update 2018-10-03 10:50

Try using this script in the ExecuteScript body:

import json
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass
    def process(self, inputStream, outputStream):
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        # parse_zap() returns a dict, so serialize it to JSON before writing it out
        result = json.dumps(parse_zap(text))
        outputStream.write(bytearray(result.encode('utf-8')))

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, PyStreamCallback())
    flowFile = session.putAttribute(flowFile, "parsed_zap", "true")
    session.transfer(flowFile, REL_SUCCESS)

# Your parse_zap() method here, with the signature changed to just accept a single string
...

Take a look at this script:

import json
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass
    def process(self, inputStream, outputStream):
        text = IOUtils.readLines(inputStream, StandardCharsets.UTF_8)
        for line in text[1:]:
            outputStream.write(line + "\n")

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, PyStreamCallback())
    flowFile = session.putAttribute(flowFile, "filename", flowFile.getAttribute('filename').split('.')[0] + '_translated.json')
    session.transfer(flowFile, REL_SUCCESS)

It is meant to read the number of lines to remove from a processor property (the snippet above simply drops the first line), then rewrite the flowfile content without those lines. It's an easy and good example of both how to use properties and how to work with the flowfile.
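
As a minimal sketch of the property-driven version (assuming you add a dynamic property named lineCount to the ExecuteScript processor; NiFi exposes user-added properties to the script as PropertyValue variables, and lineCount is just an illustrative name):

from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

class DropLinesCallback(StreamCallback):
    def __init__(self, count):
        self.count = count
    def process(self, inputStream, outputStream):
        lines = IOUtils.readLines(inputStream, StandardCharsets.UTF_8)
        # Skip the first 'count' lines and write the rest back out
        for line in lines[self.count:]:
            outputStream.write(bytearray((line + "\n").encode('utf-8')))

flowFile = session.get()
if flowFile is not None:
    # 'lineCount' is the hypothetical dynamic property; Expression Language is evaluated against the flowfile
    count = lineCount.evaluateAttributeExpressions(flowFile).asInteger()
    flowFile = session.write(flowFile, DropLinesCallback(count))
    session.transfer(flowFile, REL_SUCCESS)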

Based on your updated code, it should look like this:

import datetime
import hashlib
from urlparse import urlparse, parse_qs
import sys
from datetime import *
import json
import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback
from time import time


class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass
    def process(self, inputStream, outputStream):
        # Read the incoming flowfile content (the 16-digit number) as text
        data = IOUtils.toString(inputStream, StandardCharsets.UTF_8).strip()
        buf = hashlib.sha256(bytearray.fromhex(data)).hexdigest()
        buf = int(buf, 16)
        buf_check = str(buf)
        if buf_check[17] == '2':
            pass
        datetime_now = datetime.now()
        log_date = datetime_now.isoformat()
        try:
            mac = buf_check[7:14].upper()
            ams_id = buf_check[8:]
            action = buf_check[3:4]
            time_a = int(time())
            dict_test = {
                "user": {
                    "guruq": 'false'
                },
                "device": {
                    "type": "siolbox",
                    "mac": mac
                },
                "event": {
                    "origin": "iptv",
                    "timestamp": time_a,
                    "type": "zap",
                    "product-type": "tv-channel",
                    "channel": {
                        "id": 'channel_id',
                        "ams-id": ams_id
                    },
                    "content": {
                        "action": action
                    }
                }
            }
            # Write the result to the outgoing flowfile content as JSON instead of returning it
            outputStream.write(bytearray(json.dumps(dict_test).encode('utf-8')))
        except Exception as e:
            print('%s nod PARSE 500 "%s"' % (log_date, e))

flowFile = session.get()
if flowFile is not None:
    flowFile = session.write(flowFile, PyStreamCallback())
    flowFile = session.putAttribute(flowFile, "filename", flowFile.getAttribute('filename').split('.')[0] + '_translated.json')
    session.transfer(flowFile, REL_SUCCESS)

I was able to access parameters from a Python script using the method described here.

Basically, all you have to do is:

  1. Stop the processor that executes the Python script
  2. Open the processor's configuration
  3. Add a property to the processor (for example, myProperty)
  4. Access the property from the script like this: myProperty.evaluateAttributeExpressions().getValue() (see the sketch below)
  5. Restart the processor
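
A minimal sketch of how that looks inside an ExecuteScript (Jython) body, assuming the property added in step 3 is named myProperty (NiFi binds user-added properties to script variables of type PropertyValue):

from org.apache.nifi.processor.io import StreamCallback

class ReplaceWithProperty(StreamCallback):
    def __init__(self, value):
        self.value = value
    def process(self, inputStream, outputStream):
        # Just to demonstrate the lookup, replace the flowfile content with the property value
        outputStream.write(bytearray(self.value.encode('utf-8')))

flowFile = session.get()
if flowFile is not None:
    # 'myProperty' is the dynamic property from step 3; Expression Language is evaluated first
    value = myProperty.evaluateAttributeExpressions(flowFile).getValue()
    flowFile = session.write(flowFile, ReplaceWithProperty(value))
    session.transfer(flowFile, REL_SUCCESS)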
