
How to read files in ExecuteStreamCommand processor in NiFi

My end goal is to mask the data in one particular file. I want to move files from one place to another, and during the transfer I have to mask the data using a Python script. So I designed the flow below:

GetFile > ExecuteStreamCommand > PutFile

I wrote a Python script using pandas. I am running NiFi on a virtual machine created on Google Cloud Platform, where I have installed Python-2.7 and NiFi-1.9.1. Below is my pandas code:

import pandas as pd
readFile = pd.read_csv("/path",sep=" ",header=None)
readFile.columns = ['IP']
readFile['IP'] = readFile['IP'].replace(regex='((?<=[0-9])[0-9]|(?<=\.)[0-9])',value='X')
readFile.to_csv("/path", sep=' ')

I have the following doubts:
1) Using the GetFile processor, I am passing the file in the queue to the next processor, i.e. the ExecuteStreamCommand processor.
2) Also, in my Python code, I am trying to read the data from the same input directory that was configured in the GetFile processor, but the file has now been moved to the queue between GetFile > ExecuteStreamCommand. So how will the script read it?
3) After the Python script is executed, how can I use a PutFile processor to save the result at some other place?

I am new to NiFi, so I am trying to understand the basics. I have also attached screenshots of the flow and the error.

ExecuteStreamCommand

The content of the flow file is passed to the command (python in your case) as its stdin stream,

so you have to read from stdin, using the following code:

import sys
readFile = pd.read_csv(sys.stdin, sep=" ", header=None)
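
Combined with the question's masking step, a minimal sketch of the whole script might look like the following. It reads with read_csv because the question's file is space-separated text. It assumes Python 3 with pandas installed, a single column of IP addresses with no header row, and output written without index or header so that it keeps the input layout. mask_stream is a hypothetical helper name, not part of NiFi or pandas.

```python
import pandas as pd

# Pattern from the question: a digit preceded by another digit or by a dot.
MASK_PATTERN = r"((?<=[0-9])[0-9]|(?<=\.)[0-9])"


def mask_stream(src, dst):
    """Read IPs from file-like src, mask digits, write the result to dst.

    Hypothetical helper: src and dst are any text file objects, for
    example sys.stdin and sys.stdout when run by ExecuteStreamCommand.
    """
    # dtype=str keeps every value as text so nothing is parsed as a number.
    frame = pd.read_csv(src, sep=" ", header=None, names=["IP"], dtype=str)
    # regex=True applies the pattern as a regular expression to each value.
    frame["IP"] = frame["IP"].str.replace(MASK_PATTERN, "X", regex=True)
    # Assumption: no index or header is written, so output mirrors the input.
    frame.to_csv(dst, sep=" ", header=False, index=False)
```

In the script that ExecuteStreamCommand runs, the call would be mask_stream(sys.stdin, sys.stdout) after import sys. Whatever the command writes to stdout becomes the content of the flow file sent to the processor's "output stream" relationship, which can then be connected to PutFile.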

On the other hand, if you only need to apply a regex replacement to the flow file, you could try the ReplaceText processor instead of ExecuteStreamCommand.
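
Before configuring ReplaceText, it can help to check offline what the pattern actually matches. The sketch below applies the question's regex with Python's standard re module. The sample address and the mask_ip name are made up for illustration. In ReplaceText the same pattern would presumably go into the Search Value property with X as the Replacement Value, but NiFi evaluates it with Java's regex engine, so treat this only as a preview.

```python
import re

# Pattern from the question: a digit preceded by another digit or by a dot.
MASK_PATTERN = re.compile(r"(?<=[0-9])[0-9]|(?<=\.)[0-9]")


def mask_ip(text):
    """Replace every matching digit with 'X' (hypothetical helper)."""
    return MASK_PATTERN.sub("X", text)


print(mask_ip("192.168.1.10"))  # prints 1XX.XXX.X.XX
```

Only the leading digit of the first octet survives, because every other digit follows either a digit or a dot.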

You may need to place the python.py file in the volume that is registered for NiFi,

for example opt/nifi/nifi-current/ if it is a Docker image.
