How to read files in ExecuteStreamCommand processor in NiFi
My end goal is to mask the data in one particular file. I want to move files from one place to another, and during this transfer I have to mask the data using a Python script. So I designed the flow below:
GetFile > ExecuteStreamCommand > PutFile
I wrote a Python script using pandas. I am running NiFi on a virtual machine created on Google Cloud Platform, where I have installed Python 2.7 and NiFi 1.9.1. Below is my pandas code:
import pandas as pd
readFile = pd.read_csv("/path",sep=" ",header=None)
readFile.columns = ['IP']
readFile['IP'] = readFile['IP'].replace(regex='((?<=[0-9])[0-9]|(?<=\.)[0-9])',value='X')
readFile.to_csv("/path", sep=' ')
I have the following doubts:
1) Using the GetFile processor, I am passing the file through the queue to the next processor, i.e. the ExecuteStreamCommand processor.
2) In my Python code, I am trying to read the data from the same input directory that was configured in the GetFile processor, but by then the file has been moved to the queue between GetFile and ExecuteStreamCommand. So how will the script read it?
3) After the Python script is executed, how can I use a PutFile processor to save the result at some other place?
I am new to NiFi, so I am trying to understand the basics. I have also attached the flow and an error screenshot.
The content of the flow file is passed into the command (python in your case) as the stdin stream, so you have to read it from there with the following code:
import sys
readFile = pd.read_csv(sys.stdin, sep=" ", header=None)
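The one-liner above can be expanded into a complete masking function. What follows is only a minimal sketch under some assumptions: the flow file holds one value per line, as the question's CSV appears to (hence `read_csv`), and the name `mask_stream` is made up for illustration. It uses `Series.str.replace` with the regex from the question, which should behave the same as the question's `replace(regex=...)` call on string data.

```python
import pandas as pd

# Regex taken from the question: masks every digit that follows
# another digit or a dot, so only the leading digit survives.
MASK_PATTERN = r"((?<=[0-9])[0-9]|(?<=\.)[0-9])"


def mask_stream(in_stream, out_stream):
    """Read one value per line from in_stream and write the masked lines to out_stream.

    mask_stream is a hypothetical helper name, not part of NiFi or pandas.
    """
    # dtype=str keeps every value as text so the regex can be applied to it.
    frame = pd.read_csv(in_stream, sep=" ", header=None, names=["IP"], dtype=str)
    frame["IP"] = frame["IP"].str.replace(MASK_PATTERN, "X", regex=True)
    # No header and no index, so the output has the same shape as the input.
    frame.to_csv(out_stream, sep=" ", header=False, index=False)
```

In the script that ExecuteStreamCommand runs, the last lines would presumably be `import sys` and `mask_stream(sys.stdin, sys.stdout)`. Whatever the script prints to stdout becomes the content of the flow file on the processor's "output stream" relationship, so connecting that relationship to PutFile should save the masked data.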
On the other hand, if you only need to apply a regexp replace to the flow file, you could try the ReplaceText processor instead of ExecuteStreamCommand.
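Before putting the pattern into ReplaceText, it may help to check what it does to a sample line. This is a small sketch using only Python's standard `re` module. The sample addresses and the name `mask_line` are made up for illustration, and NiFi evaluates the pattern with Java's regex engine, so behaviour there should be similar but is not guaranteed to be identical.

```python
import re

# Same pattern as in the question, compiled once for reuse.
MASK_PATTERN = re.compile(r"(?<=[0-9])[0-9]|(?<=\.)[0-9]")


def mask_line(line):
    """Replace every digit that follows a digit or a dot with 'X'."""
    return MASK_PATTERN.sub("X", line)


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the question's data.
    for sample in ("192.168.1.10", "10.0.0.5"):
        print("{} -> {}".format(sample, mask_line(sample)))
```

If the output looks right, the same pattern could go into ReplaceText's Search Value property with `X` as the Replacement Value. In that case no Python script is needed in the flow at all.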
You may need to place the python.py file in the volume that NiFi is registered with, for example opt/nifi/nifi-current/ if it is running from a Docker image.