How to manipulate two csv flowfiles in ExecuteScript using Python?
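In my flow I query Hive, update the filenames, and then merge the resulting CSVs into a single Excel workbook with multiple worksheets. With this code I am able to merge two CSV files into one such workbook. How can I make the script use the two files from the NiFi flow instead of reading them from a directory on my PC? I have seen that I can call "flowFile = session.get()", but does that line pick up both flowfiles?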
import datetime
import xlsxwriter
# NiFi/Java classes, importable only inside ExecuteScript (Jython)
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

wb = xlsxwriter.Workbook("combined_at%s.xlsx" % datetime.datetime.now().strftime('%H-%M-%S'))
flowFile = session.get()  # fetched, but not used anywhere below yet

# Characters stripped from every cell value: , [ ] " '
replacer = ",[]\"'"

worksheet = wb.add_worksheet("make")
worksheet2 = wb.add_worksheet("ownership")
worksheet3 = wb.add_worksheet("marital")
worksheet4 = wb.add_worksheet("drivers")
worksheet5 = wb.add_worksheet("vehicles")
worksheet6 = wb.add_worksheet("age")
worksheet7 = wb.add_worksheet("vyear")


def printHashedEmail(split_row, worksheet, index):
    # Write the cleaned first field (the hashed email) into column 0.
    for y in replacer:
        split_row[0] = split_row[0].replace(y, "")
    worksheet.write(index, 0, split_row[0])


def printOtherOnes(split_row, worksheet, index, non_changing_index):
    # Write the cleaned field at non_changing_index into column 1.
    for y in replacer:
        split_row[non_changing_index] = split_row[non_changing_index].replace(y, "")
    worksheet.write(index, 1, split_row[non_changing_index])


# First CSV: fields 2 and 3 feed the "make" and "ownership" sheets.
with open("1.csv") as csv1:
    i = 0
    j = 0
    for row in csv1:
        split_row = row.split(",")
        if split_row[2] != "":
            printHashedEmail(split_row, worksheet, i)
            printOtherOnes(split_row, worksheet, i, 2)
            i = i + 1
        if split_row[3].strip() != "":
            printHashedEmail(split_row, worksheet2, j)
            printOtherOnes(split_row, worksheet2, j, 3)
            j = j + 1

# Second CSV: fields 2 to 6 feed the remaining five sheets.
# Each sheet keeps its own row counter so that it has no gaps.
with open("2.csv") as csv2:
    i = 0
    j = 0
    k = 0
    l = 0
    m = 0
    for row in csv2:
        split_row = row.split(",")
        if split_row[2] != "":
            printHashedEmail(split_row, worksheet3, i)
            printOtherOnes(split_row, worksheet3, i, 2)
            i = i + 1
        if split_row[3].strip() != "":
            printHashedEmail(split_row, worksheet4, j)
            printOtherOnes(split_row, worksheet4, j, 3)
            j = j + 1
        if split_row[5] != "":
            printHashedEmail(split_row, worksheet5, l)
            printOtherOnes(split_row, worksheet5, l, 5)
            l = l + 1
        if split_row[4].strip() != "":
            printHashedEmail(split_row, worksheet6, k)
            printOtherOnes(split_row, worksheet6, k, 4)
            k = k + 1
        if split_row[6].strip() != "":
            printHashedEmail(split_row, worksheet7, m)
            printOtherOnes(split_row, worksheet7, m, 6)
            m = m + 1

wb.close()
print("Done")
After the operation, I would like the Excel file to leave the ExecuteScript processor so that I can do further processing on it.
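Check the different session.get() methods. For example, session.get(2) will try to take the first two files from the incoming queue. If only one is there, you can call session.rollback() to return it to the queue.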
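A minimal sketch of the "take two or roll back" pattern might look like the following. Note that FakeSession and take_pair are hypothetical names introduced only for illustration: FakeSession is a stand-in that imitates just the session.get(n) and session.rollback() calls so the sketch can run outside NiFi. Inside ExecuteScript you would pass the session variable that NiFi provides.

```python
class FakeSession(object):
    """Hypothetical stand-in imitating session.get(n) and session.rollback()."""

    def __init__(self, queued):
        self.queue = list(queued)   # flowfiles still waiting in the queue
        self.taken = []             # flowfiles handed out in this session

    def get(self, max_results):
        # Hand out up to max_results flowfiles from the front of the queue.
        self.taken = self.queue[:max_results]
        self.queue = self.queue[max_results:]
        return list(self.taken)

    def rollback(self):
        # Return everything taken in this session to the front of the queue.
        self.queue = self.taken + self.queue
        self.taken = []


def take_pair(session):
    """Return a pair of flowfiles, or None after rolling back if fewer than two are queued."""
    flow_files = session.get(2)
    if len(flow_files) < 2:
        session.rollback()
        return None
    return flow_files[0], flow_files[1]
```

When take_pair returns None, the script can simply end and let the processor try again on its next run, once the second file has arrived.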
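The problem with this approach is that the files in the queue may not be in the order you expect; just imagine there are three files in the incoming queue. With session.get(FlowFileFilter filter) you can instead select the two files in the incoming queue that match certain attributes.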
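As a rough sketch of the filter approach, the class below accepts one flowfile per wanted filename and stops scanning once both have been found. WantedNamesFilter is a hypothetical name, and matching on the filename attribute with the values 1.csv and 2.csv is an assumption based on the question's code. The fallback classes in the except branch are stand-ins so the sketch can be exercised outside NiFi; inside ExecuteScript the real FlowFileFilter interface is imported instead.

```python
try:
    # Available only inside NiFi's ExecuteScript (Jython) environment.
    from org.apache.nifi.processor import FlowFileFilter
    FlowFileFilterResult = FlowFileFilter.FlowFileFilterResult
except ImportError:
    # Hypothetical stand-ins so the sketch can run outside NiFi.
    class FlowFileFilter(object):
        pass

    class FlowFileFilterResult(object):
        ACCEPT_AND_CONTINUE = "ACCEPT_AND_CONTINUE"
        ACCEPT_AND_TERMINATE = "ACCEPT_AND_TERMINATE"
        REJECT_AND_CONTINUE = "REJECT_AND_CONTINUE"


class WantedNamesFilter(FlowFileFilter):
    """Accept one flowfile per wanted filename, then stop scanning the queue."""

    def __init__(self, wanted):
        self.remaining = set(wanted)

    def filter(self, flowFile):
        name = flowFile.getAttribute("filename")
        if name not in self.remaining:
            # Not one of ours (or a duplicate): leave it in the queue.
            return FlowFileFilterResult.REJECT_AND_CONTINUE
        self.remaining.discard(name)
        if self.remaining:
            # Still waiting for the other file: keep scanning.
            return FlowFileFilterResult.ACCEPT_AND_CONTINUE
        # Both files found: no need to look at the rest of the queue.
        return FlowFileFilterResult.ACCEPT_AND_TERMINATE
```

Inside the script this would be used as flowFiles = session.get(WantedNamesFilter(["1.csv", "2.csv"])). If the returned list holds fewer than two flowfiles, session.rollback() puts them back, as described above.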