Python / NiFi: ExecuteScript Python for converting UTF-16 text files to UTF-8

Question

I have an ExecuteScript processor, and I am trying to convert every file that passes through it to UTF-8 (if it is originally UTF-16).

So far:

flowFileList = session.get(100)
if not flowFileList.isEmpty():
  for flowFile in flowFileList: 
     # Process each FlowFile here:
     flowFileList.decode("utf-16").encode("utf-8")

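I feel this should be a fairly simple operation, as described in the answers here, here and here.

This raises an error saying that the object has no attribute 'decode'.

If this is a silly question, feel free to say so. Thanks.

NiFi ExecuteScript Cookbook: Cookbook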

Answer 1

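The problem is that you are calling decode on the flowFileList object rather than on an individual flow file.

In addition, you actually need to access the flow file content and then set the content using the new encoding. Right now you are treating the flow file object as a string, which it is not. I am not at my computer, but I will have working sample code available later.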
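As a rough illustration of that distinction outside NiFi, here is a minimal sketch in plain Python 3 (not the Jython engine that ExecuteScript uses). Raw byte strings stand in for flow file content, and the helper name `utf16_to_utf8` and the sample data are hypothetical.

```python
def utf16_to_utf8(content):
    """Decode UTF-16 bytes to text, then re-encode the text as UTF-8."""
    return content.decode("utf-16").encode("utf-8")

# Hypothetical stand-ins for the content of two flow files.
contents = [u"first file".encode("utf-16"), u"second file".encode("utf-16")]

# Calling decode on the list itself fails, much like the error in the question.
try:
    contents.decode("utf-16")
except AttributeError as err:
    print("list error:", err)

# Converting the bytes of each item individually works.
converted = [utf16_to_utf8(c) for c in contents]
print(converted)
```

In NiFi the bytes are not held by the flow file object directly, so the same decode and encode step has to happen inside a callback that reads and writes the content streams.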

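Update

I will provide working Python code to demonstrate this, but why can't you just use the ConvertCharacterSet processor? It accepts an input character set and an output character set.

Here is working code that converts incoming flow file content from UTF-16 to UTF-8. You should either filter out content that is already UTF-8 so that it skips this processor, or add code to recognize it and treat it as a no-op. You may also be interested in following NIFI-4550, which would add an InferCharacterSet processor for the same behavior.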

import java.io
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
from org.apache.nifi.processor.io import StreamCallback

# Define a subclass of StreamCallback for use in session.write()
class PyStreamCallback(StreamCallback):
    def __init__(self):
        pass
    def process(self, inputStream, outputStream):
        # Read the whole content as UTF-16 text, then write it back as UTF-8 bytes
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_16)
        outputStream.write(bytearray(text.encode('utf-8')))
# end class

flowFileList = session.get(100)
if not flowFileList.isEmpty():
    for flowFile in flowFileList:
        flowFile = session.write(flowFile, PyStreamCallback())
        flowFile = session.putAttribute(flowFile, 'script_character_set', 'UTF-8')
        session.transfer(flowFile, REL_SUCCESS)
# implicit return at the end
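
The suggestion above to recognize content that is already UTF-8 and treat it as a no-op could be sketched as follows. This is plain Python 3 rather than Jython, the function name `to_utf8_if_utf16` is hypothetical, and it assumes that UTF-16 input always starts with a byte order mark (BOM). UTF-16 content without a BOM would pass through unconverted, and UTF-32 little-endian content would be misdetected because its BOM begins with the same two bytes.

```python
import codecs

def to_utf8_if_utf16(content):
    """Transcode to UTF-8 only when the bytes start with a UTF-16 BOM."""
    if content.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and drops it.
        return content.decode("utf-16").encode("utf-8")
    # No UTF-16 BOM found: assume nothing needs converting and pass through.
    return content
```

The same check could be applied inside the stream callback before writing, or used to route flow files so that only UTF-16 content reaches the conversion step.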

Answer 1
3, accepted, 2018-12-10 22:46:07