Converting ORC to JSON in Java
I am trying to convert an output ORC file to JSON in Java inside a unit test. I have been reading the project's own unit tests and was inspired by the following:
PrintStream origOut = System.out;
String outputFilename = "orc-file-dump.json";
String tmpFileLocationJson = createTempFileJson();
FileOutputStream myOut = new FileOutputStream(tmpFileLocationJson);
// replace stdout and run command
System.setOut(new PrintStream(myOut, true, StandardCharsets.UTF_8.toString()));
FileDump.main(new String[]{"data", tmpFileLocationJson});
System.out.flush();
System.setOut(origOut);
System.out.println("done");
Something like this. The problem is that I am not sure how to make this code equivalent to the following use of the Java tools jar:
java -jar orc-tools-1.5.5-uber.jar data output-1595448128191.orc
which, for example, produces output like this:
{"integerExample":1,"nestedExample":{"sub1":"value1","sub2":42},"dateExample":"2018-01-04"}
So I want to convert the ORC file to JSON so that I can cross-reference it in my unit tests.
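The stdout-swapping idea from the snippet in the question can be wrapped in a small helper that always restores the original stream, even when the tool throws. This is only a sketch using the JDK; the names `StdoutCapture`, `ThrowingAction`, and `capture` are made up for illustration and are not part of ORC.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

final class StdoutCapture {
    /** Hypothetical callback type so that methods declaring "throws Exception" can be passed in. */
    interface ThrowingAction {
        void run() throws Exception;
    }

    /** Runs the action with System.out redirected to a buffer and returns everything it printed. */
    static String capture(ThrowingAction action) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (PrintStream replacement = new PrintStream(buffer, true, "UTF-8")) {
            System.setOut(replacement);
            action.run();
            replacement.flush();
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            // Checked exceptions are wrapped so callers do not need a throws clause.
            throw new IllegalStateException("captured action failed", e);
        } finally {
            // Restore stdout no matter how the action ended.
            System.setOut(original);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

In a test, the lambda passed to `capture` would wrap whichever tool entry point applies, for example the `FileDump.main(...)` call shown above, and the returned string can then be compared with the expected JSON.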
Edit: The method I need may be package-private :( https://github.com/apache/orc/blob/b9e82b3d7b473201bdcf46011c3b2fda10ef897f/java/tools/src/java/org/apache/orc/tools/PrintData.java#L227
OK, I vendored the code from Hive, replaced the output stream with a file writer, and redirected the output to a file that the test reads back.
static void printJsonData(String fileName, PrintStream printStream,
    Reader reader) throws IOException, JSONException, org.codehaus.jettison.json.JSONException {
  // The vendored version wrote to the supplied stream:
  // OutputStreamWriter out = new OutputStreamWriter(printStream, "UTF-8");
  // Write to <fileName>.json instead so the test can read it back.
  BufferedWriter out = new BufferedWriter(new FileWriter(fileName.concat(".json")));
  RecordReader rows = reader.rows();
  try {
    TypeDescription schema = reader.getSchema();
    VectorizedRowBatch batch = schema.createRowBatch();
    while (rows.nextBatch(batch)) {
      for (int r = 0; r < batch.size; ++r) {
        // One JSON object per row, one row per line.
        JSONWriter writer = new JSONWriter(out);
        printRow(writer, batch, schema, r);
        out.write("\n");
        out.flush();
        if (printStream.checkError()) {
          throw new IOException("Error encountered when writing to stdout.");
        }
      }
    }
  } finally {
    try {
      rows.close();
    } finally {
      out.close(); // release the file handle as well
    }
  }
}
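Since the method above writes one JSON object per line, the test can read the file back and compare it record by record. Below is a minimal sketch of that read-back step using only the JDK; `JsonLinesCheck`, `parseRecords`, and `readRecords` are hypothetical names, and the sketch assumes the file can be decoded as UTF-8 (the `FileWriter` above uses the platform default charset, which may differ).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class JsonLinesCheck {
    /** Splits newline-delimited JSON text into trimmed, non-blank records. */
    static List<String> parseRecords(String text) {
        List<String> records = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                records.add(trimmed);
            }
        }
        return records;
    }

    /** Reads a newline-delimited JSON file, assuming UTF-8 content. */
    static List<String> readRecords(Path jsonFile) {
        try {
            byte[] bytes = Files.readAllBytes(jsonFile);
            return parseRecords(new String(bytes, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test could then assert that `readRecords(path)` equals the expected list of JSON strings. Plain string comparison is sensitive to key order and spacing, so parsing both sides with a JSON library before comparing may be the more robust choice.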