What is the correct way to read a float16 column from an Arrow file in Java?
I am trying to establish an IPC pipeline. I have a Python program that writes an .arrow file and saves it in memory, where it gets picked up by a Java application. The application reads the file schema and performs the relevant operations. Currently I am having trouble with a column of the float16 data type. My Python program writes this column something like this:
# sample
import numpy as np
import pyarrow as pa
from pyarrow import fs

type_float16 = pa.float16()
float16column = pa.array([np.float16(0) for _ in range(5)], type=type_float16)
item_table = pa.table([float16column], ['samplefloat16columnname'])
local = fs.LocalFileSystem()
# Write the table in the Arrow IPC file format
with local.open_output_stream("output.arrow") as file:
    with pa.RecordBatchFileWriter(file, item_table.schema) as writer:
        writer.write_table(item_table)
Now when I try to read this file from a Java application using the program below (and mind you, this is the only column throwing an error):
public void read(String path) throws IOException {
    File arrowFile = new File(path);
    FileInputStream fileInputStream = new FileInputStream(arrowFile);
    SeekableReadChannel seekableReadChannel = new SeekableReadChannel(fileInputStream.getChannel());
    ArrowFileReader arrowFileReader = new ArrowFileReader(seekableReadChannel,
            new RootAllocator(Integer.MAX_VALUE));
    List<ArrowBlock> arrowBlocks = arrowFileReader.getRecordBlocks();
    for (int i = 0; i < arrowBlocks.size(); i++) {
        ArrowBlock rbBlock = arrowBlocks.get(i);
        if (!arrowFileReader.loadRecordBatch(rbBlock)) { // load the batch
            throw new IOException("Expected to read record batch");
        }
        // do something with the loaded batch
    }
}
I see this error:
Exception in thread "main" java.lang.UnsupportedOperationException: NYI: FloatingPoint(HALF)
I am not very proficient in Java, but I am guessing this may have something to do with incompatible data types between the two. Does anyone know the correct way of doing this?
PS: reading the same Arrow file using Python seems to work fine.
The smallest floating-point type supported in Java is float, which is 4 bytes and is represented in Arrow Java as float4.