Apache Arrow getting vectors from Java in Python with zero copy
I use the Apache Arrow libraries in Java (arrow-vector, arrow-memory-unsafe) and Python (pyarrow) in different processes.
I am trying to implement an in-memory, zero-copy DataFrame, but I can't find an appropriate API in the Java libraries for getting the memory addresses of Arrow vectors so that they can be used from Python. I have found such an API in the pyarrow library, but not in the Java libraries.
What I need:
Get the memory address or descriptor of a VectorSchemaRoot or of its field vectors in Java
Pass it to the Python library pyarrow
I have a problem with point 2.
Do you know how I can do that? Thank you!
There is the pyarrow.jvm module for this. The following code should be sufficient to turn a VectorSchemaRoot into a RecordBatch:
import pyarrow.jvm

# Placeholder: replace ... with a Python reference to a Java VectorSchemaRoot
vs_root = ...
rb = pyarrow.jvm.record_batch(vs_root)
This is how it works if you have a Python reference to the Java VectorSchemaRoot object, e.g. by using jpype (see also https://uwekorn.com/2020/12/30/fast-jdbc-revisited.html for a complete example that uses this for JDBC).
If you use a different approach, you will need to iterate over the arrays of the VectorSchemaRoot and then over the buffers of each array to get the memory address of every individual buffer. These addresses can then be used to construct Buffer objects on the pyarrow side and, from those, pyarrow.Array instances.