
Is this the correct way to write an asynchronous iterator?

Hello: I wonder how to write an asynchronous table iterator. Suppose the input table consists of many rows and arrives in serialized format. Once the table is received, the iterator is called to retrieve the rows one by one.

It performs the reading and deserialization in the following way:
1) It first reads and deserializes the integer giving the size of the row.
2) Then it reads and deserializes the contents of the row, in which
a. the timestamp is read first by calling in.readLong(),
b. then each key of the row is read and deserialized,
c. then the bitmap string about the non-key columns is read and deserialized,
d. then it calls in.readInt() to read and deserialize the integer which represents the number of non-key columns, and then it reads and deserializes each non-key column.
3) Finally it reads and deserializes the file end marker, which indicates whether the end of the file has been reached.

Finally it returns the deserialized row.

Here is the code:

public Row next() throws IOException {
    /* First read and deserialize the integer giving the size of the row. */
    int size = in.readInt();
    /* Then read and deserialize the contents of the row. */
    Row row = Row.deserialize(descriptor, in);

    /* Finally read and deserialize the file end marker, which
       indicates whether the end of the file has been reached. */
    int signal = in.readInt();
    if (signal == FILE.END) {
        file_end = true;
    }
    return row;
}

public Row deserialize(DataInput in) throws IOException {
    /* The timestamp is read first by calling in.readLong(). */
    long timestamp = in.readLong();

    Object[] key = new Object[KeyColumns().size()];
    Map<Column, Object> columns = new HashMap<>();

    /* Then each key of the row is read and deserialized. */
    int i = 0;
    for (Column<?> col : KeyColumns()) {
        key[i++] = col.type.deserialize(in);
    }

    /* Then the bitmap about the non-key columns is read and
       deserialized. */
    int bitstring = in.readInt();

    /* Then each non-key column whose bit is set in the bitmap
       is read and deserialized. */
    i = 0;
    for (Column<?> col : rowColumns()) {
        if ((bitstring & (1 << i)) != 0) {
            columns.put(col, col.type.deserialize(in));
        }
        i++;
    }
    return new Row(timestamp, key, columns);
}

To convert this iterator into an asynchronous iterator, I am thinking about using CompletableFuture in Java 8 and decoupling the read from the deserialization, that is, using a separate thread to handle the reading, like below:

public Row next() {
    CompletableFuture<Void> future = CompletableFuture.runAsync(() -> {
        int size = 0;
        try {
            size = in.readInt();
        } catch (IOException e) {
            e.printStackTrace();
        }
    });

    Row row = Row.deserialize(descriptor, in);
    int signal = in.readInt();

    if (signal == FILE.END) {
        file_end = true;
        return row;
    }
    return row;
}

But it seems to me that the thread which does "size = in.readInt();" and the main thread which does "Row row = Row.deserialize(descriptor, in);" share the same stream, so they need to happen one after the other. Still no parallelism is achieved. Is there a better way to implement this asynchronous iterator? Thanks.

First of all, you have a blocking resource (DataInput) at the heart. So no matter what you do, you will have to synchronize on reading from the DataInput.
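To make that concrete for the CompletableFuture idea in the question, here is a minimal sketch in which only the blocking read stays on the calling thread and the deserialization is handed off. It assumes that the size prefix is the byte length of the serialized row, which the question does not confirm. AsyncRowReader, FILE_END and the String "rows" are hypothetical stand-ins, not the question's classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Hypothetical reader: sequential reads, asynchronous deserialization. */
final class AsyncRowReader {
    static final int MORE = 0;      // assumed marker value: more rows follow
    static final int FILE_END = 1;  // assumed marker value: last row was read

    private final DataInput in;
    private boolean fileEnd = false;

    AsyncRowReader(DataInput in) { this.in = in; }

    boolean hasNext() { return !fileEnd; }

    /** Blocking part: size prefix, raw row bytes, end marker. */
    private byte[] readPayload() {
        try {
            int size = in.readInt();          // assumed to be the byte length
            byte[] payload = new byte[size];
            in.readFully(payload);
            fileEnd = in.readInt() == FILE_END;
            return payload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads on the caller's thread, deserializes on the common pool. */
    CompletableFuture<String> next() {
        final byte[] payload = readPayload();
        return CompletableFuture.supplyAsync(() -> deserialize(payload));
    }

    /** Stand-in for Row.deserialize: here a row is just UTF-8 text. */
    static String deserialize(byte[] payload) {
        return new String(payload, StandardCharsets.UTF_8);
    }

    /** Test helper: writes rows in the assumed size/payload/marker layout. */
    static byte[] encode(String... values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int i = 0; i < values.length; i++) {
                byte[] payload = values[i].getBytes(StandardCharsets.UTF_8);
                out.writeInt(payload.length);
                out.write(payload);
                out.writeInt(i == values.length - 1 ? FILE_END : MORE);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Starts every deserialization first, then waits for the results in order. */
    static List<String> readAll(byte[] data) {
        AsyncRowReader reader =
                new AsyncRowReader(new DataInputStream(new ByteArrayInputStream(data)));
        List<CompletableFuture<String>> pending = new ArrayList<>();
        while (reader.hasNext()) {
            pending.add(reader.next());
        }
        List<String> rows = new ArrayList<>();
        for (CompletableFuture<String> f : pending) {
            rows.add(f.join());
        }
        return rows;
    }
}
```

The reads still happen strictly one after another. Whether this pays off depends on how expensive the deserialization is compared with copying the row bytes.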

In Java 8 I would definitely implement this with streams. See the following question:

How to implement a Java stream?

The easiest would be to implement a Spliterator and create a stream with it using StreamSupport.stream(...). In a Spliterator you will primarily only need to implement the tryAdvance method, which is basically your "read next row" routine. There you'll need to synchronize reading from the DataInput.
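A minimal sketch of such a Spliterator follows. It is not the question's real format: DemoRow is a hypothetical, much simplified row (a timestamp and one string), the size prefix is left out, and the marker values MORE and FILE_END are assumed. Extending Spliterators.AbstractSpliterator means only tryAdvance has to be written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

/** Hypothetical stand-in for the question's Row. */
final class DemoRow {
    final long timestamp;
    final String value;
    DemoRow(long timestamp, String value) {
        this.timestamp = timestamp;
        this.value = value;
    }
}

/** Reads one row per tryAdvance call; an int end marker follows each row. */
final class RowSpliterator extends Spliterators.AbstractSpliterator<DemoRow> {
    static final int MORE = 0;      // assumed marker value: more rows follow
    static final int FILE_END = 1;  // assumed marker value: last row was read

    private final DataInput in;
    private boolean fileEnd = false;

    RowSpliterator(DataInput in) {
        super(Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL);
        this.in = in;
    }

    private DemoRow readRow() throws IOException {
        long timestamp = in.readLong();
        String value = in.readUTF();
        fileEnd = in.readInt() == FILE_END;
        return new DemoRow(timestamp, value);
    }

    @Override
    public boolean tryAdvance(Consumer<? super DemoRow> action) {
        DemoRow row;
        synchronized (in) {               // reads from the DataInput are serialized
            if (fileEnd) {
                return false;
            }
            try {
                row = readRow();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        action.accept(row);               // downstream work happens outside the lock
        return true;
    }
}

final class RowStreamDemo {
    /** Test helper: writes rows in the simplified layout described above. */
    static byte[] encode(String... values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int i = 0; i < values.length; i++) {
                out.writeLong(i);
                out.writeUTF(values[i]);
                out.writeInt(i == values.length - 1
                        ? RowSpliterator.FILE_END : RowSpliterator.MORE);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Stream<DemoRow> rows(DataInput in) {
        return StreamSupport.stream(new RowSpliterator(in), false);
    }

    static List<String> readValues(byte[] data) {
        DataInput in = new DataInputStream(new ByteArrayInputStream(data));
        return rows(in).map(r -> r.value).collect(Collectors.toList());
    }
}
```

Because the end marker follows a row, this layout cannot represent an empty table; a real implementation would need its own convention for that case.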

Once you have your Stream<Row> you will be able to apply different functions to it using map, forEach, etc.

To achieve parallelism you'll need to implement the trySplit method in the Spliterator. Here comes the problem: if you can't read from your DataInput in parallel, splitting won't bring you much. But I still think it would make sense to create a new instance of Spliterator for the same DataInput and synchronize them on reading. Reading will not be parallelized, but further processing may be (in a parallel stream).
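The idea of several Spliterator instances over one DataInput could be sketched as below. Everything here is an assumption made for illustration: SharedSource, SharedSpliterator, the split limit of 3, and a row layout of one UTF string followed by an int marker (1 meaning end of file). The split limit is there because every instance reports an unknown size, so the stream framework would otherwise keep splitting.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.Spliterator;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

/** State shared by all splits: the single DataInput and the end-of-file flag. */
final class SharedSource {
    private final DataInput in;
    private boolean fileEnd = false;
    final AtomicInteger splitsLeft;

    SharedSource(DataInput in, int maxSplits) {
        this.in = in;
        this.splitsLeft = new AtomicInteger(maxSplits);
    }

    /** Reads the next row under a lock; returns null after the end marker. */
    synchronized String readNext() {
        if (fileEnd) {
            return null;
        }
        try {
            String value = in.readUTF();
            fileEnd = in.readInt() == 1;   // assumed: 1 marks the last row
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Every instance pulls rows from the same SharedSource. */
final class SharedSpliterator implements Spliterator<String> {
    private final SharedSource source;

    SharedSpliterator(SharedSource source) { this.source = source; }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        String value = source.readNext();  // serialized read
        if (value == null) {
            return false;
        }
        action.accept(value);              // processing runs outside the lock
        return true;
    }

    @Override
    public Spliterator<String> trySplit() {
        // Hand out a bounded number of extra readers over the same source.
        return source.splitsLeft.getAndDecrement() > 0
                ? new SharedSpliterator(source) : null;
    }

    @Override
    public long estimateSize() { return Long.MAX_VALUE; }  // size unknown

    @Override
    public int characteristics() { return NONNULL; }       // not ORDERED
}

final class SharedSplitDemo {
    /** Test helper: writes rows in the assumed string/marker layout. */
    static byte[] encode(String... values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            for (int i = 0; i < values.length; i++) {
                out.writeUTF(values[i]);
                out.writeInt(i == values.length - 1 ? 1 : 0);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Set<String> upperCaseAll(byte[] data) {
        DataInput in = new DataInputStream(new ByteArrayInputStream(data));
        SharedSource source = new SharedSource(in, 3);
        return StreamSupport.stream(new SharedSpliterator(source), true)
                .map(String::toUpperCase)  // may run on several threads
                .collect(Collectors.toSet());
    }
}
```

Rows are handed to whichever split asks next, so encounter order is not preserved; that is why the sketch does not report ORDERED and collects into a Set.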
