Apache Flink: transform a DataStream (source) to a List?

My question is how to transform a DataStream into a List, for example in order to be able to iterate through it.

The code looks like:

package flinkoracle;

//imports
//....

public class FlinkOracle {

    final static Logger LOG = LoggerFactory.getLogger(FlinkOracle.class);

    public static void main(String[] args) {
        LOG.info("Starting...");
        // get the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        TypeInformation[] fieldTypes = new TypeInformation[]{BasicTypeInfo.STRING_TYPE_INFO,
            BasicTypeInfo.STRING_TYPE_INFO,
            BasicTypeInfo.STRING_TYPE_INFO,
            BasicTypeInfo.STRING_TYPE_INFO};

        RowTypeInfo rowTypeInfo = new RowTypeInfo(fieldTypes);
        //get the source from Oracle DB
        DataStream<?> source = env
                .createInput(JDBCInputFormat.buildJDBCInputFormat()
                        .setDrivername("oracle.jdbc.driver.OracleDriver")
                        .setDBUrl("jdbc:oracle:thin:@localhost:1521")
                        .setUsername("user")
                        .setPassword("password")
                        .setQuery("select * from  table1")
                        .setRowTypeInfo(rowTypeInfo)
                        .finish());

        source.print().setParallelism(1);

        try {
            LOG.info("----------BEGIN----------");
            env.execute();
            LOG.info("----------END----------");
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        LOG.info("End...");
    }

}

Thanks a lot in advance. Br, Tamas

Flink provides an iterator sink to collect DataStream results for testing and debugging purposes. It can be used as follows:

import org.apache.flink.contrib.streaming.DataStreamUtils;

DataStream<Tuple2<String, Integer>> myResult = ...
Iterator<Tuple2<String, Integer>> myOutput = DataStreamUtils.collect(myResult);

You can copy an iterator to a new list like this:

List<Tuple2<String, Integer>> list = new ArrayList<>();
while (myOutput.hasNext()) {
    list.add(myOutput.next());
}
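
The copy loop can also be wrapped in a small reusable helper. The following is a minimal sketch that uses only the JDK; the class name `IteratorCollector` and the method `toList` are made up for illustration and are not part of Flink. It should work with any `java.util.Iterator`, including the one returned by `DataStreamUtils.collect`. It only returns once the iterator is exhausted, so it is best suited to bounded input such as the result of a JDBC query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of Flink: drains any Iterator into a List.
class IteratorCollector {

    // Copies all remaining elements of the iterator into a new ArrayList.
    // The loop runs until hasNext() returns false, so use it with bounded input only.
    static <T> List<T> toList(Iterator<T> iterator) {
        List<T> result = new ArrayList<>();
        while (iterator.hasNext()) {
            result.add(iterator.next());
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the iterator that DataStreamUtils.collect(...) would return.
        Iterator<String> rows = Arrays.asList("a", "b", "c").iterator();
        List<String> list = toList(rows);
        for (String row : list) {
            System.out.println(row);
        }
    }
}
```

Once the elements are in a `List`, they can be iterated as often as needed, at the cost of holding the whole result in memory.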

Flink also provides a set of simple write*() methods on DataStream that are mainly intended for debugging purposes. When data is flushed to the target system depends on the implementation of the OutputFormat. This means that not all elements sent to the OutputFormat show up in the target system immediately. Note: these write*() methods do not participate in Flink's checkpointing, so records might be lost in failure cases.

writeAsText() / TextOutputFormat
writeAsCsv(...) / CsvOutputFormat
print() / printToErr()
writeUsingOutputFormat() / FileOutputFormat
writeToSocket

Source: link.

You may need to add the following dependency to use DataStreamUtils:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-contrib</artifactId>
    <version>0.10.2</version>
</dependency>

In newer versions, DataStreamUtils::collect has been deprecated. Instead you can use DataStream::executeAndCollect which, if given a limit, returns a List of at most that size.

var list = source.executeAndCollect(100);

If you do not know how many elements there are, or if you simply want to iterate through the results without loading them all into memory at once, then you can use the no-arg version to get a CloseableIterator:

try (var iterator = source.executeAndCollect()) {
  // do something
}
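
To illustrate why the try-with-resources form matters, here is a rough JDK-only sketch of consuming a closeable iterator lazily and stopping early. `CloseableRows` is a hypothetical stand-in that only mimics the shape of Flink's `CloseableIterator` (an `Iterator` that is also `AutoCloseable`); `EarlyStop` and `takeFirst` are likewise invented names. With a real job, closing the iterator is what should let the underlying resources be released.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for an Iterator that is also AutoCloseable.
class CloseableRows<T> implements Iterator<T>, AutoCloseable {
    private final Iterator<T> delegate;
    private boolean closed = false;

    CloseableRows(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        // Report no more elements once the iterator has been closed.
        return !closed && delegate.hasNext();
    }

    @Override
    public T next() {
        return delegate.next();
    }

    @Override
    public void close() {
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }
}

class EarlyStop {
    // Reads at most `limit` elements, then closes the iterator even if more remain.
    static <T> List<T> takeFirst(CloseableRows<T> rows, int limit) {
        List<T> taken = new ArrayList<>();
        try (CloseableRows<T> it = rows) {
            while (taken.size() < limit && it.hasNext()) {
                taken.add(it.next());
            }
        }
        return taken;
    }

    public static void main(String[] args) {
        CloseableRows<String> rows =
                new CloseableRows<>(Arrays.asList("r1", "r2", "r3", "r4").iterator());
        List<String> firstTwo = takeFirst(rows, 2);
        System.out.println(firstTwo + ", closed=" + rows.isClosed());
    }
}
```

Only the elements actually requested are held in memory, and the iterator is closed on every exit path, including exceptions thrown while processing.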
