
Apache Beam Not Saving Unbounded Data To Text File

I've created a pipeline that saves Google Cloud Pub/Sub messages to text files using Apache Beam and Java. Whenever I run the pipeline on Google Dataflow with --runner=DataflowRunner, the messages are saved correctly.

However, when I run the same pipeline with --runner=DirectRunner, the messages are not saved.

I can watch the events coming through the pipeline, but no output files are written.

The pipeline code is below:

public static void main(String[] args) {
    ExerciseOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().as(ExerciseOptions.class);

    Pipeline pipeline = Pipeline.create(options);

    pipeline
      .apply("Read Messages from Pubsub",
        PubsubIO
          .readStrings()
          .fromTopic(options.getTopicName()))

      .apply("Set event timestamp", ParDo.of(new DoFn<String, String>() {
        @ProcessElement
        public void processElement(ProcessContext context) {
          context.outputWithTimestamp(context.element(), Instant.now());
        }
      }))

      .apply("Windowing", Window.into(FixedWindows.of(Duration.standardMinutes(5))))

      .apply("Write to File",
        TextIO
          .write()
          .withWindowedWrites()
          .withNumShards(1)
          .to(options.getOutputPrefix()));

    pipeline.run();
  }

What am I doing wrong? Is it possible to run this pipeline locally?

I faced the same problem while testing my pipeline. PubsubIO does not work correctly with DirectRunner and TextIO.

I found a workaround for this issue by adding an explicit trigger to the window:

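.apply(
    "2 minutes window",
    // The <String> type witness matches the question's PCollection<String>;
    // without it the chained calls infer Window<Object>.
    Window.<String>configure()
        // Fire a pane after 10 elements or 2 minutes of processing time,
        // whichever comes first, and keep doing so for every window.
        .triggering(
            Repeatedly.forever(
                AfterFirst.of(
                    AfterPane.elementCountAtLeast(10),
                    AfterProcessingTime
                        .pastFirstElementInPane()
                        .plusDelayOf(Duration.standardMinutes(2)))))
        // Depending on the Beam version, you may also need to add
        // .withAllowedLateness(...) and .discardingFiredPanes() here.
        .into(FixedWindows.of(Duration.standardMinutes(2))))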

This way the files are written as they should be. I hope this helps someone.
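
To make the firing rule of that trigger easier to see, here is a rough plain-Java simulation with no Beam dependency. It is only a sketch of the semantics, not how Beam implements triggers: PaneSimulator and its methods are hypothetical names made up for illustration, time is passed in by hand, and event-time window boundaries are ignored. A pane is emitted as soon as 10 elements are buffered or 2 minutes have passed since the first buffered element, and then the state resets, which is what Repeatedly.forever expresses.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, Beam-free simulation of:
 * Repeatedly.forever(AfterFirst.of(elementCountAtLeast(n), processing-time delay)).
 */
final class PaneSimulator {
    private final int maxCount;
    private final long maxDelayMillis;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> firedPanes = new ArrayList<>();
    private long firstElementTime = -1L;

    PaneSimulator(int maxCount, long maxDelayMillis) {
        this.maxCount = maxCount;
        this.maxDelayMillis = maxDelayMillis;
    }

    /** Buffers an element arriving at the given processing time. */
    void add(String element, long nowMillis) {
        advanceTo(nowMillis); // flush a pane whose delay has already expired
        if (buffer.isEmpty()) {
            firstElementTime = nowMillis; // start the delay clock for this pane
        }
        buffer.add(element);
        if (buffer.size() >= maxCount) {
            fire(); // element-count condition met first
        }
    }

    /** Moves processing time forward and fires if the delay has elapsed. */
    void advanceTo(long nowMillis) {
        if (!buffer.isEmpty() && nowMillis - firstElementTime >= maxDelayMillis) {
            fire(); // processing-time condition met first
        }
    }

    /** Emits the buffered elements as one pane and resets the trigger state. */
    private void fire() {
        firedPanes.add(new ArrayList<>(buffer));
        buffer.clear();
        firstElementTime = -1L;
    }

    List<List<String>> firedPanes() {
        return firedPanes;
    }

    public static void main(String[] args) {
        PaneSimulator sim = new PaneSimulator(10, 120_000L);
        for (int i = 0; i < 10; i++) {
            sim.add("msg-" + i, i * 1_000L); // tenth element fires a pane
        }
        sim.add("msg-10", 20_000L);  // a lone element starts a new pane
        sim.advanceTo(140_000L);     // two minutes later the pane fires anyway
        System.out.println(sim.firedPanes().size()); // prints 2
    }
}
```

The second pane is the interesting case: it holds a single message and is still written out once the delay expires, which is the behaviour the workaround relies on.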
