How to use ParseJsons in Apache Beam / Google Dataflow?

Java newbie here. I'm struggling to understand how to use ParseJsons in my Apache Beam pipeline to parse a PCollection of strings into a PCollection of objects.

My understanding is that I first need to define a class that matches the JSON structure, and then use ParseJsons to map the JSON strings into objects of that class.

However, the ParseJsons documentation looks cryptic to me, and I'm not sure how to actually perform the transform using Apache Beam. Could someone give me a quick and dirty example of how to parse line-delimited JSON strings?

Here's one of the attempts I've made, but unfortunately the syntax is incorrect.

class Product {
  private String name = null;
  private String url = null;
}

p.apply("ReadLines", TextIO.read().from(options.getInputFile()))
 .apply(new ParseJsons.of(Product))
 .apply("WriteCounts", TextIO.write().to(options.getOutput()));

I think you want:

PCollection<Product> products =
  p.apply("ReadLines", TextIO.read().from(options.getInputFile()))
   .apply(ParseJsons.of(Product.class))
   .setCoder(SerializableCoder.of(Product.class));
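One thing worth checking before running this: ParseJsons deserializes with Jackson, and SerializableCoder only accepts types that implement java.io.Serializable. The Product class from the question has private fields with no accessors and is not Serializable, so it would probably need to look more like the sketch below. This assumes default Jackson bean conventions (a no-argument constructor plus getters and setters) and that the JSON keys are exactly `name` and `url`; the serialVersionUID value is arbitrary.

```java
import java.io.Serializable;

// Sketch of a Jackson-friendly POJO; the field names mirror the assumed JSON keys.
class Product implements Serializable {
  private static final long serialVersionUID = 1L;

  private String name;
  private String url;

  // Jackson creates instances through a no-argument constructor by default.
  public Product() {}

  public String getName() { return name; }
  public void setName(String name) { this.name = name; }

  public String getUrl() { return url; }
  public void setUrl(String url) { this.url = url; }
}
```

With default settings, Jackson rejects JSON keys that the class has no matching property for, so the class should cover every key in the input.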

The ParseJsons.of method is static, so you can call it without instantiating the class. However, to write the output with TextIO.write() you will need to convert the result back to String. Example:

p.apply("ReadLines", TextIO.read().from(options.getInputFile()))
 .apply("Parse JSON", ParseJsons.of(MyPojo.class))
 .setCoder(SerializableCoder.of(MyPojo.class))
 // FormatPojoFn is a DoFn<MyPojo, String> that you write yourself
 .apply("Convert back to String", ParDo.of(new FormatPojoFn()))
 .apply("WriteCounts", TextIO.write().to(options.getOutput()));
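FormatPojoFn is not defined in the snippet above. Its job is just to turn one parsed object into one output line. Here is a rough sketch of that formatting logic in plain Java; the MyPojo fields and the PojoFormatter helper are hypothetical, since the real class is not shown. In a pipeline, the body of `format` would sit inside the `@ProcessElement` method of a `DoFn<MyPojo, String>`.

```java
// Hypothetical POJO; the real MyPojo's fields are not shown in the answer.
class MyPojo implements java.io.Serializable {
  public String name;
  public String url;
}

// Hypothetical helper holding the logic a FormatPojoFn would apply per element.
final class PojoFormatter {
  private PojoFormatter() {}

  // Renders one element as a tab-separated line; null fields become empty strings.
  static String format(MyPojo pojo) {
    String name = pojo.name == null ? "" : pojo.name;
    String url = pojo.url == null ? "" : pojo.url;
    return name + "\t" + url;
  }
}
```

Tab-separated output is only one choice here; you could equally serialize back to JSON if downstream consumers expect it.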

You could also try using the writeCustomType method on the TextIO class:

p.apply(TextIO.<UserEvent>writeCustomType(new FormatEvent()).to(...));
