How to use ParseJsons in Apache Beam / Google Dataflow?
Java newbie here. I'm struggling to understand how to use ParseJsons in my Apache Beam pipeline to parse a PCollection of strings into a PCollection of objects.
My understanding is that I first need to define a class that matches the JSON structure, and then use ParseJsons to map the JSON strings into objects of that class.
However, the ParseJsons documentation looks cryptic to me, and I'm not sure how to actually perform the transform using Apache Beam. Could someone give me a quick and dirty example of how to parse line-delimited JSON strings?
Here's one of the attempts I've made, but unfortunately the syntax is incorrect.
class Product {
private String name = null;
private String url = null;
}
p.apply("ReadLines", TextIO.read().from(options.getInputFile()))
.apply(new ParseJsons.of(Product))
.apply("WriteCounts", TextIO.write().to(options.getOutput()));
I think you want:
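PCollection<Product> products =
    p.apply("ReadLines", TextIO.read().from(options.getInputFile()))
     // ParseJsons.of is a static factory method, so no `new` is needed.
     .apply(ParseJsons.of(Product.class))
     // SerializableCoder requires Product to implement java.io.Serializable.
     .setCoder(SerializableCoder.of(Product.class));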
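For the snippet above to work, the class itself has to cooperate. ParseJsons uses Jackson for deserialization, which by default expects a no-argument constructor and accessible properties, and SerializableCoder relies on standard Java serialization. Here is a minimal sketch of a Product class shaped that way. The getters and setters and the SerializationCheck helper are my own additions for illustration, not part of the question. The helper only demonstrates, without any Beam dependency, that the class survives a Java serialization round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Plain data class mirroring a JSON line such as {"name": "...", "url": "..."}.
class Product implements Serializable {
    private static final long serialVersionUID = 1L;

    private String name;
    private String url;

    // Jackson needs a no-argument constructor to instantiate the class.
    public Product() {}

    // Getters and setters let Jackson see the otherwise private fields.
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getUrl() { return url; }
    public void setUrl(String url) { this.url = url; }
}

// Hypothetical helper (not a Beam API): copies a Product through Java
// serialization, the mechanism that SerializableCoder builds on.
class SerializationCheck {
    static Product roundTrip(Product in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Product) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Product is not serializable", e);
        }
    }
}
```

If the class does not implement Serializable, SerializableCoder.of(Product.class) will not compile, so it is worth getting this in place first.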
The ParseJsons.of method is static, so you can call it without instantiating the class. However, TextIO.write() only accepts a PCollection<String>, so you will need to convert the result back to String before writing. Example:
p.apply("ReadLines", TextIO.read().from(options.getInputFile()))
 .apply("Parse JSON", ParseJsons.of(MyPojo.class))
 // FormatPojoFn (not shown) is a DoFn<MyPojo, String>.
 .apply("Convert back to String", ParDo.of(new FormatPojoFn()))
 .apply("WriteCounts", TextIO.write().to(options.getOutput()));
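FormatPojoFn is not shown above. Whatever form it takes, its core is just an object-to-string conversion. Here is a minimal sketch of that conversion with no Beam or Jackson dependency. The MyPojo fields (name and url, borrowed from the question's Product class) and the PojoFormatter helper are assumptions made for illustration. In a real pipeline you would more likely let Jackson produce the JSON than build it by hand.

```java
import java.io.Serializable;

// Hypothetical POJO; the field names are borrowed from the question.
class MyPojo implements Serializable {
    private static final long serialVersionUID = 1L;
    public String name;
    public String url;
}

// Hypothetical helper holding the logic a FormatPojoFn would run per element.
final class PojoFormatter {
    private PojoFormatter() {}

    // Turns one object into one line of JSON, ready for TextIO.write().
    static String toJsonLine(MyPojo pojo) {
        return "{\"name\":" + quote(pojo.name) + ",\"url\":" + quote(pojo.url) + "}";
    }

    // Quotes a string as a JSON value, escaping the characters JSON requires.
    private static String quote(String s) {
        if (s == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }
}
```

Inside the pipeline, the @ProcessElement method of FormatPojoFn could simply output PojoFormatter.toJsonLine(element). Alternatively, MapElements.into(TypeDescriptors.strings()).via(PojoFormatter::toJsonLine) should do the same job without a dedicated DoFn class.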
You could also try using the writeCustomType method on the TextIO class:
p.apply(TextIO.<UserEvent>writeCustomType(new FormatEvent()).to(...));