
How do I use an AutoValue data type for my PCollection in Apache Beam?

I would like to use my AutoValue data classes as the element types of my PCollection, but I'm having trouble getting Beam to pick a coder for them automatically:

import com.google.auto.value.AutoValue;

@AutoValue
public abstract class MyPersonClass {
  public abstract String getName();
  public abstract Integer getAge();
  public abstract Float getHeight();

  public static MyPersonClass create(String name, Integer age, Float height) {
    return new AutoValue_MyPersonClass(name, age, height);
  }
}

Whenever I use this, I get errors from Beam trying to choose a coder. I do not want to define my own coder for it.

How can I use a coder that infers the schema of my AutoValue class? Or can a different coder be automatically inferred for it?

Beam can automatically infer schemas for several kinds of data classes, including Java Beans (classes with getters and setters), Avro records, Protocol Buffers, and AutoValue classes.

You just need to add the DefaultSchema annotation with the appropriate SchemaProvider (see the SchemaProvider javadoc for the available subclasses). For AutoValue classes, that provider is AutoValueSchema.

This annotation works well with AutoValue builders, so nothing else is needed if you are using the AutoValue.Builder pattern.

If you are using a static create method instead, as in this case, you also need to add the SchemaCreate annotation to that method, like so:

import com.google.auto.value.AutoValue;
import org.apache.beam.sdk.schemas.AutoValueSchema;
import org.apache.beam.sdk.schemas.annotations.DefaultSchema;
import org.apache.beam.sdk.schemas.annotations.SchemaCreate;

@DefaultSchema(AutoValueSchema.class)
@AutoValue
public abstract class MyPersonClass {
  public abstract String getName();
  public abstract Integer getAge();
  public abstract Float getHeight();

  @SchemaCreate
  public static MyPersonClass create(String name, Integer age, Float height) {
    return new AutoValue_MyPersonClass(name, age, height);
  }
}

Finally, if you cannot modify the class yourself (possibly because you don't own the source code containing the AutoValue class), you can manually register it as follows:

pipeline.getSchemaRegistry().registerSchemaProvider(
    MyPersonClass.class, new AutoValueSchema());

The accepted answer is excellent.

My 2 cents on a constraint that exists in AutoValueSchema: ReflectUtils#isGetter expects the AutoValue's property methods to follow the get* convention. If you name your accessors field() instead of getField(), AutoValueSchema won't recognize them as getters, and therefore won't use them as properties when creating the schema. (This last part is somewhat hazy to me: I am not sure of the complete flow for identifying a property via its getter, and I need to read the source in more detail.)

So, as of now, you'd have to name all your AutoValue getters as get*() in order for it to work with Beam's AutoValueSchema properly.

See: https://github.com/apache/beam/pull/7334 and https://github.com/apache/beam/pull/7334#issuecomment-453560743 for further details.
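To make the naming constraint concrete, here is a minimal, dependency-free sketch using plain Java reflection. It is not Beam's actual code: looksLikeGetter is a hypothetical, simplified check modeled only on the get* convention described above, and PrefixedPerson and UnprefixedPerson are made-up stand-ins for AutoValue classes. It only illustrates why accessors named field() would yield no properties under a prefix-based check.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an AutoValue class whose accessors use the get* prefix.
abstract class PrefixedPerson {
  public abstract String getName();
  public abstract Integer getAge();
}

// Hypothetical stand-in for an AutoValue class whose accessors omit the prefix.
abstract class UnprefixedPerson {
  public abstract String name();
  public abstract Integer age();
}

final class GetterNamingSketch {
  // Assumption (simplified, NOT Beam's implementation): an accessor counts as a
  // getter only if it is abstract, takes no arguments, returns a value, and is
  // named get<Something>.
  static boolean looksLikeGetter(Method m) {
    String n = m.getName();
    return Modifier.isAbstract(m.getModifiers())
        && m.getParameterCount() == 0
        && m.getReturnType() != void.class
        && n.startsWith("get")
        && n.length() > 3;
  }

  // Collects the property names implied by the detected getters, sorted so the
  // result does not depend on reflection order.
  static List<String> detectedProperties(Class<?> clazz) {
    List<String> names = new ArrayList<>();
    for (Method m : clazz.getDeclaredMethods()) {
      if (looksLikeGetter(m)) {
        String n = m.getName();
        names.add(Character.toLowerCase(n.charAt(3)) + n.substring(4));
      }
    }
    Collections.sort(names);
    return names;
  }

  public static void main(String[] args) {
    System.out.println(detectedProperties(PrefixedPerson.class));   // prints [age, name]
    System.out.println(detectedProperties(UnprefixedPerson.class)); // prints []
  }
}
```

Under this assumed check, the unprefixed class exposes no properties at all, which is consistent with the behavior described above. Renaming name() to getName() and age() to getAge() is enough to make them visible again.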
