How to write to BigTable using Apache Beam direct-runner in Java?

I have been trying to get the Apache Beam direct runner to write to BigTable, but something seems to be going wrong.

The terminal shows neither a failure nor a confirmation when I run gradle run.

My pipeline is as follows:

Pub/Sub stream of messages -> direct-runner -> BigTable

I am currently using the org.apache.beam.sdk.io.gcp.bigtable.BigtableIO adapter; either it is not working or I am doing something wrong.

There is also another I/O adapter, com.google.cloud.bigtable.beam.CloudBigtableIO, and I am not sure which one to choose.

Some questions:

  1. Which adapter should I use? (This is answered, see edit.)
  2. How do I verify what is going wrong when writing to BigTable? I find it hard to look into the pipeline without System.out.println statements.
  3. How is authentication done when the direct-runner writes to BigTable? Does the SDK automatically detect the $GOOGLE_APPLICATION_CREDENTIALS env variable and use those credentials?

Will be happy to give more details.


To verify what is going on:

1. In main(), add BasicConfigurator.configure();

2. Add to pom.xml:


3. Add this log4j.properties:

# Set root logger level to DEBUG and its only appender to A1.
log4j.rootLogger=DEBUG, A1

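# A1 is set to be a ConsoleAppender.
log4j.appender.A1=org.apache.log4j.ConsoleAppender

# A1 uses PatternLayout.
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n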
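
On the authentication point from the question: as far as I know, the Beam GCP connectors fall back to Google's Application Default Credentials, and that lookup checks the GOOGLE_APPLICATION_CREDENTIALS environment variable. A quick way to rule out a credentials problem before digging into the DEBUG output is to check that the variable is set and points at a readable file. The sketch below uses only the JDK; the class name CredentialsCheck and the status strings are made up for illustration and are not part of any SDK.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Beam or the Google SDKs): reports whether
// a credentials path looks usable before the pipeline is started.
class CredentialsCheck {

    // Returns "NOT_SET" for a null or blank path, "NOT_READABLE" when the path
    // is not a readable regular file, and "OK" otherwise.
    static String describe(String path) {
        if (path == null || path.trim().isEmpty()) {
            return "NOT_SET";
        }
        Path p = Paths.get(path);
        if (!Files.isRegularFile(p) || !Files.isReadable(p)) {
            return "NOT_READABLE";
        }
        return "OK";
    }

    public static void main(String[] args) {
        // Reads the same variable that Application Default Credentials consults.
        String path = System.getenv("GOOGLE_APPLICATION_CREDENTIALS");
        System.out.println("GOOGLE_APPLICATION_CREDENTIALS: " + describe(path));
    }
}
```

This only checks that the file exists and can be read. It does not validate the key or confirm that the service account has Bigtable permissions, so an "OK" result narrows the problem down rather than proving that authentication works.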

To write to Bigtable with the direct runner:

1. In pom.xml:


2. The pipeline options interface, configured through the run config commands:

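import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;

// Custom pipeline options; the --input and --output flags map to these accessors.
public interface RequestsOptions extends PipelineOptions {
    @Description("File path")
    String getInput();

    void setInput(String value);

    String getOutput();

    void setOutput(String value);
}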

3. In the run config commands:

--region=REGION_TO_RUN (only if using the Dataflow runner)
--tempLocation=GOOGLE_STORAGE_LOCATION (where temp files are saved)
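
To make the link between the options interface and the run config flags concrete: Beam matches each --name=value flag to the getName/setName pair on the options interface, typically through PipelineOptionsFactory.fromArgs(args).as(RequestsOptions.class) (an assumption, since the answer does not show that call). The JDK-only sketch below is not Beam's parser. It only illustrates the naming convention, and the class FlagNames and the sample values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the flag-to-accessor naming convention;
// Beam's real parsing is done by PipelineOptionsFactory.
class FlagNames {

    // Collects "--name=value" arguments into an ordered map, skipping anything
    // that does not have that shape.
    static Map<String, String> parse(String[] args) {
        Map<String, String> flags = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (!arg.startsWith("--") || eq <= 2) {
                continue; // not a --name=value flag
            }
            flags.put(arg.substring(2, eq), arg.substring(eq + 1));
        }
        return flags;
    }

    // Derives the getter name an options interface would need for a flag,
    // e.g. "input" corresponds to getInput().
    static String getterFor(String flag) {
        return "get" + Character.toUpperCase(flag.charAt(0)) + flag.substring(1);
    }

    public static void main(String[] args) {
        // Sample values are placeholders, not taken from the original post.
        String[] sample = {"--input=in.txt", "--output=out.txt", "--tempLocation=gs://my-bucket/tmp"};
        parse(sample).forEach((name, value) ->
                System.out.println(getterFor(name) + "() -> " + value));
    }
}
```

So --input and --output are served by the accessors declared in RequestsOptions, while a flag such as --tempLocation has to be backed by an accessor on one of the interfaces it extends.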

