
How to write to BigTable using Apache Beam direct-runner in Java?

I have been trying to get the Apache Beam direct runner to write to BigTable, but something seems to be wrong.

The terminal shows neither an error nor a confirmation when I run gradle run.

My pipeline is as follows:

Pub/Sub stream of messages -> direct-runner -> BigTable

I am currently using the org.apache.beam.sdk.io.gcp.bigtable.BigtableIO adapter, and either it is not working or I am doing something wrong.

There is also another I/O adapter, com.google.cloud.bigtable.beam.CloudBigtableIO, and I am not sure which one to choose.
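For reference, here is a minimal sketch of the Pub/Sub -> BigtableIO path described above, using the org.apache.beam.sdk.io.gcp.bigtable.BigtableIO adapter. It is only an illustration, not a confirmed fix: the project, instance, table, column family and subscription names are hypothetical placeholders, and using the message text as the row key is an assumption made to keep the example short. BigtableIO.write() expects a PCollection of KV<ByteString, Iterable<Mutation>>, so each message is first turned into a row key plus a SetCell mutation.

```java
import com.google.bigtable.v2.Mutation;
import com.google.protobuf.ByteString;
import java.util.Collections;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.SimpleFunction;
import org.apache.beam.sdk.values.KV;

class PubSubToBigtableSketch {

  // Hypothetical placeholders: replace with your own resources.
  static final String PROJECT_ID = "my-project";
  static final String INSTANCE_ID = "my-instance";
  static final String TABLE_ID = "my-table";
  static final String COLUMN_FAMILY = "cf"; // must already exist in the table
  static final String SUBSCRIPTION = "projects/my-project/subscriptions/my-subscription";

  /** Turns one message into a row key plus a single SetCell mutation. */
  static KV<ByteString, Iterable<Mutation>> toRow(String message) {
    Mutation setCell =
        Mutation.newBuilder()
            .setSetCell(
                Mutation.SetCell.newBuilder()
                    .setFamilyName(COLUMN_FAMILY)
                    .setColumnQualifier(ByteString.copyFromUtf8("payload"))
                    // Microseconds, kept at millisecond granularity.
                    .setTimestampMicros(System.currentTimeMillis() * 1000)
                    .setValue(ByteString.copyFromUtf8(message)))
            .build();
    Iterable<Mutation> mutations = Collections.singletonList(setCell);
    // Assumption: the message text itself serves as the row key.
    return KV.of(ByteString.copyFromUtf8(message), mutations);
  }

  public static void main(String[] args) {
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
    Pipeline pipeline = Pipeline.create(options);

    pipeline
        .apply("ReadFromPubSub", PubsubIO.readStrings().fromSubscription(SUBSCRIPTION))
        .apply(
            "ToMutations",
            MapElements.via(
                new SimpleFunction<String, KV<ByteString, Iterable<Mutation>>>() {
                  @Override
                  public KV<ByteString, Iterable<Mutation>> apply(String message) {
                    return toRow(message);
                  }
                }))
        .apply(
            "WriteToBigtable",
            BigtableIO.write()
                .withProjectId(PROJECT_ID)
                .withInstanceId(INSTANCE_ID)
                .withTableId(TABLE_ID));

    // Block until the pipeline stops; a Pub/Sub source is unbounded, so this keeps running.
    pipeline.run().waitUntilFinish();
  }
}
```

The table and its column family have to exist before the pipeline runs, since the sketch only writes cells and does not create anything.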

Some questions:

  1. Which adapter should I use? (This is answered; see the edit.)
  2. How do I verify what is going wrong when writing to BigTable? It is hard to look into the pipeline without System.out.println statements.
  3. How does the direct runner authenticate when writing to BigTable? Does the SDK automatically detect the $GOOGLE_APPLICATION_CREDENTIALS environment variable and use those credentials?

Happy to give more details.
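On question 3: as far as I know, Beam's GCP connectors obtain credentials through Google Application Default Credentials, which consult the GOOGLE_APPLICATION_CREDENTIALS environment variable. A quick way to rule out a misconfigured variable is a small check like the one below. This is a plain-JDK sketch that only inspects the variable and the file it points to; the class and method names are made up for illustration, and it does not validate the key or its permissions on BigTable.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class CredentialsCheck {

  /** Describes the state of the path taken from GOOGLE_APPLICATION_CREDENTIALS. */
  static String describe(String credentialsPath) {
    if (credentialsPath == null || credentialsPath.isEmpty()) {
      return "GOOGLE_APPLICATION_CREDENTIALS is not set";
    }
    Path path = Paths.get(credentialsPath);
    if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
      return "GOOGLE_APPLICATION_CREDENTIALS points to a missing or unreadable file: " + path;
    }
    return "GOOGLE_APPLICATION_CREDENTIALS points to a readable file: " + path;
  }

  public static void main(String[] args) {
    // Run this in the same shell or run configuration that launches the pipeline.
    System.out.println(describe(System.getenv("GOOGLE_APPLICATION_CREDENTIALS")));
  }
}
```

If the variable is set in one shell but the pipeline is launched from an IDE or a Gradle daemon started elsewhere, the process may not see it, so it is worth running the check the same way the pipeline is run.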

EDIT:

To verify what is going on:

1. In main(), add BasicConfigurator.configure();

2. Add this dependency to pom.xml:

<dependency>
     <groupId>org.slf4j</groupId>
     <artifactId>slf4j-log4j12</artifactId>
     <version>1.7.32</version>
</dependency>

3. Add this log4j.properties:

# Set root logger level to DEBUG and its only appender to A1.
log4j.rootLogger=DEBUG, A1

# A1 is set to be a ConsoleAppender.
log4j.appender.A1=org.apache.log4j.ConsoleAppender

# A1 uses PatternLayout.
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

To write to BigTable with the direct runner:

1. In pom.xml:

<dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-runners-direct-java</artifactId>
      <version>${beam.version}</version>
      <scope>runtime</scope>
</dependency>

2. Define the interface for the pipeline options, to be set through the run configuration arguments:

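import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.Validation;

// Custom options; each getter/setter pair becomes a command-line flag.
public interface RequestsOptions extends PipelineOptions {
    @Description("File path")
    @Validation.Required
    String getInput();

    void setInput(String value);

    @Description("Output")
    @Validation.Required
    String getOutput();

    void setOutput(String value);
}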

3. In the run configuration arguments:

--project=PROJECT_ID
--dataset=DATASET_NAME
--inputFile=INPUT_FILE_NAME
--region=REGION_TO_RUN //if dataflow runner
--runner=YOUR_SELECTED_RUNNER
--tempLocation=GOOGLE_STORAGE_LOCATION(to save temp files)
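As a rough sketch of how such arguments reach the options interface: PipelineOptionsFactory parses --name=value flags and matches them to getter names, so the RequestsOptions interface from step 2 exposes --input and --output. Flags such as --inputFile or --dataset would, as far as I understand, need matching getInputFile()/getDataset() getters on the interface being used. The wrapper class OptionsSketch and its parse helper are hypothetical names for illustration.

```java
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.Validation;

class OptionsSketch {

  /** Same shape as the interface in step 2; repeated so the sketch is self-contained. */
  public interface RequestsOptions extends PipelineOptions {
    @Description("File path")
    @Validation.Required
    String getInput();

    void setInput(String value);

    @Description("Output")
    @Validation.Required
    String getOutput();

    void setOutput(String value);
  }

  /** Parses command-line style arguments; withValidation() rejects missing required options. */
  static RequestsOptions parse(String... args) {
    return PipelineOptionsFactory.fromArgs(args).withValidation().as(RequestsOptions.class);
  }

  public static void main(String[] args) {
    // Example invocation (values are placeholders): --input=in.txt --output=out
    RequestsOptions options = parse(args);
    System.out.println("input=" + options.getInput() + ", output=" + options.getOutput());
  }
}
```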
