
How to use custom JDBC jar file from GCS in Apache Beam Java SDK

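I have a use case where I read files from GCS and write them to our own data warehouse product through Apache Beam. We have a custom JDBC driver (.jar) for connecting to the warehouse, and I am trying to use Apache Beam's JdbcIO to perform the ETL, with a Maven pom to manage dependencies. Can someone help me understand how to make use of this custom jar file in Apache Beam?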


p.apply(JdbcIO.<KV<Integer, String>>read()
    .withDataSourceConfiguration(
        JdbcIO.DataSourceConfiguration.create("MYDRIVERCLASS", "DATABASE_URL")
            .withUsername("username")
            .withPassword("password"))
    .withQuery("select id,name from Person")
    .withCoder(KvCoder.of(BigEndianIntegerCoder.of(), StringUtf8Coder.of()))
    .withRowMapper(new JdbcIO.RowMapper<KV<Integer, String>>() {
      public KV<Integer, String> mapRow(ResultSet resultSet) throws Exception {
        return KV.of(resultSet.getInt(1), resultSet.getString(2));
      }
    }));
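
Whichever way the jar is supplied, the driver class name passed to DataSourceConfiguration.create has to be loadable at runtime. A minimal pre-flight check could look like the sketch below. It uses only the JDK; the class name DriverCheck and the driver name com.example.warehouse.Driver are hypothetical placeholders, not part of Beam or of the original post.

```java
public class DriverCheck {

    /** Returns true if the named class can be located on the current classpath. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, DriverCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical driver class name; pass the real one as the first argument.
        String driver = args.length > 0 ? args[0] : "com.example.warehouse.Driver";
        System.out.println(driver + " on classpath: " + isOnClasspath(driver));
    }
}
```

Note that this only checks the JVM that launches the pipeline. Workers on a distributed runner see the driver only if the runner stages the jar.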

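You can use the JdbcIO.read() sample above in your own code. The following excerpt from the JdbcIO source shows the API it relies on.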

@Experimental(Experimental.Kind.SOURCE_SINK)
public class JdbcIO {
  /**
   * Read data from a JDBC datasource.
   *
   * @param <T> Type of the data to be read.
   */
  public static <T> Read<T> read() {
    return new AutoValue_JdbcIO_Read.Builder<T>().build();
  }

  /**
   * Like {@link #read}, but executes multiple instances of the query substituting each element
   * of a {@link PCollection} as query parameters.
   *
   * @param <ParameterT> Type of the data representing query parameters.
   * @param <OutputT> Type of the data to be read.
   */
  public static <ParameterT, OutputT> ReadAll<ParameterT, OutputT> readAll() {
    return new AutoValue_JdbcIO_ReadAll.Builder<ParameterT, OutputT>().build();
  }

  /**
   * Write data to a JDBC datasource.
   *
   * @param <T> Type of the data to be written.
   */
  public static <T> Write<T> write() {
    return new AutoValue_JdbcIO_Write.Builder<T>().build();
  }

  private JdbcIO() {}

  /**
   * An interface used by {@link JdbcIO.Read} for converting each row of the {@link ResultSet} into
   * an element of the resulting {@link PCollection}.
   */
  @FunctionalInterface
  public interface RowMapper<T> extends Serializable {
    T mapRow(ResultSet resultSet) throws Exception;
  }

  /**
   * A POJO describing a {@link DataSource}, either providing directly a {@link DataSource} or all
   * properties allowing to create a {@link DataSource}.
   */
  @AutoValue
  public abstract static class DataSourceConfiguration implements Serializable {
    @Nullable abstract String getDriverClassName();
    @Nullable abstract String getUrl();
    @Nullable abstract String getUsername();
    @Nullable abstract String getPassword();
    @Nullable abstract String getConnectionProperties();
    @Nullable abstract DataSource getDataSource();

    abstract Builder builder();

    @AutoValue.Builder
    abstract static class Builder {
      abstract Builder setDriverClassName(String driverClassName);
      abstract Builder setUrl(String url);
      abstract Builder setUsername(String username);
      abstract Builder setPassword(String password);
      abstract Builder setConnectionProperties(String connectionProperties);
      abstract Builder setDataSource(DataSource dataSource);
      abstract DataSourceConfiguration build();
    }

    public static DataSourceConfiguration create(DataSource dataSource) {
      checkArgument(dataSource != null, "dataSource can not be null");
      checkArgument(dataSource instanceof Serializable, "dataSource must be Serializable");
      return new AutoValue_JdbcIO_DataSourceConfiguration.Builder()
          .setDataSource(dataSource)
          .build();
    }

    public static DataSourceConfiguration create(String driverClassName, String url) {
      checkArgument(driverClassName != null, "driverClassName can not be null");
      checkArgument(url != null, "url can not be null");
      return new AutoValue_JdbcIO_DataSourceConfiguration.Builder()
          .setDriverClassName(driverClassName)
          .setUrl(url)
          .build();
    }

    public DataSourceConfiguration withUsername(String username) {
      return builder().setUsername(username).build();
    }

    public DataSourceConfiguration withPassword(String password) {
      return builder().setPassword(password).build();
    }

    /**

You can build and run your project by following this example. See the documentation for more details.

# Build the project.
gradle('build')
 
# Check the generated build files.
run('ls -lh build/libs/')
 
# Run the shadow (fat jar) build.
gradle('runShadow')
 
# Sample the first 20 results, remember there are no ordering guarantees.
run('head -n 20 outputs/part-00000-of-*')

To use additional dependency jars, you can simply add them to the CLASSPATH when running the Beam Java pipeline. All jars on the CLASSPATH should be staged by the Beam runner.
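
To confirm that the custom driver jar really is on the classpath the pipeline is launched with, the entries can be listed from the JVM. The following is a minimal sketch using only the JDK. The class name ClasspathJars and the jar paths in the usage note are hypothetical, and the sketch does not reproduce how any particular runner selects files for staging.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ClasspathJars {

    /** Splits a classpath string and keeps only the entries whose name ends in .jar. */
    public static List<String> jarEntries(String classpath, String separator) {
        List<String> jars = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            if (entry.toLowerCase().endsWith(".jar")) {
                jars.add(entry);
            }
        }
        return jars;
    }

    public static void main(String[] args) {
        // java.class.path holds the classpath this JVM was started with.
        String classpath = System.getProperty("java.class.path");
        for (String jar : jarEntries(classpath, File.pathSeparator)) {
            System.out.println(jar);
        }
    }
}
```

For example, launching with `java -cp "libs/warehouse-jdbc.jar:build/classes" ClasspathJars` (hypothetical paths, Unix separator) should list the driver jar among the entries.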

You can also use a PipelineOption to specify the dependencies.
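
The answer does not name the option. Assuming it means filesToStage (an option exposed by the Dataflow runner, among others, that takes a comma-separated list of local paths), the command-line argument could be assembled as in the sketch below. The class name StageArgs and the jar paths are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class StageArgs {

    /** Builds a --filesToStage argument from a list of local jar paths (assumed option name). */
    public static String filesToStageArg(List<String> jarPaths) {
        return "--filesToStage=" + String.join(",", jarPaths);
    }

    public static void main(String[] args) {
        // Hypothetical paths: the pipeline's own bundled jar plus the custom JDBC driver jar.
        List<String> jars = Arrays.asList("target/my-pipeline-bundled.jar", "libs/warehouse-jdbc.jar");
        System.out.println(filesToStageArg(jars));
    }
}
```

When this option is set explicitly, the runner may stage only the listed files instead of the detected classpath, so the list should probably include the pipeline's own jar as well. Check your runner's documentation before relying on this.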

