
Error using SpannerIO in Apache Beam

This question is a follow-up to this one. I am trying to use Apache Beam to read data from a Google Spanner table (and then do some data processing). I wrote the following minimal example using the Java SDK:

package com.google.cloud.dataflow.examples;
import java.io.IOException;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.io.gcp.spanner.SpannerIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;
import com.google.cloud.spanner.Struct;

public class backup {

  public static void main(String[] args) throws IOException {
    PipelineOptions options = PipelineOptionsFactory.create();

    Pipeline p = Pipeline.create(options);
    PCollection<Struct> rows = p.apply(
            SpannerIO.read()
                .withInstanceId("my_instance")
                .withDatabaseId("my_db")
                .withQuery("SELECT t.table_name FROM information_schema.tables AS t")
                );

    PipelineResult result = p.run();
    try {
      result.waitUntilFinish();
    } catch (Exception exc) {
      result.cancel();
    }
  }
}

When I try to execute the code using the DirectRunner, I get the following error message:

org.apache.beam.runners.direct.repackaged.com.google.common.util.concurrent.UncheckedExecutionException:

org.apache.beam.sdk.util.UserCodeException: java.lang.NoClassDefFoundError: Could not initialize class com.google.cloud.spanner.spi.v1.SpannerErrorInterceptor

[...] Caused by: org.apache.beam.sdk.util.UserCodeException: java.lang.NoClassDefFoundError: Could not initialize class com.google.cloud.spanner.spi.v1.SpannerErrorInterceptor

[...] Caused by: java.lang.NoClassDefFoundError: Could not initialize class com.google.cloud.spanner.spi.v1.SpannerErrorInterceptor

Or, using the DataflowRunner:

org.apache.beam.runners.direct.repackaged.com.google.common.util.concurrent.UncheckedExecutionException: org.apache.beam.sdk.util.UserCodeException: java.lang.NoSuchFieldError: internal_static_google_rpc_LocalizedMessage_fieldAccessorTable

[...] Caused by: org.apache.beam.sdk.util.UserCodeException: java.lang.NoSuchFieldError: internal_static_google_rpc_LocalizedMessage_fieldAccessorTable

[...] Caused by: java.lang.NoSuchFieldError: internal_static_google_rpc_LocalizedMessage_fieldAccessorTable

In both cases, the error message is rather cryptic, and a Google search did not turn up any clear explanation of what causes it. I also could not find any example scripts that use the SpannerIO module.

Is this error due to an obvious mistake in my code, or is it due to a bad installation of the Google Cloud tools?

This issue is most likely caused by a dependency compatibility problem described here: BEAM-2837. Here's a quick workaround described in one of the comments on the JIRA issue:

<dependency>
    <groupId>com.google.api.grpc</groupId>
    <artifactId>grpc-google-common-protos</artifactId>
    <version>0.1.9</version>
</dependency>

<dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-io-google-cloud-platform</artifactId>
    <version>${beam.version}</version>
    <exclusions>
        <exclusion>
            <groupId>com.google.api.grpc</groupId>
            <artifactId>grpc-google-common-protos</artifactId>
        </exclusion>
    </exclusions>
</dependency>

Explicitly define the required com.google.api.grpc dependency and exclude the version that org.apache.beam pulls in transitively.

You need to specify the ProjectID:

    SpannerIO.read()
            .withProjectId("my_project")
            .withInstanceId("my_instance")
            .withDatabaseId("my_db")

And you need to set the credentials for your Spanner project. As the SpannerIO API does not allow you to set custom credentials, you must set global application credentials using the environment variable GOOGLE_APPLICATION_CREDENTIALS.
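Putting those two points together, a minimal sketch of the read (reusing the placeholder IDs from the question and the snippet above, and assuming GOOGLE_APPLICATION_CREDENTIALS is exported before the pipeline runs) could look like this:

    // Minimal sketch: same placeholder IDs as above; credentials are picked up
    // from the GOOGLE_APPLICATION_CREDENTIALS environment variable.
    PipelineOptions options = PipelineOptionsFactory.create();
    Pipeline p = Pipeline.create(options);

    PCollection<Struct> rows = p.apply(
        SpannerIO.read()
            .withProjectId("my_project")
            .withInstanceId("my_instance")
            .withDatabaseId("my_db")
            .withQuery("SELECT t.table_name FROM information_schema.tables AS t"));

    p.run().waitUntilFinish();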

You could also read from (and write to) Cloud Spanner using JDBC. Reading is done like this:

        PCollection<KV<String, Long>> words = p2.apply(JdbcIO.<KV<String, Long>> read()
            .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create("nl.topicus.jdbc.CloudSpannerDriver",
                    "jdbc:cloudspanner://localhost;Project=my-project-id;Instance=instance-id;Database=database;PvtKeyPath=C:\\Users\\MyUserName\\Documents\\CloudSpannerKeys\\cloudspanner-key.json"))
            .withQuery("SELECT t.table_name FROM information_schema.tables AS t").withCoder(KvCoder.of(StringUtf8Coder.of(), BigEndianLongCoder.of()))
            .withRowMapper(new JdbcIO.RowMapper<KV<String, Long>>()
            {
                private static final long serialVersionUID = 1L;

                @Override
                public KV<String, Long> mapRow(ResultSet resultSet) throws Exception
                {
                    return KV.of(resultSet.getString(1), resultSet.getLong(2));
                }
            }));

This method also allows you to use custom credentials by setting the PvtKeyPath. You can also write to Google Cloud Spanner using JDBC. Have a look here for an example: http://www.googlecloudspanner.com/2017/10/google-cloud-spanner-with-apache-beam.html
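As a rough illustration of the write path (not taken from the linked page), a JdbcIO.write() sketch could look like the following; the target table TABLE_COUNTS with columns TABLE_NAME and ROW_COUNT is hypothetical, java.sql.PreparedStatement is assumed to be imported, and the connection URL is the same one used for the read above:

        // Hypothetical write-back of the (table_name, count) pairs produced above.
        // TABLE_COUNTS with columns TABLE_NAME and ROW_COUNT is an assumed target table.
        words.apply(JdbcIO.<KV<String, Long>> write()
            .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create("nl.topicus.jdbc.CloudSpannerDriver",
                    "jdbc:cloudspanner://localhost;Project=my-project-id;Instance=instance-id;Database=database;PvtKeyPath=C:\\Users\\MyUserName\\Documents\\CloudSpannerKeys\\cloudspanner-key.json"))
            .withStatement("INSERT INTO TABLE_COUNTS (TABLE_NAME, ROW_COUNT) VALUES (?, ?)")
            .withPreparedStatementSetter(new JdbcIO.PreparedStatementSetter<KV<String, Long>>()
            {
                private static final long serialVersionUID = 1L;

                @Override
                public void setParameters(KV<String, Long> element, PreparedStatement statement) throws Exception
                {
                    statement.setString(1, element.getKey());
                    statement.setLong(2, element.getValue());
                }
            }));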
