BigQuery nullable types in golang when using BigQuery storage write API

I'm switching from the legacy streaming API to the storage write API, following this example in golang: https://github.com/alexflint/bigquery-storage-api-example

In the old code I used bigquery's null types to indicate that a field can be null:

type Person struct {
    Name bigquery.NullString `bigquery:"name"`
    Age  bigquery.NullInt64  `bigquery:"age"`
}

var persons = []Person{
    {
        Name: ToBigqueryNullableString(""), // this will be null in bigquery
        Age:  ToBigqueryNullableInt64("20"),
    },
    {
        Name: ToBigqueryNullableString("David"),
        Age:  ToBigqueryNullableInt64("60"),
    },
}

func main() {
    ctx := context.Background()

    bigqueryClient, _ := bigquery.NewClient(ctx, "project-id")
    
    inserter := bigqueryClient.Dataset("dataset-id").Table("table-id").Inserter()
    err := inserter.Put(ctx, persons)
    if err != nil {
        log.Fatal(err)
    }
}

func ToBigqueryNullableString(x string) bigquery.NullString {
    if x == "" {
        return bigquery.NullString{Valid: false}
    }
    return bigquery.NullString{StringVal: x, Valid: true}
}
func ToBigqueryNullableInt64(x string) bigquery.NullInt64 {
    if x == "" {
        return bigquery.NullInt64{Valid: false}
    }
    if s, err := strconv.ParseInt(x, 10, 64); err == nil {
        return bigquery.NullInt64{Int64: s, Valid: true}
    }
    return bigquery.NullInt64{Valid: false}
}

After switching to the new API:

var persons = []*personpb.Row{
    {
        Name: "",
        Age: 20,
    },
    {
        Name: "David",
        Age: 60,
    },
}
func main() {
    ctx := context.Background()

    client, _ := storage.NewBigQueryWriteClient(ctx)
    defer client.Close()

    stream, err := client.AppendRows(ctx)
    if err != nil {
        log.Fatal("AppendRows: ", err)
    }

    var row personpb.Row
    descriptor, err := adapt.NormalizeDescriptor(row.ProtoReflect().Descriptor())
    if err != nil {
        log.Fatal("NormalizeDescriptor: ", err)
    }

    var opts proto.MarshalOptions
    var data [][]byte
    for _, row := range persons {
        buf, err := opts.Marshal(row)
        if err != nil {
            log.Fatal("protobuf.Marshal: ", err)
        }
        data = append(data, buf)
    }

    err = stream.Send(&storagepb.AppendRowsRequest{
        WriteStream: fmt.Sprintf("projects/%s/datasets/%s/tables/%s/streams/_default", "project-id", "dataset-id", "table-id"),
        Rows: &storagepb.AppendRowsRequest_ProtoRows{
            ProtoRows: &storagepb.AppendRowsRequest_ProtoData{
                WriterSchema: &storagepb.ProtoSchema{
                    ProtoDescriptor: descriptor,
                },
                Rows: &storagepb.ProtoRows{
                    SerializedRows: data,
                },
            },
        },
    })
    if err != nil {
        log.Fatal("AppendRows.Send: ", err)
    }

    _, err = stream.Recv()
    if err != nil {
        log.Fatal("AppendRows.Recv: ", err)
    }
}

With the new API I need to define the types in a .proto file, so I need some other way to mark nullable fields. I tried optional fields:

syntax = "proto3";

package person;

option go_package = "/personpb";

message Row {
  optional string name = 1;
  int64 age = 2;
}

but it gives me an error when trying to stream (not at compile time): BqMessage.proto: person_Row.Name: The [proto3_optional=true] option may only be set on proto3 fields, not person_Row.Name

Another option I tried is to use oneof, writing the proto file like this:

syntax = "proto3";

import "google/protobuf/struct.proto";

package person;

option go_package = "/personpb";

message Row {
  NullableString name = 1;
  int64 age = 2;
}

message NullableString {
  oneof kind {
    google.protobuf.NullValue null = 1;
    string data = 2;
  }
}

Then use it like this:

var persons = []*personpb.Row{
    {
        Name: &personpb.NullableString{Kind: &personpb.NullableString_Null{
            Null: structpb.NullValue_NULL_VALUE,
        }},
        Age: 20,
    },
    {
        Name: &personpb.NullableString{Kind: &personpb.NullableString_Data{
            Data: "David",
        }},
        Age: 60,
    },
}
...

But this gives me the following error: Invalid proto schema: BqMessage.proto: person_Row.person_NullableString.null: FieldDescriptorProto.oneof_index 0 is out of range for type "person_NullableString".

I guess the API doesn't know how to handle the oneof type, so I need to tell it about this somehow.

How can I use something like the bigquery.Nullable types when using the new storage API? Any help will be appreciated.

Take a look at this sample for an end-to-end example using a proto2 syntax file in Go.
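To give a rough idea of the direction that sample takes, here is a minimal sketch of what a proto2 version of your Row message could look like (assuming you regenerate the personpb package from it); in proto2, an optional field that is left unset carries explicit "no value" presence, which the backend can map to NULL:

syntax = "proto2";

package person;

option go_package = "/personpb";

message Row {
  optional string name = 1; // left unset -> NULL in the BigQuery column
  optional int64 age = 2;
}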

proto3 is still a bit of a special beast when working with the Storage API, for a couple of reasons:

  • The current behavior of the Storage API is to operate using proto2 semantics (a sketch of the corresponding Go code follows this list).
  • Currently, the Storage API doesn't understand wrapper types, which were the original way in which proto3 was meant to communicate optional presence (e.g. NULL in BigQuery fields). Because of this, it tends to treat wrapper fields as a submessage with a value field (in BigQuery, a STRUCT with a single leaf field).
  • Later in its evolution, proto3 reintroduced the optional keyword as a way of marking presence, but in the internal representation this meant adding another presence marker (the source of the proto3_optional warning you were observing in the backend error).
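On the Go side, proto2-generated scalar fields are pointers, so leaving a field nil is what expresses NULL. A minimal sketch, assuming the proto2 Row definition above and the proto.String/proto.Int64 helpers from google.golang.org/protobuf/proto:

var persons = []*personpb.Row{
    {
        Name: nil,                   // unset pointer -> NULL name in BigQuery
        Age:  proto.Int64(20),
    },
    {
        Name: proto.String("David"),
        Age:  proto.Int64(60),
    },
}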

It looks like you're using bits of the newer veneer, particularly adapt.NormalizeDescriptor(). I suspect if you're using this, you may be using an older version of the module, as the normalization code was updated in this PR and released as part of bigquery/v1.33.0.
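For completeness, a hedged sketch of what the append could look like through the managedwriter veneer on a recent module version; the project/dataset/table IDs are placeholders, and the exact option names are worth checking against the version you end up on:

client, err := managedwriter.NewClient(ctx, "project-id")
if err != nil {
    log.Fatal("NewClient: ", err)
}
defer client.Close()

// Describe the (proto2) Row message so the backend can decode the serialized rows.
var row personpb.Row
descriptor, err := adapt.NormalizeDescriptor(row.ProtoReflect().Descriptor())
if err != nil {
    log.Fatal("NormalizeDescriptor: ", err)
}

managedStream, err := client.NewManagedStream(ctx,
    managedwriter.WithDestinationTable(
        fmt.Sprintf("projects/%s/datasets/%s/tables/%s", "project-id", "dataset-id", "table-id")),
    managedwriter.WithSchemaDescriptor(descriptor),
    managedwriter.WithType(managedwriter.DefaultStream),
)
if err != nil {
    log.Fatal("NewManagedStream: ", err)
}

// data is the [][]byte of proto-marshaled rows, built the same way as in your example.
result, err := managedStream.AppendRows(ctx, data)
if err != nil {
    log.Fatal("AppendRows: ", err)
}
if _, err := result.GetResult(ctx); err != nil {
    log.Fatal("append result: ", err)
}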

There's work underway to improve the experience with the storage API and make it smoother overall, but there's still work to be done.
