
BigQuery nullable types in golang when using BigQuery storage write API

I'm switching from the legacy streaming API to the storage write API following this example in golang: https://github.com/alexflint/bigquery-storage-api-example

In the old code I used bigquery's null types to indicate a field can be null:

type Person struct {
    Name bigquery.NullString `bigquery:"name"`
    Age  bigquery.NullInt64  `bigquery:"age"`
}

var persons = []Person{
    {
        Name: ToBigqueryNullableString(""), // this will be null in bigquery
        Age:  ToBigqueryNullableInt64("20"),
    },
    {
        Name: ToBigqueryNullableString("David"),
        Age:  ToBigqueryNullableInt64("60"),
    },
}

func main() {
    ctx := context.Background()

    bigqueryClient, err := bigquery.NewClient(ctx, "project-id")
    if err != nil {
        log.Fatal(err)
    }
    defer bigqueryClient.Close()

    inserter := bigqueryClient.Dataset("dataset-id").Table("table-id").Inserter()
    err = inserter.Put(ctx, persons)
    if err != nil {
        log.Fatal(err)
    }
}

func ToBigqueryNullableString(x string) bigquery.NullString {
    if x == "" {
        return bigquery.NullString{Valid: false}
    }
    return bigquery.NullString{StringVal: x, Valid: true}
}
func ToBigqueryNullableInt64(x string) bigquery.NullInt64 {
    if x == "" {
        return bigquery.NullInt64{Valid: false}
    }
    if s, err := strconv.ParseInt(x, 10, 64); err == nil {
        return bigquery.NullInt64{Int64: s, Valid: true}
    }
    return bigquery.NullInt64{Valid: false}
}

After switching to the new API:

var persons = []*personpb.Row{
    {
        Name: "",
        Age: 20,
    },
    {
        Name: "David",
        Age: 60,
    },
}
func main() {
    ctx := context.Background()

    client, err := storage.NewBigQueryWriteClient(ctx)
    if err != nil {
        log.Fatal("NewBigQueryWriteClient: ", err)
    }
    defer client.Close()

    stream, err := client.AppendRows(ctx)
    if err != nil {
        log.Fatal("AppendRows: ", err)
    }

    var row personpb.Row
    descriptor, err := adapt.NormalizeDescriptor(row.ProtoReflect().Descriptor())
    if err != nil {
        log.Fatal("NormalizeDescriptor: ", err)
    }

    var opts proto.MarshalOptions
    var data [][]byte
    for _, row := range persons {
        buf, err := opts.Marshal(row)
        if err != nil {
            log.Fatal("protobuf.Marshal: ", err)
        }
        data = append(data, buf)
    }

    err = stream.Send(&storagepb.AppendRowsRequest{
        WriteStream: fmt.Sprintf("projects/%s/datasets/%s/tables/%s/streams/_default", "project-id", "dataset-id", "table-id"),
        Rows: &storagepb.AppendRowsRequest_ProtoRows{
            ProtoRows: &storagepb.AppendRowsRequest_ProtoData{
                WriterSchema: &storagepb.ProtoSchema{
                    ProtoDescriptor: descriptor,
                },
                Rows: &storagepb.ProtoRows{
                    SerializedRows: data,
                },
            },
        },
    })
    if err != nil {
        log.Fatal("AppendRows.Send: ", err)
    }

    _, err = stream.Recv()
    if err != nil {
        log.Fatal("AppendRows.Recv: ", err)
    }
}

With the new API I need to define the types in a .proto file, so I need another way to mark fields as nullable. I tried optional fields:

syntax = "proto3";

package person;

option go_package = "/personpb";

message Row {
  optional string name = 1;
  int64 age = 2;
}

But this fails with an error when I try to stream (at runtime, not at compile time): BqMessage.proto: person_Row.Name: The [proto3_optional=true] option may only be set on proto3 fields, not person_Row.Name

Another option I tried is to use oneof and write the proto file like this:

syntax = "proto3";

import "google/protobuf/struct.proto";

package person;

option go_package = "/personpb";

message Row {
  NullableString name = 1;
  int64 age = 2;
}

message NullableString {
  oneof kind {
    google.protobuf.NullValue null = 1;
    string data = 2;
  }
}

Then use it like this:

var persons = []*personpb.Row{
    {
        Name: &personpb.NullableString{Kind: &personpb.NullableString_Null{
            Null: structpb.NullValue_NULL_VALUE,
        }},
        Age: 20,
    },
    {
        Name: &personpb.NullableString{Kind: &personpb.NullableString_Data{
            Data: "David",
        }},
        Age: 60,
    },
}
...

But this gives me the following error: Invalid proto schema: BqMessage.proto: person_Row.person_NullableString.null: FieldDescriptorProto.oneof_index 0 is out of range for type "person_NullableString".

I guess this is because the API doesn't know how to handle the oneof type, and I need to tell it about it somehow.

How can I use something like the bigquery.Nullable types when using the new storage API? Any help will be appreciated.

Take a look at this sample for an end-to-end example using a proto2 syntax file in Go.

proto3 is still a bit of a special beast when working with the Storage API, for a couple of reasons:

  • The current behavior of the Storage API is to operate using proto2 semantics.
  • Currently, the Storage API doesn't understand wrapper types, which were the original way in which proto3 was meant to communicate optional presence (e.g. NULL in BigQuery fields). Because of this, it tends to treat a wrapper field as a submessage with a value field (in BigQuery, a STRUCT with a single leaf field).
  • Later in its evolution, proto3 reintroduced the optional keyword as a way of marking presence, but in the internal representation this meant adding another presence marker (the source of the proto3_optional message you were seeing in the backend error).

It looks like you're using bits of the newer veneer, particularly adapt.NormalizeDescriptor(). If so, I suspect you may be on an older version of the module, as the normalization code was updated in this PR and released as part of bigquery/v1.33.0.

Work is ongoing to make the overall Storage API experience smoother, but it isn't finished yet.
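
As a rough sketch of what the proto2 route could look like for the Person example: for optional scalar fields in a proto2 file, protoc-gen-go generates pointer fields, so nil can play the role that Valid: false played with bigquery.NullString. The Row struct below is a hand-written stand-in for the generated type (the real one has extra internal fields and getters), the proto2 schema in the comment is an assumed rewrite of the question's file, and ToNullableString and ToNullableInt64 are hypothetical helpers mirroring the ones in the question.

```go
package main

import (
	"fmt"
	"strconv"
)

// Row is a stand-in for the struct protoc-gen-go would generate from an
// assumed proto2 schema such as:
//
//	syntax = "proto2";
//	package person;
//	option go_package = "/personpb";
//	message Row {
//	  optional string name = 1;
//	  optional int64 age = 2;
//	}
//
// Optional scalar fields become pointers, so nil means "not set".
type Row struct {
	Name *string
	Age  *int64
}

// ToNullableString returns nil for "", mirroring ToBigqueryNullableString.
func ToNullableString(x string) *string {
	if x == "" {
		return nil
	}
	return &x
}

// ToNullableInt64 returns nil when x is empty or not a valid integer,
// mirroring ToBigqueryNullableInt64.
func ToNullableInt64(x string) *int64 {
	v, err := strconv.ParseInt(x, 10, 64)
	if err != nil {
		return nil // ParseInt also rejects the empty string
	}
	return &v
}

func main() {
	persons := []*Row{
		{Name: ToNullableString(""), Age: ToNullableInt64("20")},
		{Name: ToNullableString("David"), Age: ToNullableInt64("60")},
	}
	for _, p := range persons {
		// Report which fields are set on each row.
		fmt.Println(p.Name != nil, p.Age != nil)
	}
}
```

With real generated code, the proto.String and proto.Int64 helpers from google.golang.org/protobuf/proto build the same kind of pointers for values that should be set; leaving a field nil leaves it unset.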
