
Dataflow BigQuery read does not return correct datatype

In Apache Beam/Dataflow I am reading data into a collection using the following code:

// Read the BigQuery data
PCollection<TableRow> bigQuerySource = p.apply(
    BigQueryIO.readTableRows()
        .fromQuery(bigQueryQuery)
        .usingStandardSql()
        .withTemplateCompatibility());

The query has the form "SELECT * FROM .." and reads from a view that itself queries other views and tables.

In the next transformation I use the following:

..
public void processElement(ProcessContext c) {
  Set<Map.Entry<String, Object>> entries = c.element().entrySet();
  for (Map.Entry<String, Object> entry : entries) {
    Object value = entry.getValue();
    String x = value.getClass().getName();
  }
..

The view contains columns of several data types (String, Date, Integer, Boolean), but the class name that ends up in x is only ever String or Boolean.

How can I get the original datatype from the BigQuery Schema?

If you get an instance of com.google.cloud.bigquery.BigQuery then you can get the types of your columns. For instance, to get the first column's type:

BigQuery bigQuery = BigQueryOptions.newBuilder()
                .setProjectId(projectId)
                .setCredentials(...)
                .build()
                .getService();

LegacySQLTypeName type = bigQuery.getTable(id).getDefinition().getSchema().getFields().get(0).getType();

This gives you a LegacySQLTypeName. According to the source code, these are the values you can expect:

  /** Variable-length binary data. */
  public static final LegacySQLTypeName BYTES = type.createAndRegister("BYTES").setStandardType(StandardSQLTypeName.BYTES);
  /** Variable-length character (Unicode) data. */
  public static final LegacySQLTypeName STRING = type.createAndRegister("STRING").setStandardType(StandardSQLTypeName.STRING);
  /** A 64-bit signed integer value. */
  public static final LegacySQLTypeName INTEGER = type.createAndRegister("INTEGER").setStandardType(StandardSQLTypeName.INT64);
  /** A 64-bit IEEE binary floating-point value. */
  public static final LegacySQLTypeName FLOAT = type.createAndRegister("FLOAT").setStandardType(StandardSQLTypeName.FLOAT64);
  /** A Boolean value (true or false). */
  public static final LegacySQLTypeName BOOLEAN = type.createAndRegister("BOOLEAN").setStandardType(StandardSQLTypeName.BOOL);
  /** Represents an absolute point in time, with microsecond precision. */
  public static final LegacySQLTypeName TIMESTAMP = type.createAndRegister("TIMESTAMP").setStandardType(StandardSQLTypeName.TIMESTAMP);
  /** Represents a logical calendar date. Note, support for this type is limited in legacy SQL. */
  public static final LegacySQLTypeName DATE = type.createAndRegister("DATE").setStandardType(StandardSQLTypeName.DATE);
  /**
   * Represents a time, independent of a specific date, to microsecond precision. Note, support for
   * this type is limited in legacy SQL.
   */
  public static final LegacySQLTypeName TIME = type.createAndRegister("TIME").setStandardType(StandardSQLTypeName.TIME);
  /**
   * Represents a year, month, day, hour, minute, second, and subsecond (microsecond precision).
   * Note, support for this type is limited in legacy SQL.
   */
  public static final LegacySQLTypeName DATETIME = type.createAndRegister("DATETIME").setStandardType(StandardSQLTypeName.DATETIME);
  /** A record type with a nested schema. */
  public static final LegacySQLTypeName RECORD = type.createAndRegister("RECORD").setStandardType(StandardSQLTypeName.STRUCT);
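
As far as I can tell, this also explains why x only shows String or Boolean: the TableRow produced by the read seems to carry most non-Boolean values in their text form (an INTEGER as "42", a DATE as "2019-03-01"), so the runtime class says little about the column type. Once you know the type name of each column from the schema, you can convert the value yourself. Below is a minimal sketch that uses only the JDK. The class TableRowValueConverter and its convert method are hypothetical names, not part of Beam or the BigQuery client, and the text formats it parses are assumptions that you should check against your own data.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;

/** Hypothetical helper: turns a raw TableRow value into a typed Java object. */
final class TableRowValueConverter {

  private TableRowValueConverter() {}

  /**
   * Converts a raw value using the BigQuery type name of its column,
   * for example "INTEGER" or "DATE" as listed in LegacySQLTypeName.
   * Returns null for null input and the raw value for unhandled types.
   */
  static Object convert(String typeName, Object raw) {
    if (raw == null) {
      return null;
    }
    String text = raw.toString();
    switch (typeName) {
      case "INTEGER":
        return Long.parseLong(text);
      case "FLOAT":
        return Double.parseDouble(text);
      case "BOOLEAN":
        return Boolean.parseBoolean(text);
      case "DATE":
        // Assumes ISO format, e.g. 2019-03-01
        return LocalDate.parse(text);
      case "TIME":
        // Assumes ISO format, e.g. 12:30:00
        return LocalTime.parse(text);
      case "DATETIME":
        // Assumes ISO format with a 'T' separator, e.g. 2019-03-01T12:30:00
        return LocalDateTime.parse(text);
      default:
        // STRING, BYTES, TIMESTAMP, RECORD and anything else are left as-is;
        // their text formats vary, so parse them to suit your data.
        return raw;
    }
  }
}
```

Inside processElement you could look up the type name for entry.getKey() in a Map<String, String> built once from the schema, then call TableRowValueConverter.convert(typeName, entry.getValue()) instead of inspecting value.getClass().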

I just found a similar issue in the question "How can I get a BigQuery table schema in Java", where the schema was returning null; they fixed it by calling table.reload() first.

// Reload the table metadata first, so that the schema is populated
table = table.reload();
Schema schema = table.getDefinition().getSchema();

You can also check the corresponding classes and their methods in the API documentation: http://googlecloudplatform.github.io/google-cloud-java/google-cloud-clients/apidocs/?com/google/cloud/bigquery/package-summary.html


 