
BigQuery Python library loading datetime inconsistently

I am using the Python BigQuery library (google-cloud-bigquery==3.3.0) to write data into BigQuery from a pandas dataframe.

This library is inconsistently creating BigQuery columns of type TIMESTAMP or DATETIME, and I can't figure out what determines which type a given column is written as.

I have written a class which sets an attribute value (processed_datetime) to the current time (datetime.now()) in the class constructor.

The class also has a method which creates and returns a dataframe. That method sets the value of the processed_datetime column in the returned dataframe to that of the processed_datetime object attribute.

I can therefore be sure that the processed_datetime column values for each dataframe created by the instance:

  1. Have the same datetime value;
  2. Are of the same datetime dtype (datetime64[ns]).

The following isn't a real implementation but gives an example of the set-up:

from datetime import datetime

import pandas as pd

class ExampleClass:
    def __init__(self):
        self.processed_datetime = datetime.now()

    def new_df(self):
        # Column values must be list-like; an all-scalar dict would
        # raise "If using all scalar values, you must pass an index".
        data = {'a': ['Some value'], 'b': ['Some other value']}
        df = pd.DataFrame(data)
        # Bracket assignment creates a column; df.processed_datetime = ...
        # would only set an attribute on the dataframe object.
        df['processed_datetime'] = self.processed_datetime
        return df

example_class = ExampleClass()
df1 = example_class.new_df()
df2 = example_class.new_df()

bigquery_client.load_table_from_dataframe(df1, [...])
bigquery_client.load_table_from_dataframe(df2, [...])

In this example, I can be sure that df1.processed_datetime and df2.processed_datetime have the same values and dtype, yet on one run the column may be written to BigQuery as DATETIME and on another as TIMESTAMP.
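One plausible source of the inconsistency (an assumption, not confirmed by the question): load_table_from_dataframe infers the BigQuery type from the pandas/pyarrow dtype, and timezone-naive datetime columns are typically inferred as DATETIME while timezone-aware ones are inferred as TIMESTAMP. A pandas-only sketch of the two dtypes:

```python
from datetime import datetime, timezone

import pandas as pd

# Timezone-naive timestamps: plain datetime64[ns] dtype, no tz attached.
naive = pd.Series([datetime.now()])

# Timezone-aware timestamps: DatetimeTZDtype, e.g. datetime64[ns, UTC].
aware = pd.Series([datetime.now(timezone.utc)])

print(naive.dtype)  # naive dtype, no tz -> typically inferred as DATETIME
print(aware.dtype)  # tz-aware dtype     -> typically inferred as TIMESTAMP
```

If the dataframes ever differ in tz-awareness, or if the destination table already exists with a previously chosen schema that the load job reuses, the resulting column type can flip between runs; pinning the schema explicitly removes the ambiguity.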

What can be causing this? What can I do to mitigate?

You can fix this by explicitly specifying the schema in the job_config. Note: you may specify a (partial) schema; all dataframe columns are always written to the table, and the schema is only used to assist in data type definitions. Specify the type of any column whose type cannot be reliably auto-detected.

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("processed_datetime", bigquery.enums.SqlTypeNames.DATETIME),
        bigquery.SchemaField("a", bigquery.enums.SqlTypeNames.STRING),
    ],
    write_disposition="WRITE_TRUNCATE",
)

job = client.load_table_from_dataframe(
    df1, table_id, job_config=job_config
) 
job.result()
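If many dataframes need pinning, you can derive the datetime fields from the dataframe's own dtypes rather than listing them by hand. This is a hypothetical helper (datetime_schema is my name, not part of the library), returning (name, type) pairs that you would wrap in bigquery.SchemaField before building the LoadJobConfig:

```python
from datetime import datetime

import pandas as pd

def datetime_schema(df):
    """Return (column, BigQuery type) pairs for df's datetime columns.

    Naive datetime columns are pinned to DATETIME; tz-aware ones to
    TIMESTAMP (matching the library's usual inference, made explicit).
    """
    fields = []
    for name, dtype in df.dtypes.items():
        if str(dtype).startswith('datetime64'):
            bq_type = 'TIMESTAMP' if getattr(dtype, 'tz', None) else 'DATETIME'
            fields.append((name, bq_type))
    return fields

df = pd.DataFrame({'a': ['Some value']})
df['processed_datetime'] = datetime.now()
print(datetime_schema(df))  # [('processed_datetime', 'DATETIME')]
```

You would then pass schema=[bigquery.SchemaField(name, t) for name, t in datetime_schema(df)], plus any other fields you want pinned, into bigquery.LoadJobConfig.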

