I am using the Python BigQuery library (google-cloud-bigquery==3.3.0) to write data into BigQuery from a Pandas dataframe.
This library is inconsistently creating BigQuery columns of type TIMESTAMP or DATETIME. I can't figure out what determines the type a given column is written as.
I have written a class which sets an attribute value (processed_datetime) to the current time (datetime.now()) in the class constructor.
The class also has a method which creates and returns a dataframe. That method sets the value of the processed_datetime column in the returned dataframe to that of the processed_datetime object attribute.
I can therefore be sure that the processed_datetime column values for each dataframe created by the instance have the same value and the same type (datetime64[ns]).
The following isn't a real implementation but gives an example of the set-up:
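from datetime import datetime

import pandas as pd

class ExampleClass:
    def __init__(self):
        # Captured once, so every dataframe from this instance shares it
        self.processed_datetime = datetime.now()

    def new_df(self):
        # Scalars are wrapped in lists so pandas can build a one-row frame
        data = {'a': ['Some value'], 'b': ['Some other value']}
        df = pd.DataFrame(data)
        # Assign as a column; plain attribute assignment would not create one
        df['processed_datetime'] = self.processed_datetime
        return df

example_class = ExampleClass()
df1 = example_class.new_df()
df2 = example_class.new_df()
bigquery_client.load_table_from_dataframe(df1, [...])
bigquery_client.load_table_from_dataframe(df2, [...])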
In this example, I can be sure that df1.processed_datetime and df2.processed_datetime have the same values and type, but in one instance the column may be written to BigQuery as a DATETIME type, and in another as a TIMESTAMP.
What can be causing this? What can I do to mitigate?
You can help the type detection by specifying the schema in the job_config.
Note: Specify a (partial) schema. All columns are always written to the table. The schema is used to assist in data type definitions. Specify the type of columns whose type cannot be auto-detected.
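from google.cloud import bigquery

# client (a bigquery.Client) and table_id are assumed to be defined already
job_config = bigquery.LoadJobConfig(
    # Partial schema: pin the types that should not be left to auto-detection
    schema=[
        bigquery.SchemaField("processed_datetime", bigquery.enums.SqlTypeNames.DATETIME),
        bigquery.SchemaField("a", bigquery.enums.SqlTypeNames.STRING),
    ],
    # Replace the table contents on each load
    write_disposition="WRITE_TRUNCATE",
)
job = client.load_table_from_dataframe(
    df1, table_id, job_config=job_config
)
job.result()  # Wait for the load job to complete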
You can see the full code sample here.
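As a complementary check, it may be worth confirming whether the value you store is timezone-naive or timezone-aware. My understanding (an assumption about the library's type inference, not something confirmed in the question) is that when no schema is supplied, a naive datetime column tends to be loaded as DATETIME and a timezone-aware one as TIMESTAMP. The sketch below uses only the standard library; is_tz_aware and the use_utc flag are hypothetical names added for illustration.

```python
from datetime import datetime, timezone


def is_tz_aware(value: datetime) -> bool:
    """Return True if the datetime carries a usable UTC offset."""
    return value.tzinfo is not None and value.utcoffset() is not None


class ExampleClass:
    def __init__(self, use_utc: bool = True):
        # Hypothetical flag, for illustration only: pick one convention and
        # keep it for every dataframe written to the same table.
        if use_utc:
            self.processed_datetime = datetime.now(timezone.utc)  # tz-aware
        else:
            self.processed_datetime = datetime.now()  # tz-naive


aware = ExampleClass(use_utc=True).processed_datetime
naive = ExampleClass(use_utc=False).processed_datetime
print(is_tz_aware(aware), is_tz_aware(naive))  # True False
```

If the check shows a mix of naive and aware values across runs, that could explain the differing column types. Pinning the type in the schema as shown above remains the more direct fix.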