
python - Convert pandas DataFrame string column into bigquery.SchemaField "TIMESTAMP"

I am trying to load a BigQuery table from a Python pandas DataFrame.

The CSV file has the following content:

t_time
2023-01-01 07:20:54.272000 UTC
2023-01-02 04:22:26.914000 UTC
2023-01-03 04:32:38.663000 UTC

The BigQuery table has one column, t_time, with datatype TIMESTAMP.

schema: bigquery.SchemaField("t_time", "TIMESTAMP", mode="NULLABLE")

Code snippet:

from google.cloud import bigquery
import pandas as pd
import ... 
client = bigquery.Client()

df=pd.read_csv("./my_times.csv",  header=1, names=['t_time'])   
print(f"> {df['t_time']}")
df.info()
job_config = bigquery.LoadJobConfig(
  schema = [
    bigquery.SchemaField("t_time", "TIMESTAMP"),
  ],
  write_disposition="WRITE_TRUNCATE",
)
client.load_table_from_dataframe(df, "myproj.mydataset.mytable", job_config=job_config).result()

Output:

    0     2022-08-03 07:20:54.272000 UTC
    1     2022-08-04 04:22:26.914000 UTC
    2     2022-08-03 04:32:38.663000 UTC
Name: t_time, dtype: object
Error object of type <class 'str'> cannot be converted to int

The problem is in bigquery.SchemaField("insert_timestamp", "TIMESTAMP"). I am wondering why, since I have other tables with TIMESTAMP columns whose values are in that same <date> <time> UTC format.

I have also tried to convert the DataFrame column t_time to a timestamp, but without success (I am not sure how to convert from that format into a timestamp).

What would be the correct approach to get a BigQuery table with datatype TIMESTAMP from the given CSV format (with UTC)?

Can you try this:

from google.cloud import bigquery
import pandas as pd

client = bigquery.Client()

df = pd.read_csv("./csv_t_time - Sheet1.csv", header=1, names=['t_time'])
print(f"> {df['t_time']}")

job_config = bigquery.job.LoadJobConfig(
    schema=[
        bigquery.SchemaField("t_time", "TIMESTAMP"),
    ],
    autodetect=False,
    source_format=bigquery.SourceFormat.CSV,
    write_disposition="WRITE_TRUNCATE",
    allow_quoted_newlines=True,
)

client.load_table_from_dataframe(df, "myproj.mydataset.mytable", job_config=job_config).result()

To avoid the error you need to set source_format=bigquery.SourceFormat.CSV in the job_config, and when loading CSV data with embedded newlines you also need to specify allow_quoted_newlines=True. For more information you can follow this link.
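An alternative that keeps the default load path is to convert the column to a timezone-aware datetime dtype before calling load_table_from_dataframe, so the DataFrame column maps directly onto the TIMESTAMP schema field. The snippet below is a minimal sketch of that idea, not part of the original answer; it strips the trailing " UTC" marker from the sample values and localizes the parsed times as UTC:

from google.cloud import bigquery
import pandas as pd

client = bigquery.Client()

# Read the raw strings, e.g. "2023-01-01 07:20:54.272000 UTC"
df = pd.read_csv("./my_times.csv", header=1, names=["t_time"])

# Drop the trailing " UTC" marker and parse into timezone-aware datetime64 values;
# utc=True localizes the naive timestamps as UTC, matching the original data.
df["t_time"] = pd.to_datetime(df["t_time"].str.replace(" UTC", "", regex=False), utc=True)

job_config = bigquery.LoadJobConfig(
    schema=[bigquery.SchemaField("t_time", "TIMESTAMP")],
    write_disposition="WRITE_TRUNCATE",
)

client.load_table_from_dataframe(df, "myproj.mydataset.mytable", job_config=job_config).result()

With a datetime64[ns, UTC] column, the "object of type <class 'str'> cannot be converted to int" error should no longer occur, because no plain strings have to be coerced into the timestamp type during serialization.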
