Airflow: BigQuery SQL insert writes empty data to the table
Using Airflow, I am trying to get data from one BigQuery table and insert it into another. I have 5 origin tables and 5 destination tables. My SQL query and Python logic work for 4 of the tables, successfully fetching the data and inserting it into their respective destination tables, but they do not work for 1 table.
import logging
import time

from google.cloud import bigquery

bigquery_client = bigquery.Client()

query = '''SELECT * EXCEPT(eventdate) FROM `gcp_project.gcp_dataset.gcp_table_1`
WHERE id = "1234"
AND eventdate = "2023-01-18"
'''
# Delete the previous destination table if it exists
bigquery_client.delete_table("gcp_project.gcp_dataset.dest_gcp_table_1", not_found_ok=True)
job_config = bigquery.QueryJobConfig()
table_ref = bigquery_client.dataset(gcp_dataset).table(dest_gcp_table_1)
job_config.destination = table_ref
job_config.write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE
# Start the query, passing in the extra configuration.
query_job = bigquery_client.query(query=query,
                                  location='US',
                                  job_config=job_config)
# Check whether the table was successfully written
while not query_job.done():
    time.sleep(1)
logging.info("Data is written into a destination table with {} number of rows for id {}."
             .format(query_job.result().total_rows, id))
I have even tried using the SQL query with CREATE OR REPLACE, but the result was the same: table_1 comes out empty. I have also tried BigQueryInsertJobOperator, but table_1 still comes out empty.
I tried executing the above logic from my local machine and it works fine for table_1 as well; I can see the data in GCP BigQuery.
I am not sure why this is happening or what is going on behind the scenes. Does anyone have any idea why this happens or what could cause it?
Found the root cause for this: the previous query, which is responsible for populating the origin table, was still running in the GCP BigQuery backend. Because of that, the above query did not get any data.
Solution: introduced query_job.result() on the populating query. This waits for that job to complete before executing the next query.
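To make the race condition concrete, here is a minimal, standalone sketch of the same pattern. It mimics BigQuery's asynchronous jobs with Python's concurrent.futures (no GCP credentials needed); the `populate_job.result()` call plays the same blocking role that `QueryJob.result()` plays in the google-cloud-bigquery client. The table and function names are illustrative, not from the original code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

table = []  # stands in for the origin BigQuery table


def populate_table():
    time.sleep(0.2)          # the slow upstream query
    table.extend([1, 2, 3])  # data lands only when the job completes


def read_table():
    return list(table)       # the dependent SELECT


with ThreadPoolExecutor() as executor:
    populate_job = executor.submit(populate_table)
    # Without the next line, read_table() may run against an empty table,
    # which is exactly how table_1 ended up empty.
    populate_job.result()    # block until the upstream job is done
    rows = read_table()

print(rows)  # → [1, 2, 3]
```

Removing the `populate_job.result()` line reproduces the bug: the read races the write and usually sees an empty list, just as the dependent query saw an empty origin table.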