BigQuery load job does not insert all data

Question

I have about 200k CSVs (all with the same schema). I wrote a Cloud Function to load them into BigQuery, so that as soon as I copy a CSV to a bucket, the function is executed and the data is loaded into the BigQuery dataset.

I basically used the same code as in the documentation.

import logging

from google.cloud import bigquery

bigquery_client = bigquery.Client()

dataset_id = 'my_dataset'  # replace with your dataset ID
table_id = 'my_table'  # replace with your table ID
table_ref = bigquery_client.dataset(dataset_id).table(table_id)
table = bigquery_client.get_table(table_ref)  # API request

def bigquery_csv(data, context):

  job_config = bigquery.LoadJobConfig()
  job_config.write_disposition = bigquery.WriteDisposition.WRITE_APPEND
  job_config.skip_leading_rows = 1
  # The source format defaults to CSV, so the line below is optional.
  job_config.source_format = bigquery.SourceFormat.CSV

  uri = 'gs://{}/{}'.format(data['bucket'], data['name'])
  # load_table_from_uri returns a LoadJob object, not a list of errors.
  load_job = bigquery_client.load_table_from_uri(uri,
                                    table_ref,
                                    job_config=job_config)  # API request

  logging.info('Starting job {}'.format(load_job.job_id))

  # load_job.result()  # Waits for table load to complete.
  # With the line above commented out, the job may still be running here.
  logging.info('Job finished.')

  destination_table = bigquery_client.get_table(table_ref)
  logging.info('Loaded {} rows.'.format(destination_table.num_rows))

However, when I copied all the CSVs to the bucket (about 43 TB in total), not all of the data was added to BigQuery; only about 500 GB was inserted.

I can't figure out what's wrong. No insert jobs are shown in Stackdriver Logging, and no functions are running once the copy job is complete.

Answer 1

However, when I copied all the CSVs to the bucket (about 43 TB in total), not all of the data was added to BigQuery; only about 500 GB was inserted.

You are hitting the BigQuery load job limits, as defined in the BigQuery quotas and limits documentation.
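To see which limit a given load actually hits, it can help to wait for each job and log its outcome; the function in the question leaves load_job.result() commented out, so a failed job would go unnoticed. The following is a minimal sketch, not a definitive fix. The helper name wait_and_report is hypothetical, and load_job is assumed to be the object returned by bigquery_client.load_table_from_uri(...).

```python
import logging


def wait_and_report(load_job):
    """Wait for a load job and log its outcome.

    Returns True if the job succeeded and False if it failed.
    `load_job` is assumed to be the LoadJob returned by
    bigquery_client.load_table_from_uri(...).
    """
    try:
        # result() blocks until the job is done and raises if it ended in error.
        load_job.result()
    except Exception:
        # load_job.errors holds the error details reported for the job, if any.
        logging.exception('Load job %s failed: %s',
                          load_job.job_id, load_job.errors)
        return False
    logging.info('Load job %s loaded %s rows.',
                 load_job.job_id, load_job.output_rows)
    return True
```

Inside the Cloud Function, this would be called right after load_table_from_uri. Errors raised when the job is submitted, rather than while it runs, would surface from load_table_from_uri itself and are not handled here.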

You should split your files into smaller files, and the load will then work.
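The answer does not say which limit is being hit. If it turns out to be the number of load jobs rather than the size of individual files (the function starts one job per file, across about 200k files), one option is to load many files per job. Below is a rough sketch of that idea under stated assumptions: batch_uris is a hypothetical helper, and the default of 10,000 URIs per job is an assumption that should be checked against the current quotas page.

```python
def batch_uris(bucket, names, max_uris_per_job=10000):
    """Yield lists of gs:// URIs, one list per intended load job.

    `max_uris_per_job` defaults to an assumed cap; verify it against the
    current BigQuery quotas before relying on it.
    """
    if max_uris_per_job < 1:
        raise ValueError('max_uris_per_job must be at least 1')
    batch = []
    for name in names:
        batch.append('gs://{}/{}'.format(bucket, name))
        if len(batch) == max_uris_per_job:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


# Example: three object names split into batches of at most two URIs.
example = list(batch_uris('my_bucket', ['a.csv', 'b.csv', 'c.csv'],
                          max_uris_per_job=2))
```

Each batch can be passed as the first argument to load_table_from_uri, which accepts a list of URIs as well as a single string. Limits on the total size per load job also apply, so the batch size may need to be lower for large files.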

solution1
0 ACCEPTED 2019-04-18 18:21:49
