Right now I'm copying files from Google Cloud Storage to BigQuery using the following line in Node.js:
const bigquery = new BigQuery();
bigquery.dataset(xx).table(xx).load(storage.bucket(bucketName).file(fileName));
But now I'd like to add a new timestamp column to each row as the data is loaded. How can I do this?
Two questions come to mind. First, can I read this file into some data structure such as an array:
array = FunctionToReadFileNameToArray(FileName);
Does such a function exist? If so, it would be easy to manipulate the array to add the timestamp column.
Second, how do I load the new array into BigQuery? The only way I've found is the streaming insert:
bigquery.dataset(xx).table(xx).insert(rows);
Here, rows is a collection of dictionary/map-like objects rather than a plain array. So how can I load an array into BigQuery?
Thanks
I'm going to assume you have a file (object) of structured records (JSON, XML, CSV). The first task is to open that GCS object for reading. You would then read one record at a time, augment each record with your desired extra column (the timestamp), and invoke the insert() API. This API can take a single object to be inserted or an array of objects.
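A minimal sketch of that streaming approach, assuming the object is a CSV with a header row and that the new column is named load_ts (both assumptions; the naive comma split below does not handle quoted fields):

```javascript
// Sketch: stream a CSV object from GCS, add a timestamp column to each
// record, and insert the rows into BigQuery with the streaming API.
// Bucket, file, dataset, and table names are placeholders.

// Pure helper: turn one parsed CSV record into a BigQuery row object,
// appending the extra timestamp column (column name is an assumption).
function toRowWithTimestamp(headers, values, ts) {
  const row = {};
  headers.forEach((h, i) => { row[h] = values[i]; });
  row.load_ts = ts;
  return row;
}

async function streamFileIntoBigQuery(bucketName, fileName, datasetId, tableId) {
  // Client libraries are required lazily so the helper above stays
  // usable without the Google Cloud packages installed.
  const {Storage} = require('@google-cloud/storage');
  const {BigQuery} = require('@google-cloud/bigquery');
  const readline = require('readline');

  const storage = new Storage();
  const bigquery = new BigQuery();
  const table = bigquery.dataset(datasetId).table(tableId);

  // Read the GCS object line by line instead of loading it whole.
  const rl = readline.createInterface({
    input: storage.bucket(bucketName).file(fileName).createReadStream(),
  });

  const ts = new Date().toISOString();
  let headers = null;
  let batch = [];
  for await (const line of rl) {
    const values = line.split(','); // naive CSV split; no quoted fields
    if (!headers) { headers = values; continue; } // first line = header
    batch.push(toRowWithTimestamp(headers, values, ts));
    if (batch.length === 500) {
      await table.insert(batch); // insert() accepts an array of row objects
      batch = [];
    }
  }
  if (batch.length > 0) await table.insert(batch);
}
```

Batching the insert() calls (500 rows here, an arbitrary choice) keeps the number of API requests down compared with inserting one row at a time.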
However, if this is a one-time event or can be performed in batch, you may find it cheaper to read the GCS object, write a new GCS object containing your desired data, and THEN load that object into BQ as a unit. Looking at the pricing for BQ, streaming inserts are charged at $0.01 per 200MB in addition to the storage costs, and that charge is bypassed by loading a GCS object as a unit. My own thinking is that doing extra work to save pennies is a poor use of time/money, but if you are processing TB of data over months, it may add up.
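The batch alternative could be sketched like this, again assuming a CSV with a header row and a placeholder column name load_ts (both assumptions):

```javascript
// Sketch: rewrite the CSV once (adding the timestamp column), upload it
// as a new GCS object, and load that object into BigQuery in a single
// load job instead of streaming inserts. All names are placeholders.

// Pure helper: append a timestamp column to every line of a CSV string
// (header gets the column name, data rows get the timestamp value).
function appendTimestampColumn(csvText, columnName, ts) {
  const lines = csvText.split('\n').filter((l) => l.length > 0);
  const [header, ...rows] = lines;
  return [
    `${header},${columnName}`,
    ...rows.map((r) => `${r},${ts}`),
  ].join('\n');
}

async function rewriteAndLoad(bucketName, srcFile, dstFile, datasetId, tableId) {
  // Lazy requires so the pure helper works without these packages.
  const {Storage} = require('@google-cloud/storage');
  const {BigQuery} = require('@google-cloud/bigquery');

  const bucket = new Storage().bucket(bucketName);
  const bigquery = new BigQuery();

  // download() resolves to [Buffer]; fine for files that fit in memory.
  const [contents] = await bucket.file(srcFile).download();
  const updated = appendTimestampColumn(
    contents.toString(), 'load_ts', new Date().toISOString());

  await bucket.file(dstFile).save(updated);

  // One load job for the whole object -- billed as a load, not as
  // streaming inserts.
  await bigquery.dataset(datasetId).table(tableId).load(
    bucket.file(dstFile),
    {sourceFormat: 'CSV', skipLeadingRows: 1});
}
```

Note this version downloads the whole object into memory; for very large files you would stream the transform instead.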