DROP column and update BigQuery table containing streaming data
I have a BigQuery table with streaming data. The table is populated by a Dataflow job. Recently I updated my Dataflow pipeline by removing Column_B, one of the two columns shown below:
| Column_A | Column_B |
|----------|-----------|
| Anna | Chicago |
| John | Houston |
But my updated table still contains the same number of columns as before, with the new data intended for Column_B replaced by null. Here is an example of the table after the pipeline update:
| Column_A | Column_B |
|----------|-----------|
| Anna | Chicago |
| John | Houston |
| Michael | null |
| Cecilia | null |
| Ronald | null |
My table is partitioned on timestamp. I am wondering whether there is a way to completely drop Column_B, and I am looking for suggestions on how to do that (and whether I should). Also, how would that impact my table?
Thanks in advance.
For simplicity, assume your current table is named table_name.
Step 1. In the query settings, select the following option:
Set a destination table for query results
Step 2. Run the following query to save the result set as a table:
SELECT *
EXCEPT(Column_B)
FROM table_name
The table created in Step 2 is named table_name_modified. This table will act as a backup for your data.
Step 3. Drop table_name. Once table_name is dropped, rename table_name_modified to table_name.
Now that you have updated the table to exclude Column_B, Dataflow won't be populating nulls anymore.
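The same steps can also be written as SQL statements instead of going through the query settings. The sketch below is only an illustration: the dataset name my_dataset and the partitioning expression DATE(event_timestamp) are assumptions, not names from the question, and the helper function is hypothetical. It simply prints the three statements so they can be reviewed before being run. The PARTITION BY clause is included because a table created from a query result is not partitioned unless partitioning is specified for it.

```python
def drop_column_statements(dataset, table, column, partition_expr=None):
    """Build BigQuery SQL for the copy, drop, and rename steps (hypothetical helper)."""
    source = f"`{dataset}.{table}`"
    modified = f"`{dataset}.{table}_modified`"
    # Optional: keep the new table partitioned like the original one.
    partition_clause = f"\nPARTITION BY {partition_expr}" if partition_expr else ""
    return [
        # Step 2: copy every column except the unwanted one into a new table.
        f"CREATE TABLE {modified}{partition_clause} AS\n"
        f"SELECT * EXCEPT({column})\nFROM {source}",
        # Step 3a: drop the original table.
        f"DROP TABLE {source}",
        # Step 3b: rename the copy back to the original name.
        f"ALTER TABLE {modified} RENAME TO `{table}`",
    ]


# Assumed names: my_dataset and event_timestamp are placeholders.
statements = drop_column_statements(
    "my_dataset", "table_name", "Column_B", "DATE(event_timestamp)"
)
for sql in statements:
    print(sql + ";\n")
```

Check that table_name_modified contains the expected rows before running the DROP TABLE statement, since the original data cannot be read back from table_name after that point.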