How to fetch the latest schema change in BigQuery and restore a deleted column within 7 days
Right now I fetch the columns and data types of BQ tables via the command below:
SELECT COLUMN_NAME, DATA_TYPE
FROM `Dataset`.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS
WHERE table_name="User"
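For a monitoring script, the query above can be wrapped in a small bash helper. A sketch (the commented `bq` invocation assumes the Cloud SDK is installed and authenticated; `Dataset` and `User` are the names from the question):

```shell
#!/usr/bin/env bash
# Build the INFORMATION_SCHEMA query for an arbitrary table name so a
# monitoring script can re-use it per table.
schema_query() {
  local table="$1"
  printf 'SELECT column_name, data_type FROM `Dataset`.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS WHERE table_name = "%s"' "$table"
}

# Run it with the bq CLI (requires authentication):
# bq query --use_legacy_sql=false "$(schema_query User)"
schema_query User
```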
But if I drop a column using the command ALTER TABLE User DROP COLUMN blabla, the column blabla is not actually deleted for 7 days (the TTL), according to the official documentation.
If I use the above command, the column is still there in the schema as well as in the Dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS table. It is just that I cannot insert data into that column or view it in the GCP console. This inconsistency really causes an issue.
I want to write a bash script to monitor schema changes and perform some operations based on them, so I need more visibility into BigQuery table schemas. The least I need is for Dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS to store a flag column that indicates deleted, or a TTL: 7 days.
My questions are:
If you want to fetch the recently deleted column, you can try searching through Cloud Logging. I'm not sure what other tools BigQuery supports for this, but if you want to use Bash you can use gcloud to fetch the logs, though it will be difficult to parse the output and get the information you want.
The command below fetches the logs for google.cloud.bigquery.v2.JobService.InsertJob, since an ALTER TABLE statement is considered an InsertJob, and filters them on the actual query text containing drop. The regex I used is not strict (for the sake of example); I suggest updating it to be stricter.
gcloud logging read 'protoPayload.methodName="google.cloud.bigquery.v2.JobService.InsertJob" AND protoPayload.metadata.jobChange.job.jobConfig.queryConfig.query=~"Alter table.*drop.*"'
Sample snippet from the command above (column PADDING is deleted based on the query):
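To get from the raw log entries to the dropped column name, the query text can be extracted with gcloud's --format flag plus some text processing. A sketch, with the gcloud call commented out since it needs an authenticated project (the exact field path into protoPayload is an assumption based on the filter above); the parsing step is demonstrated on a sample captured query:

```shell
#!/usr/bin/env bash
# In practice you would capture the query text of recent InsertJob entries:
# gcloud logging read \
#   'protoPayload.methodName="google.cloud.bigquery.v2.JobService.InsertJob"' \
#   --format='value(protoPayload.metadata.jobChange.job.jobConfig.queryConfig.query)'

# Sample captured query text (stands in for one line of the gcloud output):
query='ALTER TABLE User DROP COLUMN PADDING'

# Extract the column name that follows DROP COLUMN:
col=$(printf '%s\n' "$query" | sed -E 's/.*DROP COLUMN ([A-Za-z_][A-Za-z0-9_]*).*/\1/')
echo "dropped column: $col"
```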
If you have options other than Bash, I suggest creating a BigQuery sink for your logging; you can then run queries there and get this information. You can also use client libraries (Python, Node.js, etc.) to query either the sink or Cloud Logging directly.
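Creating the sink itself is a one-off gcloud command. A sketch, where my-sink, my-project, and audit_logs are placeholder names:

```shell
#!/usr/bin/env bash
# Filter matching BigQuery query jobs (the same methodName used earlier):
filter='protoPayload.methodName="google.cloud.bigquery.v2.JobService.InsertJob"'

# Route matching audit logs into a BigQuery dataset (requires auth; after
# creation, grant the sink's writer identity access to the dataset):
# gcloud logging sinks create my-sink \
#   bigquery.googleapis.com/projects/my-project/datasets/audit_logs \
#   --log-filter="$filter"
echo "$filter"
```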
As per this SO answer, you can use BigQuery's time travel feature to query the deleted column. That answer also explains BigQuery's behavior of retaining a dropped column for 7 days, and a workaround to delete the column instantly. See the linked answer for the actual query used to retrieve the deleted column and for the workaround for deleting a column.
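Sketched out for the question's table, the two techniques from the linked answer look roughly like this (Dataset.User and blabla are the names from the question; the commented bq calls require an authenticated project):

```shell
#!/usr/bin/env bash
# (1) Time travel: read the dropped column as the table existed an hour ago
# (works within the 7-day time-travel window).
time_travel='SELECT blabla
FROM `Dataset.User`
FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)'

# (2) Workaround to remove the column immediately: rebuild the table without it.
drop_now='CREATE OR REPLACE TABLE `Dataset.User` AS
SELECT * EXCEPT (blabla) FROM `Dataset.User`'

# bq query --use_legacy_sql=false "$time_travel"
# bq query --use_legacy_sql=false "$drop_now"
printf '%s\n---\n%s\n' "$time_travel" "$drop_now"
```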