How to fetch the latest schema change in BigQuery and restore a deleted column within 7 days

Right now I fetch the columns and data types of BQ tables with the following query:

SELECT COLUMN_NAME, DATA_TYPE 
   FROM `Dataset`.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS 
WHERE table_name="User"

But if I drop a column with the command Alter TABLE User drop column blabla, then according to the official documentation the column blabla is not actually deleted for 7 days (TTL).

If I then run the query above, the column is still present in the schema as well as in Dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS.

The only difference is that I can neither insert data into that column nor see it in the GCP console. This inconsistency causes real problems.

I want to write a bash script that monitors schema changes and performs some operations based on them.

For that I need more visibility into BigQuery table schemas. At the very least, Dataset.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS should have a flag column that marks a column as deleted or TTL:7days.
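A minimal sketch of the snapshot-diff step such a monitoring script might use, assuming each snapshot is saved as a file of COLUMN_NAME,DATA_TYPE rows (for example, the query above exported with bq query --format=csv). The function name diff_schema_snapshots and the file layout are hypothetical. As described above, a dropped column may keep appearing in INFORMATION_SCHEMA.COLUMN_FIELD_PATHS during the 7-day window, so the diff can only be as accurate as the snapshot source it is fed.

```bash
#!/usr/bin/env bash
# Sketch: diff two schema snapshots of one table and report what changed.
# Each snapshot file is assumed to hold one "COLUMN_NAME,DATA_TYPE" row per
# line, e.g. produced (hypothetically) with something like:
#   bq query --use_legacy_sql=false --format=csv \
#     'SELECT COLUMN_NAME, DATA_TYPE FROM `Dataset`.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS WHERE table_name="User"' \
#     | tail -n +2 > schema_new.csv   # tail drops the CSV header row
diff_schema_snapshots() {
  local old_file=$1 new_file=$2 old_sorted new_sorted
  old_sorted=$(mktemp) || return 1
  new_sorted=$(mktemp) || return 1
  # comm requires both inputs sorted with the same collation.
  LC_ALL=C sort -u "$old_file" > "$old_sorted"
  LC_ALL=C sort -u "$new_file" > "$new_sorted"
  # Rows only in the old snapshot: dropped (or retyped) columns.
  LC_ALL=C comm -23 "$old_sorted" "$new_sorted" | sed 's/^/REMOVED: /'
  # Rows only in the new snapshot: added (or retyped) columns.
  LC_ALL=C comm -13 "$old_sorted" "$new_sorted" | sed 's/^/ADDED: /'
  rm -f "$old_sorted" "$new_sorted"
}
```

Keeping the previous snapshot around (for example, mv schema_new.csv schema_old.csv after each run) lets a cron job call diff_schema_snapshots schema_old.csv schema_new.csv and act on the REMOVED:/ADDED: lines. Because whole rows are compared, a column whose type changed shows up as both a REMOVED and an ADDED row.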

My questions are:

  1. How can I fetch the correct schema in Spanner, one that reflects the recently deleted column?
  2. If the column is not actually deleted, is there any way to easily restore it?

Answer:

  1. To find recently deleted columns, you can search through Cloud Logging. I'm not sure which tools Spanner supports, but if you want to use Bash, you can fetch the logs with gcloud. Be aware, though, that parsing the output to get the information you want will be difficult.

    The command below fetches the logs for google.cloud.bigquery.v2.JobService.InsertJob, because an ALTER TABLE statement runs as an InsertJob, and filters them on the actual query text where it says drop. The regex I used is deliberately loose (for the sake of example); I suggest making it stricter.

     gcloud logging read 'protoPayload.methodName="google.cloud.bigquery.v2.JobService.InsertJob" AND protoPayload.metadata.jobChange.job.jobConfig.queryConfig.query=~"Alter table.*drop.*"'

    Sample output from the command above (the query drops column PADDING):

    [Screenshot: log entry showing the query that dropped column PADDING]

    If you are not limited to Bash, I suggest creating a BigQuery sink for your logs; you can then run queries against the sink to get this information. You can also use client libraries (Python, Node.js, etc.) to query either the sink or Cloud Logging directly.

  2. As per this SO answer, you can use BigQuery's time travel feature to query the deleted column. That answer also explains why BigQuery retains a deleted column for 7 days and gives a workaround for deleting a column immediately. See the linked answer for the exact query used to retrieve the deleted column and for the deletion workaround.
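For point 2, here is a sketch of a time-travel query wrapped in a hypothetical Bash helper, time_travel_select, so it fits the Bash workflow from the question. It only prints the SQL; running it needs the bq CLI and credentials. FOR SYSTEM_TIME AS OF is BigQuery's time-travel syntax. The 168-hour cap assumes the default 7-day window; a dataset configured with a shorter window will reject older timestamps. That the dropped column reappears in the result is taken from the linked answer, not verified here.

```bash
#!/usr/bin/env bash
# Sketch: print a BigQuery time-travel query that reads TABLE as it was
# HOURS ago. Per the linked answer, a column dropped within the time-travel
# window is still returned by such a query.
time_travel_select() {
  local table=$1 hours=$2
  # Accept only a whole number of hours within the default 7-day (168 h) window.
  case $hours in
    ''|*[!0-9]*) echo "hours must be a whole number" >&2; return 1 ;;
  esac
  if [ "$hours" -gt 168 ]; then
    echo "hours must be <= 168 (7 days)" >&2
    return 1
  fi
  printf 'SELECT * FROM `%s` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL %s HOUR)\n' \
    "$table" "$hours"
}

# Hypothetical usage (needs the bq CLI and credentials): copy the historical
# rows, including the dropped column, into a new table instead of overwriting.
#   bq query --use_legacy_sql=false \
#     "CREATE TABLE Dataset.User_restored AS $(time_travel_select Dataset.User 2)"
```

Writing into a separate table such as Dataset.User_restored (a made-up name) keeps the original untouched, so you can inspect the recovered column before deciding how to merge it back.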

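Back to point 1: because the regex in the gcloud filter is deliberately loose, one option is to keep the filter broad and apply a stricter check in Bash afterwards. The sketch below assumes the query strings have already been pulled out of the log entries. The jq path in the comment simply mirrors the field path used in the filter and is an assumption, as are the helper names is_drop_column_query and extract_dropped_columns. The comment also widens --freshness, because when the filter has no timestamp, gcloud logging read looks back only 1 day by default.

```bash
#!/usr/bin/env bash
# Sketch: stricter post-filtering of query text pulled from Cloud Logging.
# The query strings are assumed to come from something like this (hypothetical;
# assumes the JSON keys match the filter's field path; requires jq):
#   gcloud logging read '<filter from point 1>' --freshness=7d --format=json \
#     | jq -r '.[].protoPayload.metadata.jobChange.job.jobConfig.queryConfig.query | gsub("\n"; " ")' \
#     | while IFS= read -r q; do extract_dropped_columns "$q"; done
# --freshness defaults to 1d, so it is widened here to cover the 7-day window.

# Succeeds only for "ALTER TABLE <name> ... DROP COLUMN ..." (any case, any spacing).
is_drop_column_query() {
  printf '%s' "$1" | tr '\n' ' ' \
    | grep -Eiq 'ALTER[[:space:]]+TABLE[[:space:]]+[^[:space:]]+[[:space:]].*DROP[[:space:]]+COLUMN[[:space:]]'
}

# Prints each column named in a "DROP COLUMN [IF EXISTS] <name>" clause, one per line.
extract_dropped_columns() {
  printf '%s' "$1" | tr '\n' ' ' \
    | grep -Eio 'DROP[[:space:]]+COLUMN[[:space:]]+(IF[[:space:]]+EXISTS[[:space:]]+)?`?[A-Za-z_][A-Za-z0-9_]*`?' \
    | awk '{ print $NF }' \
    | tr -d '`'
}
```

For example, extract_dropped_columns 'Alter TABLE User drop column blabla' prints blabla, and a statement with several DROP COLUMN clauses prints one column name per line.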