
BigQuery table with GCS data source: deleting rows does not affect the data in GCS

I am pretty new to BigQuery. I created a BigQuery table from the GCP console, using a CSV file in GCS as the data source. I expected that when I delete a row from the table, it would also be deleted from the GCS file. In practice, that does not happen.

When you use BigQuery, you have two ways to work with data from a CSV file in GCS.

  1. The most common is to perform a load job. This means that your CSV data is loaded (copied) into a BigQuery native table. After the load, no link is maintained between the file and the BigQuery data.

In this case, it's normal that the file doesn't change when you delete data in BigQuery.

  2. You can define an external table and query the data directly in your file hosted in GCS. This prevents data duplication, but queries are slower. In addition, DML (Data Manipulation Language) statements (INSERT, UPDATE, DELETE) aren't supported on external tables.
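
The two options above could look roughly like this in BigQuery SQL, built here as Python strings so the sketch runs without any credentials. The table names and the gs://my-bucket/data.csv URI are placeholders, not taken from the question, and skip_leading_rows = 1 assumes the CSV has a header row.

```python
def load_into_native_table(table: str, gcs_uri: str) -> str:
    """Option 1: copy the CSV into a native table (no link to the file afterwards)."""
    return (
        f"LOAD DATA INTO `{table}`\n"
        f"FROM FILES (format = 'CSV', uris = ['{gcs_uri}'], skip_leading_rows = 1);"
    )


def create_external_table(table: str, gcs_uri: str) -> str:
    """Option 2: define an external table that reads the file at query time."""
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"OPTIONS (format = 'CSV', uris = ['{gcs_uri}'], skip_leading_rows = 1);"
    )


if __name__ == "__main__":
    uri = "gs://my-bucket/data.csv"  # hypothetical bucket and file
    print(load_into_native_table("my_dataset.native_copy", uri))
    print(create_external_table("my_dataset.external_csv", uri))
```

The generated statements can be pasted into the BigQuery console, or submitted from Python with the google-cloud-bigquery client's Client.query(...) method.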

Workaround

As a workaround, you can use solution 1:

As you can see in the image below, BigQuery supports three types of tables: Native, External, and Views.

[image]

When you create a Native Table, your data is fully imported into BigQuery's storage system and transformed so that it is optimized for queries. An External Table is basically a pointer to your source files. In other words, every time you run a query against an External Table, BigQuery accesses the original source of data (a file in GCS, Google Drive, etc.).

Given that, I can go directly to your question: BigQuery will not update the source files when you run a DML statement. If you run a DML statement (DELETE, UPDATE) against a Native table, the data inside BigQuery's storage system will be changed, but the files will not be touched.

Also, DML is not supported on External Tables. If you try to run a DELETE statement against an External Table, for example, you will get an error: "DML over table 'project.dataset.table' is not supported."
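
A minimal sketch of that behavior, assuming a client object with the same query(...).result() shape as the google-cloud-bigquery Python client. The helper names, the table name, and the id = 42 condition are illustrative only. Nothing here touches GCS: a successful DELETE changes only BigQuery storage, and a DELETE against an external table is expected to be rejected with an error like the one quoted above.

```python
from typing import Tuple


def build_delete(table: str, condition: str) -> str:
    """Build a DELETE statement; BigQuery requires a WHERE clause."""
    return f"DELETE FROM `{table}` WHERE {condition};"


def try_delete(client, table: str, condition: str) -> Tuple[bool, str]:
    """Run the DELETE and report (succeeded, message) instead of raising.

    `client` is assumed to behave like google.cloud.bigquery.Client.
    """
    sql = build_delete(table, condition)
    try:
        client.query(sql).result()  # wait for the query job to finish
    except Exception as exc:  # the real client raises an API error here
        return False, str(exc)
    return True, "rows deleted in BigQuery storage; source files untouched"
```

With a real client this might be called as try_delete(bigquery.Client(), "my_dataset.my_table", "id = 42"), where the dataset and table names are again placeholders.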

I strongly recommend that you take a look at this documentation.
