Is it possible to join a Cloud SQL table to BigQuery?

I have a large amount of data in BigQuery, and I want to do some analysis that would be enhanced by joining to a small set of data I have in Cloud SQL. I've searched but cannot find a SQL-based bridge between the two. I was thinking of something like this:

SELECT
  bqdb.table as a,
  csdb.table as b,
  csdb.table as c
FROM bigquery:project:bqdb.table as t1,cloudsql:project:csdb.table as t2
JOIN t1 ON t1.a=t2.b
WHERE a='foo'
GROUP BY a,b
ORDER BY c

There is currently no direct bridge between data in Cloud SQL and Google BigQuery. To run a query like this, you will need to export your Cloud SQL table data in CSV format via the mysqldump tool, and then import this data into BigQuery as a new table.
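The export-then-import route can be sketched in Python. This is only a rough illustration, not the answerer's method: it assumes the `google-cloud-bigquery` client library and working credentials for the load step, and the rows, column names, and table ID are hypothetical placeholders. The CSV helper stands in for whatever export produced the file.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Serialize rows to CSV text with a header line (stand-in for the export step)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def load_csv_into_bigquery(csv_text, table_id):
    """Load CSV text into a BigQuery table.

    Assumes google-cloud-bigquery is installed and credentials are configured.
    """
    # Imported lazily so the CSV helper above works without the library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # skip the header line
        autodetect=True,      # let BigQuery infer the schema
    )
    job = client.load_table_from_file(
        io.BytesIO(csv_text.encode("utf-8")), table_id, job_config=job_config
    )
    job.result()  # block until the load job finishes


# Hypothetical rows, as they might come out of the Cloud SQL table.
csv_text = rows_to_csv(["b", "c"], [["foo", 1], ["bar", 2]])

# Hypothetical destination table; uncomment once credentials are in place.
# load_csv_into_bigquery(csv_text, "my-project.bqdb.cloudsql_copy")
```

Once the copy exists in BigQuery, the join becomes an ordinary query between two BigQuery tables, at the cost of the copy going stale until the next export.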

According to the documentation, this is now possible:

Data is often scattered in many places. You may store a customer table in BigQuery, while storing a sales table in Cloud SQL, and want to join the two tables in a single query.

BigQuery Cloud SQL federation enables BigQuery to query data residing in Cloud SQL in real-time, without copying or moving data. It supports both MySQL (2nd generation) and PostgreSQL instances in Cloud SQL.

After the initial one-time setup, you can write a query with the new SQL function EXTERNAL_QUERY().

...

Suppose you need the date of the first order for each of your customers to include in the report we described in the Overview. This data is not currently in BigQuery but is available in your operational PostgreSQL database in Cloud SQL. The following federated query example accomplishes this.

SELECT
  c.customer_id,
  c.name,
  SUM(t.amount) AS total_revenue,
  rq.first_order_date
FROM customers AS c
INNER JOIN transaction_fact AS t ON c.customer_id = t.customer_id
LEFT OUTER JOIN EXTERNAL_QUERY(
  'connection_id',
  '''SELECT customer_id, MIN(order_date) AS first_order_date
     FROM orders
     GROUP BY customer_id''') AS rq
  ON rq.customer_id = c.customer_id
GROUP BY c.customer_id, c.name, rq.first_order_date;
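To see what shape of result this join produces without setting up a connection, here is a small local sketch using Python's built-in sqlite3 module. It is not BigQuery: a plain subquery stands in for the EXTERNAL_QUERY() call, all three tables live in one in-memory database, the sample rows are invented, and an ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- In the real setup, customers and transaction_fact would live in BigQuery
    -- and orders in Cloud SQL; here everything is local for illustration.
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    CREATE TABLE transaction_fact (customer_id INTEGER, amount REAL);
    CREATE TABLE orders (customer_id INTEGER, order_date TEXT);

    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO transaction_fact VALUES (1, 10.0), (1, 5.0), (2, 7.0);
    INSERT INTO orders VALUES (1, '2020-01-05'), (1, '2020-03-01');
    """
)

rows = conn.execute(
    """
    SELECT c.customer_id, c.name, SUM(t.amount) AS total_revenue, rq.first_order_date
    FROM customers AS c
    INNER JOIN transaction_fact AS t ON c.customer_id = t.customer_id
    LEFT OUTER JOIN (
        -- Stand-in for EXTERNAL_QUERY('connection_id', '''...''')
        SELECT customer_id, MIN(order_date) AS first_order_date
        FROM orders
        GROUP BY customer_id
    ) AS rq ON rq.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name, rq.first_order_date
    ORDER BY c.customer_id
    """
).fetchall()

for row in rows:
    print(row)
conn.close()
```

Because the external side is attached with a LEFT OUTER JOIN, a customer with transactions but no rows in orders (Bob here) still appears, with a NULL first_order_date.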
