
Database design in BigQuery

I have a main table called "Acquisition" with multiple columns that would reference other tables (e.g. "Source", "Application", etc.). For example, "Source" would have multiple possible values that would be used in multiple rows of the "Acquisition" table. What bothers me a bit is that, designed this way, the rows of the "Acquisition" table would return data that looks like this:

id > 1 ; value > 23.4 ; source_id > 1 ; application_id > 3 ; platform_id > 1 ; country_id > 1 ; etc.

Do you think there's another way to design it to make it more readable / user-friendly?

Here's an extract of the schema code:

acquisitionSchema = bigquery.Schema{
    &bigquery.FieldSchema{Name: "id", Required: true, Type: bigquery.StringFieldType},
    &bigquery.FieldSchema{Name: "value", Required: true, Type: bigquery.FloatFieldType},
    // Each *_id column references a row in its own lookup table (e.g. source_id -> Source.id).
    &bigquery.FieldSchema{Name: "source_id", Required: true, Type: bigquery.StringFieldType},
    &bigquery.FieldSchema{Name: "application_id", Required: true, Type: bigquery.StringFieldType},
    &bigquery.FieldSchema{Name: "platform_id", Required: true, Type: bigquery.StringFieldType},
    &bigquery.FieldSchema{Name: "country_id", Required: true, Type: bigquery.StringFieldType},
    &bigquery.FieldSchema{Name: "adtype_id", Required: true, Type: bigquery.StringFieldType},
    &bigquery.FieldSchema{Name: "date", Required: true, Type: bigquery.DateFieldType},
    &bigquery.FieldSchema{Name: "download", Required: false, Type: bigquery.IntegerFieldType},
}

// Lookup table: one row per possible source value.
sourceSchema = bigquery.Schema{
    &bigquery.FieldSchema{Name: "id", Required: true, Type: bigquery.StringFieldType},
    &bigquery.FieldSchema{Name: "value", Required: true, Type: bigquery.StringFieldType},
}

I thought of storing the value of the source, platform, etc. directly, but that might get messy because I get my data from multiple sources through APIs, unless I add all the necessary checks in my code.

Thanks!

Usually we use a RECORD that has two columns (id, name):

-country
 |id
 |name

This way, in our queries we can use country.id to filter by integer, or country.name to display the value for quick inspection.
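To make the (id, name) layout concrete, here is a minimal Go sketch of what one "Acquisition" row could look like once the lookup columns become nested records. It only uses the standard library and prints the row as a JSON line (newline-delimited JSON is one of the formats BigQuery can load). The type names Ref and Acquisition, the helper encodeRow, and the sample values are hypothetical; the ids are kept as strings to match the schema in the question.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Ref is a hypothetical (id, name) pair mirroring the RECORD layout above.
type Ref struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

// Acquisition is a trimmed-down row; only two of the lookup columns are shown.
type Acquisition struct {
	ID      string  `json:"id"`
	Value   float64 `json:"value"`
	Source  Ref     `json:"source"`
	Country Ref     `json:"country"`
}

// encodeRow renders one row as a single JSON line.
func encodeRow(a Acquisition) (string, error) {
	b, err := json.Marshal(a)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Sample values are made up for illustration.
	row := Acquisition{
		ID:      "1",
		Value:   23.4,
		Source:  Ref{ID: "1", Name: "example-source"},
		Country: Ref{ID: "1", Name: "France"},
	}
	line, err := encodeRow(row)
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
}
```

With the Go client used in the question, the matching schema entry would presumably be a FieldSchema of type bigquery.RecordFieldType whose nested Schema holds the id and name fields.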

Since storage is cheap nowadays, we can afford to store the literal representation for every such column. Since BQ is append-only by design and we usually read the most recent row, that row already contains the fresh value if the name has changed in the meantime. Using the LAST_VALUE function, we can always pick the last record, which holds the latest name.
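The "latest row wins" idea can be illustrated outside BigQuery. The following is a plain Go sketch of the logic only, not the LAST_VALUE function itself: rows are appended over time, and for each country id we keep the name found on its most recent row. The Row type, the latestNames helper, and the sample data are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Row is a hypothetical, trimmed-down acquisition row.
type Row struct {
	Date        string // ISO date (YYYY-MM-DD), so string order is date order
	CountryID   string
	CountryName string
}

// latestNames returns, for each country id, the name on its most recent row.
func latestNames(rows []Row) map[string]string {
	sorted := make([]Row, len(rows))
	copy(sorted, rows)
	// Stable sort keeps insertion order for rows that share a date.
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Date < sorted[j].Date
	})

	latest := make(map[string]string)
	for _, r := range sorted {
		latest[r.CountryID] = r.CountryName // later rows overwrite earlier ones
	}
	return latest
}

func main() {
	// Country "1" was renamed between the first and the last row.
	rows := []Row{
		{Date: "2024-01-01", CountryID: "1", CountryName: "Holland"},
		{Date: "2024-02-01", CountryID: "2", CountryName: "France"},
		{Date: "2024-03-01", CountryID: "1", CountryName: "Netherlands"},
	}
	names := latestNames(rows)
	fmt.Println(names["1"], names["2"])
}
```

Older rows keep the name that was valid when they were written, so nothing has to be updated in place; readers who want the current name just take it from the newest row.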

