Database design in BigQuery
I have a main table called "Acquisition" with multiple columns that would reference other tables (e.g. "Source", "Application", etc.). For example, "Source" would have multiple possible values that would be used in multiple rows of the "Acquisition" table. What bothers me a bit is that the rows of the "Acquisition" table would return data that looks like this:
id > 1 ; value > 23.4 ; source_id > 1 ; application_id > 3 ; platform_id > 1 ; country_id > 1 ; etc.
Do you think there's another way to design it to make it more readable / user-friendly?
Here's an extract of the schema code:
acquisitionSchema = bigquery.Schema {
&bigquery.FieldSchema{Name: "id", Required: true, Type: bigquery.StringFieldType},
&bigquery.FieldSchema{Name: "value", Required: true, Type: bigquery.FloatFieldType},
&bigquery.FieldSchema{Name: "source_id", Required: true, Type: bigquery.StringFieldType},
&bigquery.FieldSchema{Name: "application_id", Required: true, Type: bigquery.StringFieldType},
&bigquery.FieldSchema{Name: "platform_id", Required: true, Type: bigquery.StringFieldType},
&bigquery.FieldSchema{Name: "country_id", Required: true, Type: bigquery.StringFieldType},
&bigquery.FieldSchema{Name: "adtype_id", Required: true, Type: bigquery.StringFieldType},
&bigquery.FieldSchema{Name: "date", Required: true, Type: bigquery.DateFieldType},
&bigquery.FieldSchema{Name: "download", Required: false, Type: bigquery.IntegerFieldType},
}
sourceSchema = bigquery.Schema {
&bigquery.FieldSchema{Name: "id", Required: true, Type: bigquery.StringFieldType},
&bigquery.FieldSchema{Name: "value", Required: true, Type: bigquery.StringFieldType},
}
I thought of storing the value of the source, platform, etc. directly, but that might get messy, since I get my data from multiple sources through APIs, unless I implement all the necessary checks in my code.
Thanks!
Usually we use a RECORD that has two columns (id, name):
-country
|id
|name
This way, in our queries we can use country.id to query by integer, or country.name to display the value for quick inspection.
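The (id, name) RECORD layout described above can be sketched with plain Go structs. This is only an illustration under assumptions: the type names `Dimension` and `AcquisitionRow`, the helper `RowJSON`, and the sample values are hypothetical and not part of the original schema. The `bigquery` struct tags are there on the assumption that the structs would later be handed to the Go client's schema inference, where a nested struct should map to a RECORD column; the snippet itself uses only the standard library and prints the row shape as one line of JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Dimension is a hypothetical (id, name) pair. As a nested struct it
// plays the role of a RECORD column with two sub-fields.
type Dimension struct {
	ID   string `json:"id" bigquery:"id"`
	Name string `json:"name" bigquery:"name"`
}

// AcquisitionRow is a hypothetical, trimmed-down version of the
// "Acquisition" row where source_id and country_id are replaced by
// nested records that carry both the id and the readable name.
type AcquisitionRow struct {
	ID      string    `json:"id" bigquery:"id"`
	Value   float64   `json:"value" bigquery:"value"`
	Source  Dimension `json:"source" bigquery:"source"`
	Country Dimension `json:"country" bigquery:"country"`
}

// RowJSON renders a row as a single line of JSON so the nested
// shape (country.id, country.name) is easy to inspect.
func RowJSON(r AcquisitionRow) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	row := AcquisitionRow{
		ID:      "1",
		Value:   23.4,
		Source:  Dimension{ID: "1", Name: "example-source"},   // sample value, made up
		Country: Dimension{ID: "1", Name: "example-country"}, // sample value, made up
	}
	line, err := RowJSON(row)
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
}
```

The remaining dimensions (application, platform, adtype) would follow the same pattern; they are left out here only to keep the sketch short.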
Since storage is cheap nowadays, we can afford to store the literal representation alongside the id in every row. Since BQ is append-only by design, and we usually read the most recent row, that row already contains the fresh value if the name has changed in the meantime. Using the LAST_VALUE function, we can always pick the last record, which holds the latest name.
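To make the "latest name wins" idea concrete, here is a minimal in-memory sketch in Go. It does not call BigQuery or LAST_VALUE. It only mimics the outcome one would expect from such a query over appended rows: for each id, the name from the most recently appended row is kept. The type `Dimension`, the function `LatestNames`, and the sample names are assumptions made for illustration.

```go
package main

import "fmt"

// Dimension is a hypothetical (id, name) record as stored in each row.
type Dimension struct {
	ID   string
	Name string
}

// LatestNames walks the rows in append order (oldest first) and keeps,
// for each id, the name from the last row seen. Later rows overwrite
// earlier ones, so a renamed dimension resolves to its newest name.
func LatestNames(rows []Dimension) map[string]string {
	latest := make(map[string]string)
	for _, r := range rows {
		latest[r.ID] = r.Name
	}
	return latest
}

func main() {
	// Rows in the order they were appended; id "1" is renamed later on.
	rows := []Dimension{
		{ID: "1", Name: "old-name"},
		{ID: "2", Name: "other"},
		{ID: "1", Name: "new-name"},
	}
	names := LatestNames(rows)
	fmt.Println(names["1"]) // name from the last appended row for id "1"
	fmt.Println(names["2"])
}
```

Older rows keep the name they were written with, so historical reports still show what the dimension was called at the time, while lookups by id resolve to the current name.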