BigQuery BI Engine: how to choose a good reservation size?
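I am enabling BI Engine to speed up my queries and save costs for my project in the EU region.
What would be a good choice for the reservation size? 1 GB, 2 GB, 4 GB?
How do I make that decision?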
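Below is a SQL script that groups queries by the number of GB they processed: the first row covers queries that processed between 0 and 1 GB each, the second row those that processed between 1 and 2 GB, and so on.
For each row it then shows the amount processed, the amount billed, the associated cost, and the cost saved.
This should help you see where your costs lie, how many queries of a given size you run, and whether you could increase or decrease your reservation size.
Note that BI Engine can only accelerate certain SELECT queries, not MERGE, INSERT, CREATE and similar statements, and there are further exceptions. For a fair comparison I therefore excluded those query types, to give a better picture of the size of the savings. See also: https://cloud.google.com/bigquery/docs/bi-engine-intro#bi-engine-use-cases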
DECLARE QUERY_COST_PER_TB NUMERIC DEFAULT 5.00; -- on-demand price of processing 1 TB of data in BQ; set it to your own rate (the cost columns below are named *_in_euros)

with possible_bi_engine_jobs_incl_parent_jobs as (
  select
    creation_time,
    bi_engine_statistics,
    cache_hit,
    total_bytes_processed / power(1024, 3) GB_processed,
    floor(total_bytes_processed / power(1024, 3)) GB_processed_floor,
    total_bytes_billed / power(1024, 3) GB_billed,
    total_bytes_processed / power(1024, 4) * QUERY_COST_PER_TB expected_cost_in_euros,
    total_bytes_billed / power(1024, 4) * QUERY_COST_PER_TB actual_cost_in_euros,
    query,
    job_id,
    parent_job_id,
    user_email,
    job_type,
    statement_type,
  from `my_project_id.region-eu.INFORMATION_SCHEMA.JOBS`
  where 1=1
    and creation_time >= '2022-12-08'
    and creation_time < '2022-12-09'
    and cache_hit = false -- bi engine will not be improving on queries that are already cache hits
    and total_bytes_processed is not null -- if there's no bytes processed, then ignore the job
    and statement_type = 'SELECT' -- statement types such as MERGE, CREATE, UPDATE cannot be run by bi engine, only SELECT statements
    and job_type = 'QUERY' -- LOAD jobs etc. cannot be run by bi engine, only QUERY jobs
    and upper(query) like '%FROM%' -- query should contain FROM, otherwise it will not be run by bi engine
    and upper(query) not like '%INFORMATION_SCHEMA.%' -- metadata queries can not be run by bi engine
),
-- to prevent double counting of total_bytes_processed and total_bytes_billed
parent_job_ids_to_ignore as (
  select distinct parent_job_id
  from possible_bi_engine_jobs_incl_parent_jobs
  where parent_job_id is not null
),
possible_bi_engine_jobs_excl_parent_jobs as (
  select *
  from possible_bi_engine_jobs_incl_parent_jobs
  where job_id not in (select parent_job_id from parent_job_ids_to_ignore) -- to prevent double counting of total_bytes_processed and total_bytes_billed
)
select
  GB_processed_floor, -- lower bound of the bucket: queries that processed at least this many GB and less than the next whole GB
  count(1) query_count,
  sum(case when bi_engine_statistics.bi_engine_mode in ('FULL', 'PARTIAL') then 1 else 0 end) bi_engine_enabled,
  sum(case when bi_engine_statistics.bi_engine_mode in ('DISABLED') or bi_engine_statistics.bi_engine_mode IS NULL then 1 else 0 end) bi_engine_disabled,
  round(sum(GB_processed), 1) GB_processed,
  round(sum(GB_billed), 1) GB_billed,
  round(sum(expected_cost_in_euros), 2) expected_cost_in_euros,
  round(sum(actual_cost_in_euros), 2) actual_cost_in_euros,
  round(sum(expected_cost_in_euros) - sum(actual_cost_in_euros), 2) saved_cost
from possible_bi_engine_jobs_excl_parent_jobs
group by GB_processed_floor
order by GB_processed_floor
;
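To make the per-job arithmetic in the script concrete, here is a minimal Python sketch of what it computes for a single job row. The function name `job_metrics` and the sample byte counts are made up for illustration; only the formulas (division by 1024^3 for GB, by 1024^4 for TB, and saved cost as expected minus actual) are taken from the script.

```python
# Hypothetical illustration of the per-job arithmetic used in the SQL script.
QUERY_COST_PER_TB = 5.00  # same default as in the script; adjust to your rate

GIB = 1024 ** 3  # bytes per GB as the script counts them: power(1024, 3)
TIB = 1024 ** 4  # bytes per TB as the script counts them: power(1024, 4)


def job_metrics(total_bytes_processed, total_bytes_billed):
    """Return the values the script derives from one job row."""
    expected_cost = total_bytes_processed / TIB * QUERY_COST_PER_TB
    actual_cost = total_bytes_billed / TIB * QUERY_COST_PER_TB
    return {
        "GB_processed": total_bytes_processed / GIB,
        # Bucket key: whole GB processed, rounded down.
        "GB_processed_floor": total_bytes_processed // GIB,
        "GB_billed": total_bytes_billed / GIB,
        "expected_cost": expected_cost,  # cost if every processed byte were billed
        "actual_cost": actual_cost,      # cost of the bytes actually billed
        "saved_cost": expected_cost - actual_cost,
    }


# Made-up job: 1.5 GB processed, 0 bytes billed.
m = job_metrics(int(1.5 * GIB), 0)
print(m["GB_processed_floor"], m["expected_cost"], m["saved_cost"])
```

A job that processed 1.5 GB lands in bucket 1 (the "1 to 2 GB" row). Its saved cost equals its expected cost only because the invented billed amount is zero.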
The query above produces a cost-savings table grouped by query size.
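As a rough idea of how the rows of such a table are built, the sketch below applies the same steps as the query (drop parent jobs, bucket by whole GB processed, sum per bucket) to a handful of job records. All of the records are invented, and `bucket_table` is a hypothetical helper, so the numbers say nothing about real savings.

```python
from collections import defaultdict

QUERY_COST_PER_TB = 5.00  # same default as in the script
GIB = 1024 ** 3

# Invented job records; "mode" stands in for bi_engine_statistics.bi_engine_mode.
jobs = [
    {"job_id": "a", "parent_job_id": None, "processed": GIB // 2, "billed": 0, "mode": "FULL"},
    {"job_id": "b", "parent_job_id": None, "processed": GIB // 4, "billed": GIB // 4, "mode": None},
    {"job_id": "s", "parent_job_id": None, "processed": 5 * GIB // 2, "billed": GIB, "mode": None},
    {"job_id": "c", "parent_job_id": "s", "processed": 5 * GIB // 2, "billed": GIB, "mode": "PARTIAL"},
]


def bucket_table(jobs):
    """Aggregate jobs per GB bucket, skipping parent jobs to avoid double counting."""
    parent_ids = {j["parent_job_id"] for j in jobs if j["parent_job_id"] is not None}
    rows = defaultdict(lambda: {"query_count": 0, "bi_engine_enabled": 0,
                                "bi_engine_disabled": 0,
                                "GB_processed": 0.0, "GB_billed": 0.0})
    for j in jobs:
        if j["job_id"] in parent_ids:
            continue  # its bytes are already counted through its child jobs
        row = rows[j["processed"] // GIB]
        row["query_count"] += 1
        if j["mode"] in ("FULL", "PARTIAL"):
            row["bi_engine_enabled"] += 1
        else:  # DISABLED or missing
            row["bi_engine_disabled"] += 1
        row["GB_processed"] += j["processed"] / GIB
        row["GB_billed"] += j["billed"] / GIB
    for row in rows.values():
        # 1 TB = 1024 GB, matching power(1024, 4) in the script.
        row["saved_cost"] = (row["GB_processed"] - row["GB_billed"]) / 1024 * QUERY_COST_PER_TB
    return dict(sorted(rows.items()))


for gb_floor, row in bucket_table(jobs).items():
    print(gb_floor, row)
```

Here job `s` is dropped because job `c` names it as its parent, so the 2.5 GB is counted once, in bucket 2.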
Other useful links about BI Engine savings: