dbt macro using Jinja in BigQuery
I am trying to compare some columns in an automated way. The column types I am comparing can be float, integer, string, or date/timestamp, but I am having issues with the syntax and with how best to cover the last case, i.e. when I have dates/timestamps.
This is what I have so far, but somehow I am getting type errors. Plus, I am not sure whether what I have done would work, since I am stuck at the very first stage.
{% macro find_mismatch_s1_s2_(s1_col, s2_col,field) -%}
{%if type(s1_col) is string %}
if(coalesce({{s1_col}},"") != coalesce({{s2_col}},""),true,false ) as is_{{field}}_mismatch
{%- elif type(s1_col) is float or type(s1_col) is integer -%}
if(coalesce({{s1_col}},0) != coalesce({{s2_col}},0),true,false) as is_{{field}}_mismatch
{% else %}
if(coalesce({{s1_col}},{{another_macro_that_gets_a_date}}) != coalesce({{s2_col}},{{another_macro_that_gets_a_date}}),true,false) as is_{{field}}_mismatch
{% endif %}
{% endmacro %}
The reason for the coalesce statements is that if any value in one of the columns is null, is_{{field}}_mismatch would return false, even though it should be true.
Reusing the same coalesce for the second and third cases runs into an error due to the column type, hence I have to add these in separate if statements.
The data in your database does not flow through the code you write in Jinja; Jinja templates a SQL query before that query is executed.
When you write type(s1_col), you might expect to get the data type of the data in the database column named s1_col, but actually you will always get a string type, since the variable s1_col is a string that holds the name of a database column.
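To make the order of operations concrete, here is a minimal sketch of what Jinja sees versus what BigQuery sees. The column names a.price and b.price and the table joined_sources are made up for illustration; they are not from the question.

```sql
-- Jinja side: the variables are plain strings that hold column names
{% set s1_col = 'a.price' %}
{% set s2_col = 'b.price' %}

select
    coalesce({{ s1_col }}, 0) != coalesce({{ s2_col }}, 0) as is_price_mismatch
from joined_sources

-- Compiled SQL that is sent to BigQuery; column types only matter from here on:
--   select
--       coalesce(a.price, 0) != coalesce(b.price, 0) as is_price_mismatch
--   from joined_sources
```

By the time the warehouse knows that price is numeric, the Jinja branches have already been resolved, so no Jinja test on s1_col can depend on the column's data type.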
You can write this logic in a type-agnostic way by handling nulls explicitly in SQL rather than in Jinja: return true when exactly one of the columns is null, false when both are null, and otherwise compare the values directly:
{% macro find_mismatch_s1_s2_(s1_col, s2_col,field) -%}
case
when {{ s1_col }} is null and {{ s2_col }} is not null
then true
when {{ s2_col }} is null and {{ s1_col }} is not null
then true
when {{ s1_col }} is null and {{ s2_col }} is null
then false
else {{ s1_col }} != {{ s2_col }}
end as is_{{field}}_mismatch
{% endmacro %}
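A more compact variant may also be possible. BigQuery has an IS DISTINCT FROM operator that treats nulls as comparable values, which should give the same truth table as the case expression above. The sketch below assumes that operator is available in your project; the macro name find_mismatch_distinct and the column names in the usage example are hypothetical.

```sql
{% macro find_mismatch_distinct(s1_col, s2_col, field) -%}
    {# hypothetical variant: null-safe comparison, no type-specific defaults #}
    ({{ s1_col }} is distinct from {{ s2_col }}) as is_{{ field }}_mismatch
{%- endmacro %}
```

Used in a model, the same macro then covers numeric, string, and timestamp columns alike, as long as both columns in a pair have comparable types:

```sql
select
    {{ find_mismatch_distinct('a.price', 'b.price', 'price') }},
    {{ find_mismatch_distinct('a.created_at', 'b.created_at', 'created_at') }}
from joined_sources
```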
Another option, if you really need the data type, is to use dbt's adapter class, specifically the get_columns_in_relation method. The returned list of Columns has a data_type property. See this answer for more info.
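As a rough sketch of that approach, a helper macro could look up a column's type by name. The macro name get_column_type and its arguments are assumptions for illustration, and the relation passed in must already exist in the warehouse for the lookup to return anything.

```sql
{% macro get_column_type(relation, column_name) %}
    {# hypothetical helper: return the warehouse data type of one column #}
    {% if execute %}
        {# only query the warehouse at run time, not while dbt is parsing #}
        {% set columns = adapter.get_columns_in_relation(relation) %}
        {% for column in columns %}
            {% if (column.name | lower) == (column_name | lower) %}
                {{ return(column.data_type) }}
            {% endif %}
        {% endfor %}
    {% endif %}
    {{ return(none) }}
{% endmacro %}
```

A model could then call something like get_column_type(ref('my_model'), 'price') and branch on the result in Jinja. The exact type strings returned depend on the adapter, so it is worth printing them once before writing conditions against them.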