简体   繁体   中英

dbt macro using Jinja in BigQuery

I am trying to compare some columns in an automated way, the column types that i am comparing can be: float, integer, string or date / timestamps but I am having issues with the syntax and how to best cover the last condition ie when I have dates/timestamps

This is what I have so far but somehow i am getting type errors. Plus I am not sure if what i have done would work or not since i am stuck at the very initial stage.

{% macro find_mismatch_s1_s2_(s1_col, s2_col,field) -%}
{%if type(s1_col) is string %}
  if(coalesce({{s1_col}},"") != coalesce({{s2_col}},""),true,false ) as is_{{field}}_mismatch
{%- elif type(s1_col) is float or type(s1_col) is integer -%}
  if(coalesce({{s1_col}},0) != coalesce({{s2_col}},0),true,false) as is_{{field}}_mismatch
{% else %}
  if(coalesce({{s1_col}},{{another_macro_that_gets_a_date}}) != coalesce({{s2_col}},{{another_macro_that_gets_a_date}}),true,false) as is_{{field}}_mismatch
{% endif %}
{% endmacro %}
  • First I am checking if the column type is string if yes, then compare the values of two columns, if values are null then use blank to compare
  • Second I am checking if the column types are float or integer, if yes compare two columns and if there is null then compare with 0
  • lastly, I am covering the case of dates/timestamps, I couldn't find the type() for the date / timestamp so I added it to the else statement, which might not be the best thing, would be great if someone can also recommend how to best cover these cases.
  • The reason I am adding the coalesce statements is because if any of the value in one of the columns is null, the is_{{field}}_mismatch would return false, even though it should be true
  • if I enter empty strings as a part of the coalesce for the second and third cases, it will run into error due to column type, hence I have to add these in seperate if statements.

The data in your database does not flow through the code you write in Jinja -- jinja templates a sql query before that query is executed.

When you write type(s1_col) , you might expect to get the data type of the data in the database column named s1_col , but actually you will always get a string type, since the variable s1_col is a string that holds the name of a database column.

You can write this logic in a type-agnostic way by returning true if either column is null (in sql, not jinja):

{% macro find_mismatch_s1_s2_(s1_col, s2_col,field) -%}
case
    when {{ s1_col }} is null and {{ s2_col }} is not null
    then true
    when {{ s2_col }} is null and {{ s1_col }} is not null
    then true
    when {{ s1_col }} is null and {{ s2_col }} is null
    then false
    else {{ s1_col }} != {{ s2_col }}
end as is_{{field}}_mismatch
{% endmacro %}

Another option, if you really need the data type, is to use dbt's adapter class, specifically the get_columns_in_relation method. The returned List of Columns have a data_type property. See this answer for more info.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM