Passing template variables to HiveOperator

I have a Jinja template which I plan to use for dynamic SQL generation in Hive. My template looks as follows:

USE {{ db }};

CREATE EXTERNAL TABLE IF NOT EXISTS foo (
    A int,
    B int
)
stored as parquet
location '...';

The value of "db" can be derived by making a function call. I decided to write an operator extending HiveExecOperator. In my environment the class hierarchy is:

BaseOperator <-- BaseExecOperator <-- HiveExecOperator

My TestHive operator looks like the following:

class TestHive(HiveExecOperator):
    def pre_execute(self, context):
        context['db'] = func1(...)
        return context['ti'].render_templates()

This does not work: {{ db }} inside the template doesn't get a value, so the Hive statement fails. I also tried overriding render_template in TestHive as follows:

class TestHive(HiveExecOperator):
    def render_template(self, attr, content, context):
        context['db'] = func1(...)
        return super(TestHive, self).render_templates(attr, content, context)

This one fails because the parent class of TestHive doesn't have a render_templates method.

The render_templates method is only defined in BaseOperator.

Any help is appreciated.

Assuming you mean HiveOperator and not HiveExecOperator, and going by what you're describing, I don't believe you need to derive any kind of operator here. Unless there's some missing info that I'm not seeing, you're simply asking how to pass the value of a function call as a parameter into a templated command.

The hql argument of HiveOperator is a template field. That means you can define your template as you've already done and then provide the value to it as part of the operator call. But remember to prefix the variable being passed in with params. See:

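# Airflow 1.x import path; in Airflow 2+ the class lives in
# airflow.providers.apache.hive.operators.hive
from airflow.operators.hive_operator import HiveOperator

my_query = """
    USE {{ params.db }};

    CREATE EXTERNAL TABLE IF NOT EXISTS foo (
    A int,
    B int
    )
    stored as parquet
    location .......
    """

# params is exposed to the Jinja template as "params"
run_hive_query = HiveOperator(
    task_id="my_task",
    hql=my_query,
    params={'db': func1(...)},
    dag=dag
)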
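
To see why the params prefix matters, here is a minimal sketch that renders the two spellings with jinja2 directly, outside of Airflow. It assumes only that the operator's params dict reaches the template under the name params; lookup_db is a hypothetical stand-in for func1, and the database name is made up for illustration.

```python
from jinja2 import Template


def lookup_db():
    """Hypothetical stand-in for func1(...); returns a database name."""
    return "analytics"


def render(template_text, params):
    # Mimic the relevant part of the template context: the dict is
    # available to Jinja under the single name "params".
    return Template(template_text).render(params=params)


params = {"db": lookup_db()}

with_prefix = render("USE {{ params.db }};", params)
without_prefix = render("USE {{ db }};", params)

print(repr(with_prefix))     # 'USE analytics;'
print(repr(without_prefix))  # 'USE ;' (an undefined name renders as empty)
```

With jinja2's default settings an undefined variable renders as an empty string rather than raising, which matches the symptom in the question: {{ db }} silently produces nothing and Hive then rejects the statement. Also note that in the operator call above, func1(...) runs when the DAG file is parsed, not when the task executes.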
