
Executing long-running Hive queries from a remote machine

I have to execute long-running (~10 hour) Hive queries from my local server using a Python script. My target Hive server is in an AWS cluster.

I've tried to execute it using pyhs2, execute('<command>')

and

paramiko, exec_command('hive -e "<command>"'), roughly as sketched below.
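
For context, a minimal sketch of those two attempts, assuming HiveServer2 listens on port 10000 with PLAIN authentication; the hosts, credentials, and query below are placeholders, not values from the question:

    import pyhs2
    import paramiko

    query = "SELECT count(*) FROM some_table"   # placeholder for the real ~10 hour query

    # Attempt 1: pyhs2 straight to HiveServer2
    with pyhs2.connect(host='hive-server', port=10000, authMechanism='PLAIN',
                       user='hive', password='', database='default') as conn:
        with conn.cursor() as cur:
            cur.execute(query)

    # Attempt 2: paramiko, running the Hive CLI over SSH
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect('cluster-host', username='hadoop', key_filename='/path/to/key.pem')
    stdin, stdout, stderr = ssh.exec_command('hive -e "%s"' % query)
    print(stdout.read())   # the question reports the script keeps waiting here even after the query finishes
    ssh.close()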

In both cases my query runs on the Hive server and completes successfully. But the issue is that even after the query completes successfully, my parent Python script keeps waiting for a return value and remains in the interruptible sleep (Sl) state indefinitely!

Is there any way I can make my script work properly using pyhs2 or paramiko? Or is there any other, better option available in Python?

As I mentioned before, I too faced a similar issue in my performance-based environment. My use case was running queries through the pyhs2 module with the Hive Tez execution engine. Tez generates a lot of logs (basically on a seconds scale); the logs get captured in the STDOUT variable and are only handed back as output once the query completes successfully. The way to overcome this is to stream the output as it is generated, as shown below:

    # consume the remote stdout as it arrives rather than after the command exits
    for line in iter(lambda: stdout.readline(2048), ""):
        print(line)

But for this you will have to use a native connection to the cluster using paramiko or Fabric, and then issue the Hive command via the CLI or Beeline.
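
Putting that together, a minimal sketch of the streaming approach with paramiko, assuming SSH access to a cluster edge node with hive on the remote PATH; the host, user, key path, and query are placeholders:

    import paramiko

    query = "SELECT count(*) FROM some_table"   # placeholder long-running query

    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect('emr-master-node', username='hadoop', key_filename='/path/to/key.pem')

    stdin, stdout, stderr = ssh.exec_command('hive -e "%s"' % query)

    # stream the Hive/Tez logs as they are produced instead of letting them
    # pile up in the stdout buffer for the whole run
    for line in iter(lambda: stdout.readline(2048), ""):
        print(line.rstrip())

    # returns as soon as the remote command exits, so the script does not hang
    exit_code = stdout.channel.recv_exit_status()
    ssh.close()
    print("hive exited with status %d" % exit_code)

Depending on how the Hive CLI is configured, some of the log output may land on stderr instead of stdout; appending 2>&1 to the remote command keeps everything on one stream for the loop above.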
