
How to access remote Hive using pyhive

I used this link to try to connect to a remote Hive. Below is the code I used, followed by the error message I received.

How to Access Hive via Python?

Code

from pyhive import hive
conn = hive.Connection(host="10.111.22.11", port=10000, username="user1", database="default")

Error message

Could not connect to any of [('10.111.22.11', 10000)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/lib/python3.6/site-packages/pyhive/hive.py", line 131, in __init__
    self._transport.open()
  File "/opt/anaconda3/lib/python3.6/site-packages/thrift_sasl/__init__.py", line 61, in open
    self._trans.open()
  File "/opt/anaconda3/lib/python3.6/site-packages/thrift/transport/TSocket.py",line 113, in open
    raise TTransportException(TTransportException.NOT_OPEN, msg)
thrift.transport.TTransport.TTransportException: Could not connect to any of [('10.111.22.11', 10000)]

What are the other requirements for a successful connection? I am able to connect to the server directly (using PuTTY) and run hive. But when I try from another server X, I get this error. I can also ping the Hive server from server X.

Could the port number be the problem? How do I check the correct port number?
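A successful ping only shows that the host is reachable; it says nothing about whether a service is listening on a given TCP port or whether a firewall blocks it. A minimal sketch of a port check that can be run from server X is shown here. The helper name `port_is_open` and the 3-second timeout are illustrative choices, not part of pyhive.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example call, using the host and port from the question:
# print(port_is_open("10.111.22.11", 10000))
```

If this returns False for port 10000 while ping works, the likely causes are that nothing is listening on that port on the Hive host or that a firewall between the two servers blocks it.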

As discussed in the answer below, I tried to start hiveserver2, but the command doesn't seem to work. Any help is really appreciated.

Also, the port I see in the log when I execute a query from the hive shell is 8088. I wonder if this should be the port instead of 10000 (neither worked anyway).

I could not make it work using pyhive and had to use paramiko instead. Below is the sample code.

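import os
import paramiko

ssh = paramiko.SSHClient()
# Accept unknown host keys automatically, then load the keys already known.
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.load_host_keys(os.path.expanduser(os.path.join("~", ".ssh", "known_hosts")))
ssh.connect('1.1.1.1', username='uid', password='pwd')

# Run the Hive CLI on the remote host over SSH.
sshin, sshout, ssherr = ssh.exec_command('hive -e "create table test(key varchar(10),keyval varchar(200))"')

# Read the output, wait for the command to finish, then close the session.
print(sshout.read().decode(), ssherr.read().decode())
print("exit status:", sshout.channel.recv_exit_status())
ssh.close()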

The HiveServer2 process must be running on your remote Hive host. 10000 is the default port number.

Use this command to start HiveServer2:

$HIVE_HOME/bin/hiveserver2 
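Because 10000 is only the default, the port HiveServer2 actually listens on may be overridden by the `hive.server2.thrift.port` property in `hive-site.xml`. The following is a rough sketch of reading that property; the helper name `read_hive_property` and the inline sample XML are illustrative, and the usual file location (`$HIVE_HOME/conf/hive-site.xml`) is an assumption that may differ on your installation.

```python
import xml.etree.ElementTree as ET


def read_hive_property(xml_text, name, default=None):
    """Return the value of a named property from hive-site.xml content."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else default
    return default


# Illustrative sample; on a real host, read the text of hive-site.xml instead.
sample = """<configuration>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10001</value>
  </property>
</configuration>"""

# Fall back to the default port when the property is not set.
port = int(read_hive_property(sample, "hive.server2.thrift.port", default="10000"))
print(port)
```

If the property is absent from the file, the default of 10000 applies.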

Please try the code below to access a remote Hive table using pyhive:

from pyhive import hive
import pandas as pd

# Create the Hive connection
conn = hive.Connection(host="10.111.22.11", port=10000, username="user1")

# Read the Hive table into a pandas DataFrame
df = pd.read_sql("SELECT * FROM db_Name.table_Name limit 10", conn)
print(df.head())
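If pandas is not needed, the same query can be run through a plain DB-API cursor, which pyhive connections provide. This is a minimal sketch under that assumption; `fetch_rows` is a hypothetical helper, and the connection factory is passed in so the function makes sure the cursor and connection are closed even when the query fails.

```python
def fetch_rows(query, connect, **conn_kwargs):
    """Run query on a DB-API connection and return (column_names, rows)."""
    conn = connect(**conn_kwargs)
    try:
        cursor = conn.cursor()
        try:
            cursor.execute(query)
            # The first item of each description entry is the column name.
            columns = [col[0] for col in cursor.description]
            return columns, cursor.fetchall()
        finally:
            cursor.close()
    finally:
        conn.close()


# Usage with pyhive (requires a reachable HiveServer2):
# from pyhive import hive
# columns, rows = fetch_rows(
#     "SELECT * FROM db_Name.table_Name limit 10",
#     hive.Connection, host="10.111.22.11", port=10000, username="user1",
# )
```

A failed connection still raises the same TTransportException as in the question, so this does not replace making sure HiveServer2 is running and reachable.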
