I'm using subprocess to run Hive commands in Python, but I'm getting empty results. If I run the same commands from the Hive CLI, I get results.
import itertools
import subprocess
import pandas as pd

query = "set hive.cli.print.header=true;use mydb;describe table1;"
process = subprocess.Popen(["ssh", "hadoop", "hive", "-e", "%r" % query],
                           stdout=subprocess.PIPE, stderr=subprocess.PIPE)
data = [line.split('\t') for line in process.stdout]
cols = list(itertools.chain.from_iterable(data[:1]))
df = pd.DataFrame(data[1:], columns=cols)
print "==>", df, "<----"
It returns an empty DataFrame. Please help me with this.
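For what it's worth, the parsing step itself works once the child's output is actually read in full. Below is a minimal stand-alone sketch of that step: a local Python one-liner stands in for the `ssh hadoop hive -e ...` call (which can't be reproduced here), and `communicate()` drains both pipes before parsing. This is an illustration under those assumptions, not the asker's exact setup:

```python
import subprocess
import sys

# Stand-in for the real `ssh hadoop hive -e ...` invocation: a local
# Python one-liner that emits tab-separated "describe"-style output.
cmd = [sys.executable, "-c",
       r"print('col_name\tdata_type\nid\tint\nname\tstring')"]
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()  # reads both pipes fully before the child exits
rows = [line.split('\t') for line in out.decode().splitlines() if line.strip()]
cols, data = rows[0], rows[1:]  # header line, then the data rows
```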
import subprocess
import sys

myfile = open("query_result.tsv", 'w')
p = subprocess.Popen("your query",
                     shell=True,
                     stdout=myfile, stderr=subprocess.PIPE)
stdout, stderr = p.communicate()
if p.returncode != 0:
    print stderr
    sys.exit(1)
myfile is a TSV file, so you can load it with pandas.read_csv() and sep='\t'; look up the pandas API for more read_csv() options.
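To illustrate that loading step, here is a small sketch using an in-memory TSV in place of query_result.tsv (the column names below are just example hive `describe` output, not real data):

```python
import io

import pandas as pd

# Example TSV content standing in for query_result.tsv.
tsv = "col_name\tdata_type\tcomment\nid\tint\tpk\nname\tstring\t-\n"

# sep='\t' tells read_csv to split on tabs instead of commas.
df = pd.read_csv(io.StringIO(tsv), sep='\t')
```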
You should also look up the Popen object in section 17.1.2 of the subprocess API docs; it gives you a warning about stdout=PIPE. https://docs.python.org/2/library/subprocess.html#frequently-used-arguments
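Concretely, the warning is that stdout=PIPE combined with p.wait() can deadlock once the child fills the OS pipe buffer; communicate() reads the pipes while waiting, so it is safe. A small sketch (the child here is a hypothetical local process writing a large chunk to stdout):

```python
import subprocess
import sys

# A child that writes ~1 MB to stdout. With stdout=PIPE plus p.wait(),
# this size can fill the pipe buffer and deadlock; communicate() reads
# the pipes while it waits for the child, so it cannot.
child = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"]
p = subprocess.Popen(child, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()  # drains both pipes, then reaps the child
```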