How to execute a list of Hive queries stored in a GCS bucket (in my case gs://hive/hive.sql) while submitting a Hive job to a Dataproc cluster
Here I am writing the queries inline in queryList under hiveJob.
Submitting the Hive job to the Dataproc cluster:
def submit_hive_job(dataproc, project, region, cluster_name):
    job_details = {
        'projectId': project,
        'job': {
            'placement': {
                'clusterName': cluster_name
            },
            'hiveJob': {
                'queryList': {
                    # How can I execute a .sql file here that is in a bucket?
                    'queries': [
                        "CREATE TABLE IF NOT EXISTS sai "
                        "(eid int, name String, salary String, destination String)",
                        "Insert into table sai values (26,'Shiv','1500','ac')"
                    ]
                }
            }
        }
    }
    result = dataproc.projects().regions().jobs().submit(
        projectId=project,
        region=region,
        body=job_details).execute()
    job_id = result['reference']['jobId']
    print('Submitted job Id {}'.format(job_id))
    return job_id
The hive.sql file in the storage bucket:
create table employee (employeeid int, employeename string, salary float)
row format delimited fields terminated by ',';
describe employee;
select * from employee;
I found that we can keep the .sql file in a storage bucket and then specify its queryFileUri as below:
"hiveJob": {
"queryFileUri":"gs://queryfile/test.sql"
}