How to execute a list of Hive queries stored in a GCP storage bucket (in my case gs://hive/hive.sql) while submitting a Hive job to a Dataproc cluster

Here I am writing the queries inline under queryList in hiveJob.

Submitting the Hive job to the Dataproc cluster:

def submit_hive_job(dataproc, project, region,
                       cluster_name):
    job_details = {
        'projectId': project,
        'job': {
            'placement': {
                'clusterName': cluster_name
            },
            "hiveJob": {
                "queryList": {
                    # How can I execute the .sql file here, which is in the bucket?
                    "queries": [
                        "CREATE TABLE IF NOT EXISTS sai ( eid int, name String, salary String, destination String)",
                        "Insert into table sai values (26,'Shiv','1500','ac')"
                    ]
                }
            }
        }
    }
    result = dataproc.projects().regions().jobs().submit(
        projectId=project,
        region=region,
        body=job_details).execute()
    job_id = result['reference']['jobId']
    print('Submitted job Id {}'.format(job_id))
    return job_id

存儲桶中的hive.sql文件

create table employee ( employeeid int, employeename string, salary float) row format delimited fields terminated by ',' ;
describe employee;
select * from employee;

I found that we can keep the .sql file in a storage bucket and then specify it with queryFileUri, as shown below:

"hiveJob": {
 "queryFileUri":"gs://queryfile/test.sql"             
}
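
Building on that, here is a minimal sketch of how the original function could be rewritten to use queryFileUri instead of an inline queryList. The function name submit_hive_job_from_file and the sql_file_uri parameter are my own; the Dataproc v1 HiveJob message accepts either queryList or queryFileUri, but not both.

def submit_hive_job_from_file(dataproc, project, region,
                              cluster_name, sql_file_uri):
    # Sketch: submit a Hive job whose queries live in a .sql file in
    # Cloud Storage, e.g. sql_file_uri = 'gs://queryfile/test.sql'.
    job_details = {
        'projectId': project,
        'job': {
            'placement': {
                'clusterName': cluster_name
            },
            'hiveJob': {
                # queryFileUri replaces the inline queryList
                'queryFileUri': sql_file_uri
            }
        }
    }
    result = dataproc.projects().regions().jobs().submit(
        projectId=project,
        region=region,
        body=job_details).execute()
    job_id = result['reference']['jobId']
    print('Submitted job Id {}'.format(job_id))
    return job_id

Called with sql_file_uri='gs://queryfile/test.sql', Dataproc reads the file from the bucket and runs its statements, so the queries in hive.sql above no longer need to be embedded in the request body.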
