
Google Cloud Dataflow: accessing Hive over JDBC with Kerberos authentication

I am trying to use Beam Java 2.37.0 on Google Cloud Dataflow to extract data from Hive over JDBC with Kerberos authentication enabled. The connection works fine on my local machine and I can extract the data. However, when I try to run the Dataflow job in GCP it fails with the error "Can't get Kerberos realm". I stored the krb5.conf file and the keytab file in a GCS bucket and I am passing them in via pipeline options; see the sample code below. Am I missing anything here? Is there any way to specify environment variables and/or Java system properties when deploying a pipeline to Dataflow so that those settings take effect on all workers?

        Pipeline p = Pipeline.create(options);
        
        System.setProperty("java.security.krb5.conf", options.getKrb5().get());
        org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
        conf.set("hadoop.security.authentication", "Kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Call the Hadoop UserGroupInformation API:
        try {
            UserGroupInformation.loginUserFromKeytab(options.getKeytabUser().get(), options.getKeytab().get());
        } catch (IOException e) {
            e.printStackTrace();
        }

        p.apply(JdbcIO.<String>read()
                .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create("org.apache.hive.jdbc.HiveDriver", options.getHiveJDBCURL().get()))
                .withQuery(options.getSqlinput())
                ..............
        );
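
For reference, the option getters used above (`getKrb5()`, `getKeytabUser()`, etc.) come from a custom pipeline options interface along these lines. This is a minimal sketch; the interface name is illustrative and the getters simply mirror the calls made in the snippet:

    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.options.ValueProvider;

    // Hypothetical options interface implied by the pipeline code above.
    public interface HiveKerberosOptions extends DataflowPipelineOptions {
        ValueProvider<String> getKrb5();         // path to krb5.conf
        void setKrb5(ValueProvider<String> value);

        ValueProvider<String> getKeytabUser();   // Kerberos principal, e.g. user@EXAMPLE.COM
        void setKeytabUser(ValueProvider<String> value);

        ValueProvider<String> getKeytab();       // path to the keytab file
        void setKeytab(ValueProvider<String> value);

        ValueProvider<String> getHiveJDBCURL();  // jdbc:hive2://... connection string
        void setHiveJDBCURL(ValueProvider<String> value);

        ValueProvider<String> getSqlinput();     // extraction query
        void setSqlinput(ValueProvider<String> value);
    }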

You can completely customize the worker environment by using custom containers.
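As a rough sketch of how that fits with the pipeline code from the question: build a container image from the Beam Java SDK base image with /etc/krb5.conf and the keytab baked in, then point the job at it via the Dataflow runner options. The image name below is a placeholder, and this assumes the SDK version exposes `setSdkContainerImage` (older SDKs use `workerHarnessContainerImage` instead):

    import org.apache.beam.runners.dataflow.DataflowRunner;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class HiveKerberosPipeline {
        public static void main(String[] args) {
            DataflowPipelineOptions options =
                    PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);

            // Placeholder image: built from the Beam Java SDK base image with
            // /etc/krb5.conf and the keytab copied in, so every worker starts with them.
            options.setSdkContainerImage("gcr.io/my-project/beam-java-hive-kerberos:latest");
            options.setRunner(DataflowRunner.class);

            Pipeline p = Pipeline.create(options);
            // ... Kerberos login and JdbcIO read as in the question ...
            p.run();
        }
    }

The same thing can be passed on the command line (e.g. `--sdkContainerImage=...`); the environment variables and extra files then come from the image itself rather than from pipeline options.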
