
Google Cloud Dataflow: accessing Hive over JDBC with Kerberos authentication

I am trying to use Beam Java 2.37.0 on Google Cloud Dataflow to extract data from Hive over JDBC with Kerberos authentication enabled. The connection works fine on my local machine and I can extract the data. However, when I try to run the Dataflow job in GCP it fails with the error "Can't get Kerberos realm". I stored the krb5.conf file and the keytab file in a GCS bucket and I am passing them in via pipeline options; see the sample code below. Am I missing anything here? Is there any way to specify environment variables and/or Java system properties when deploying a pipeline to Dataflow so that those settings take effect on all workers?

        Pipeline p = Pipeline.create(options);
        
        System.setProperty("java.security.krb5.conf", options.getKrb5().get());
        org.apache.hadoop.conf.Configuration conf = new org.apache.hadoop.conf.Configuration();
        conf.set("hadoop.security.authentication", "Kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Call the Hadoop UserGroupInformation API:
        try {
            UserGroupInformation.loginUserFromKeytab(options.getKeytabUser().get(), options.getKeytab().get());
        } catch (IOException e) {
            e.printStackTrace();
        }

        p.apply(JdbcIO.<String>read()
                .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create("org.apache.hive.jdbc.HiveDriver", options.getHiveJDBCURL().get()))
                .withQuery(options.getSqlinput())
                ..............
        );
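
For reference, the option getters used above (`getKrb5()`, `getKeytabUser()`, etc.) come from a custom pipeline options interface along these lines. This is a minimal sketch; the interface name is illustrative and the getters simply mirror the calls made in the snippet:

    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.options.ValueProvider;

    // Hypothetical options interface implied by the pipeline code above.
    public interface HiveKerberosOptions extends DataflowPipelineOptions {
        ValueProvider<String> getKrb5();         // path to krb5.conf
        void setKrb5(ValueProvider<String> value);

        ValueProvider<String> getKeytabUser();   // Kerberos principal, e.g. user@EXAMPLE.COM
        void setKeytabUser(ValueProvider<String> value);

        ValueProvider<String> getKeytab();       // path to the keytab file
        void setKeytab(ValueProvider<String> value);

        ValueProvider<String> getHiveJDBCURL();  // jdbc:hive2://... connection string
        void setHiveJDBCURL(ValueProvider<String> value);

        ValueProvider<String> getSqlinput();     // extraction query
        void setSqlinput(ValueProvider<String> value);
    }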

You can completely customize the worker environment by using custom containers.
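As a rough sketch of how that fits with the pipeline code from the question: build a container image from the Beam Java SDK base image with /etc/krb5.conf and the keytab baked in, then point the job at it via the Dataflow runner options. The image name below is a placeholder, and this assumes the SDK version exposes `setSdkContainerImage` (older SDKs use `workerHarnessContainerImage` instead):

    import org.apache.beam.runners.dataflow.DataflowRunner;
    import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    public class HiveKerberosPipeline {
        public static void main(String[] args) {
            DataflowPipelineOptions options =
                    PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);

            // Placeholder image: built from the Beam Java SDK base image with
            // /etc/krb5.conf and the keytab copied in, so every worker starts with them.
            options.setSdkContainerImage("gcr.io/my-project/beam-java-hive-kerberos:latest");
            options.setRunner(DataflowRunner.class);

            Pipeline p = Pipeline.create(options);
            // ... Kerberos login and JdbcIO read as in the question ...
            p.run();
        }
    }

The same thing can be passed on the command line (e.g. `--sdkContainerImage=...`); the environment variables and extra files then come from the image itself rather than from pipeline options.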
