
Hadoop Hive - How can I 'add jar' for use with the Hive JDBC client?

So, I have HDFS and Hive working together. I also have the JDBC driver for Hive functioning so that I can make remote JDBC calls.

Now, I have added a Hive User Defined Function (UDF). It works great in the CLI... I even load the jar and associated function automatically via the .hiverc file. However, I cannot get this to work using the Hive JDBC driver. I thought it would also use the .hiverc file (by default, located in /usr/lib/hive/bin/), but it does not seem to work. I also tried adding it via an 'add jar' SQL command as the first statement, but no matter where I put the jar file, I get an error in hive.log that the file cannot be found.

Anyone know how to do this? I am using the Cloudera Distribution (CDH3u2), which uses Hive-0.7.1.

Thanks in advance.

According to the Hive developer mailing list, the current Hive version (0.9) has no solution for this issue. To work around it, I used a connection factory class that registers the jars and functions every time a connection session is started. The code below works wonderfully:

    package com.rapidminer.operator.bigdata.runner.helpers;

    import java.sql.*;

    /**
     * A Hive connection factory utility.
     * @author Marcelo Beckmann
     */
    public class ConnectionFactory {

        private static ConnectionFactory instance;

        /** Basic attributes to make the connection. */
        public String url = "jdbc:hive://localhost:10000/default";
        public final String DRIVER = "org.apache.hadoop.hive.jdbc.HiveDriver";

        public static ConnectionFactory getInstance() {
            if (instance == null)
                instance = new ConnectionFactory();
            return instance;
        }

        private ConnectionFactory() {
        }

        /**
         * Obtains a Hive connection.
         * Warning! To use simultaneous connections through the Thrift server, you must
         * change the Hive metadata store from Derby to another database (MySQL, for example).
         * @return an open Hive connection
         * @throws Exception if the driver cannot be loaded or the connection fails
         */
        public Connection getConnection() throws Exception {
            Class.forName(DRIVER);
            Connection connection = DriverManager.getConnection(url, "", "");
            runInitializationQueries(connection);
            return connection;
        }

        /**
         * Runs initialization queries after the connection is obtained. This
         * initialization works around a known Hive issue (HIVE-657).
         * @throws SQLException if any initialization statement fails
         */
        private void runInitializationQueries(Connection connection) throws SQLException {
            Statement stmt = null;
            try {
                // TODO Get the queries from a .hiverc file
                stmt = connection.createStatement();
                String[] args = new String[3];
                args[0] = "add jar /home/hadoop-user/hive-0.9.0-bin/lib/hive-beckmann-functions.jar";
                args[1] = "create temporary function row_number as 'com.beckmann.hive.RowNumber'";
                args[2] = "create temporary function sequence as 'com.beckmann.hive.Sequence'";
                for (String query : args) {
                    stmt.execute(query);
                }
            } finally {
                if (stmt != null)
                    stmt.close();
            }
        }
    }
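The TODO in `runInitializationQueries` hints at reading the commands from a .hiverc-style file instead of hard-coding them. A minimal sketch of that idea, assuming a one-command-per-line file where blank lines and `--` comments are skipped (the class name and any file path are placeholders, not part of the original code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Reads initialization commands from a .hiverc-style file: one HQL command
 * per line; blank lines and "--" comment lines are skipped. Trailing
 * semicolons are stripped so each string can be passed to Statement.execute.
 */
public class HivercReader {

    public static List<String> readCommands(Path hiverc) throws IOException {
        List<String> commands = new ArrayList<>();
        for (String line : Files.readAllLines(hiverc)) {
            String trimmed = line.trim();
            // Statement.execute expects the command without a trailing ';'
            if (trimmed.endsWith(";"))
                trimmed = trimmed.substring(0, trimmed.length() - 1).trim();
            if (trimmed.isEmpty() || trimmed.startsWith("--"))
                continue;
            commands.add(trimmed);
        }
        return commands;
    }
}
```

With this, the hard-coded `args` array in `runInitializationQueries` could be replaced by `readCommands(...)` pointed at whatever .hiverc path you use.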

I use the JDBC driver to connect to Hive as well. I scp my jar onto the cluster's master node, which is also where Hive is installed, and then use the absolute path to the file (on the master node) in my add jar command. I issue the add jar command via the JDBC driver just like any other HQL command.
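In code, that workflow amounts to executing the same statements you would type in the CLI over the JDBC connection. A minimal sketch, where the class name, jar path, function name, and UDF class are placeholders (the key point from the answer above is that the jar path must exist on the server host, not the client machine):

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

/**
 * Registers a UDF jar over an existing Hive JDBC connection by issuing
 * the same commands the CLI would run.
 */
public class UdfRegistrar {

    /** Builds the two HQL commands; all arguments are placeholders. */
    public static String[] buildCommands(String jarPath, String fnName, String fnClass) {
        return new String[] {
            "add jar " + jarPath,
            "create temporary function " + fnName + " as '" + fnClass + "'"
        };
    }

    /** Executes the commands; jarPath must be valid on the Hive server host. */
    public static void register(Connection conn, String jarPath,
                                String fnName, String fnClass) throws SQLException {
        Statement stmt = conn.createStatement();
        try {
            for (String hql : buildCommands(jarPath, fnName, fnClass)) {
                stmt.execute(hql);
            }
        } finally {
            stmt.close();
        }
    }
}
```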

I think the JDBC driver uses Thrift, which means the JAR probably needs to be on the Thrift server (the Hive server you connect to in your conn string) and in the Hive classpath there.
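One way to satisfy that requirement without running `add jar` in every session is to place the jar on the Hive server host and point Hive at it via the `hive.aux.jars.path` property in hive-site.xml. A sketch, where the path is a placeholder for wherever you put the jar on the server:

```xml
<!-- hive-site.xml on the Hive server host; the path below is a placeholder -->
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///opt/hive/aux-jars/hive-beckmann-functions.jar</value>
</property>
```

The server still needs a `create temporary function` per session (or the .hiverc/connection-factory approach above), but the jar itself is then always on the server's classpath.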
