
Read records from a Hive table using Java (through HiveMetaStoreClient, HCatalog, or WebHCat)

One Hive table, t_event, is in the demo_read database. The table has more than 100,000 records. How can I read these records through a Java API?

You can use the Hive JDBC driver to connect to Hive tables. The code below is fine for testing or a POC, but I recommend moving your final tables to HBase (check out Phoenix), MongoDB, or some relational store with low latency.

You could also use dynamic partitions or some clustering technique in Hive for better performance. You can use the following code; I haven't tested it (treat it as a sample).

 import java.sql.*;

 public class HiveDB {

     // For HiveServer2, use driver "org.apache.hive.jdbc.HiveDriver"
     // and a URL like "jdbc:hive2://host:10000/demo_read" instead.
     public static final String HIVE_JDBC_DRIVER = "org.apache.hadoop.hive.jdbc.HiveDriver";
     public static final String HIVE_JDBC_EMBEDDED_CONNECTION = "jdbc:hive://";

     private Connection getConnection() throws ClassNotFoundException, SQLException {
         Class.forName(HIVE_JDBC_DRIVER);
         return DriverManager.getConnection(HIVE_JDBC_EMBEDDED_CONNECTION, "", "");
     }

     public static void main(String[] args) {
         HiveDB hiveDB = new HiveDB();
         // try-with-resources closes the result set, statement, and connection
         try (Connection connection = hiveDB.getConnection();
              Statement statement = connection.createStatement();
              ResultSet resultSet = statement.executeQuery("select * from demo_read.t_event")) {
             int columns = resultSet.getMetaData().getColumnCount();
             // print each row, at most 100 columns per row
             while (resultSet.next()) {
                 for (int i = 0; i < columns && i < 100; ++i) {
                     System.out.print(resultSet.getString(i + 1) + " ");
                 }
                 System.out.println();
             }
         } catch (ClassNotFoundException | SQLException e) {
             e.printStackTrace();
         }
     }
 }
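The dynamic-partition suggestion above can be sketched as HiveQL run through the same kind of JDBC Statement. The table and column names here (t_event_by_day, event_id, payload, event_date) are hypothetical placeholders; adapt them to the real t_event schema. This sketch only builds and prints the statements, since executing them requires a live HiveServer connection:

```java
public class PartitionSketch {

    // HiveQL to repartition t_event by date. hive.exec.dynamic.partition and
    // hive.exec.dynamic.partition.mode are the standard Hive settings that
    // enable dynamic-partition inserts; the schema below is illustrative.
    static String[] statements() {
        return new String[] {
            "SET hive.exec.dynamic.partition = true",
            "SET hive.exec.dynamic.partition.mode = nonstrict",
            "CREATE TABLE demo_read.t_event_by_day (event_id STRING, payload STRING) "
                + "PARTITIONED BY (event_date STRING)",
            "INSERT OVERWRITE TABLE demo_read.t_event_by_day PARTITION (event_date) "
                + "SELECT event_id, payload, event_date FROM demo_read.t_event"
        };
    }

    public static void main(String[] args) {
        for (String sql : statements()) {
            System.out.println(sql);
            // with a live connection you would run: statement.execute(sql);
        }
    }
}
```

Queries that filter on the partition column (event_date here) then read only the matching partition directories instead of scanning the whole table.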

Well, actually you don't want to read all that data. You need to transform it and load it into some database, or (if the data is relatively small) export it to a common format (CSV, JSON, etc.).
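If you go the CSV route, fields containing commas, quotes, or newlines must be wrapped in quotes with embedded quotes doubled (per RFC 4180), or the file will not parse back cleanly. A minimal sketch of the escaping helper you would apply to each column value before joining a row with commas:

```java
public class CsvExport {

    // Quote a field per RFC 4180: wrap in double quotes when it contains a
    // comma, quote, or newline, and double any embedded quotes.
    static String escape(String field) {
        if (field == null) return "";
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    public static void main(String[] args) {
        System.out.println(escape("plain"));      // plain
        System.out.println(escape("a,b"));        // "a,b"
        System.out.println(escape("say \"hi\"")); // "say ""hi"""
    }
}
```

In the JDBC loop above you would call escape(resultSet.getString(i + 1)) for each column, join the results with commas, and write one line per row.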

You could transform the data with the Hive CLI, WebHCat, or the JDBC Hive driver.
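For the WebHCat route, a query is submitted as an HTTP POST to the /templeton/v1/hive endpoint (default port 50111) with `execute` and `statusdir` form parameters, and the results land in the statusdir path on HDFS. A minimal sketch that only builds the form-encoded body; the host, port, and paths in the comment are assumptions, and actually sending the request needs a running WebHCat server:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class WebHCatRequest {

    // Build the application/x-www-form-urlencoded body for POST /templeton/v1/hive:
    // "execute" carries the HiveQL, "statusdir" the HDFS directory for the output.
    static String formBody(String query, String statusDir) {
        return "execute=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
             + "&statusdir=" + URLEncoder.encode(statusDir, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // e.g. POST this body to http://webhcat-host:50111/templeton/v1/hive?user.name=hive
        System.out.println(formBody("select * from demo_read.t_event limit 10;", "/tmp/t_event.out"));
    }
}
```

WebHCat runs the query asynchronously: the POST returns a job id, and you poll the job status (or check statusdir) rather than reading rows back over the connection, which is why it suits batch transforms better than interactive reads.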
