
How to run Hive SQL queries containing "select count(*)" and "group by" clauses in Hive embedded mode?

How can I run this query (1) in Hive embedded mode?

select product,count(*) as cnt from hive_bigpetstore_etl group by product  

In the Maven console I get an InvocationTargetException.

In the Hive log file I find:

java.lang.Exception: java.lang.NullPointerException
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.Utilities.setColumnTypeList(Utilities.java:2033)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushFilters(HiveInputFormat.java:351)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:432)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:374)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:540)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:191)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

This is a typical row in the input data:

BigPetStore,storeCode_AK,1 russell,baird,Sun Dec 21 11:57:31 PST 1969,20.1,antelope-caller

The data loads into the table successfully: if I change (1) to (2)

`select * from hive_bigpetstore_etl`

it returns a correct ResultSet with all the data.

I have checked that everything is on the classpath, and there are no exceptions such as class-not-found. The Hive and Hadoop home environment variables are set, and I verified them with printenv. If I run query (1) against a standalone Hive/Thrift server, it runs with no exceptions; I get the exceptions only in embedded mode.

How can I run queries with select count(*) and group by clauses in Hive embedded mode?

Looking at the Hive code from branch-0.11:

2023   public static void setColumnTypeList(JobConf jobConf, Operator op) {
2024     RowSchema rowSchema = op.getSchema();
2025     if (rowSchema == null) {
2026       return;
2027     }
2028     StringBuilder columnTypes = new StringBuilder();
2029     for (ColumnInfo colInfo : rowSchema.getSignature()) {
2030       if (columnTypes.length() > 0) {
2031         columnTypes.append(",");
2032       }
2033       columnTypes.append(colInfo.getType().getTypeName());  // <-- Utilities.java:2033, the line named in the stack trace
2034     }
2035     String columnTypesString = columnTypes.toString();
2036     jobConf.set(serdeConstants.LIST_COLUMN_TYPES, columnTypesString);
2037   }

Probably the answer is that, for some reason, colInfo.getType() is returning null. The question then becomes "why". Adding some more color to the question (i.e., can you reproduce this error with any count(*) query?) might shed some light on that.
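To make the suspected failure concrete, here is a minimal, self-contained sketch. It does not use Hive's real classes: FakeColumnInfo, FakeTypeInfo and ColumnTypeDebug are hypothetical stand-ins invented for illustration. joinTypeNames copies the shape of the loop quoted above, and findNullTypedColumns is an assumed diagnostic helper showing how one might find which column carries the null type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Hive's type descriptor; NOT the real Hive class.
final class FakeTypeInfo {
    private final String typeName;
    FakeTypeInfo(String typeName) { this.typeName = typeName; }
    String getTypeName() { return typeName; }
}

// Hypothetical stand-in for Hive's ColumnInfo; NOT the real Hive class.
final class FakeColumnInfo {
    private final String name;
    private final FakeTypeInfo type; // may be null, which is the suspected cause
    FakeColumnInfo(String name, FakeTypeInfo type) {
        this.name = name;
        this.type = type;
    }
    String getName() { return name; }
    FakeTypeInfo getType() { return type; }
}

final class ColumnTypeDebug {
    // Same loop shape as the quoted setColumnTypeList:
    // throws NullPointerException as soon as one column has a null type.
    static String joinTypeNames(List<FakeColumnInfo> signature) {
        StringBuilder columnTypes = new StringBuilder();
        for (FakeColumnInfo colInfo : signature) {
            if (columnTypes.length() > 0) {
                columnTypes.append(",");
            }
            columnTypes.append(colInfo.getType().getTypeName());
        }
        return columnTypes.toString();
    }

    // Assumed diagnostic helper: collect the names of columns whose type is null.
    static List<String> findNullTypedColumns(List<FakeColumnInfo> signature) {
        List<String> missing = new ArrayList<>();
        for (FakeColumnInfo colInfo : signature) {
            if (colInfo.getType() == null) {
                missing.add(colInfo.getName());
            }
        }
        return missing;
    }
}
```

With a signature holding a typed "product" column and a "cnt" column whose type is null (an assumption made only for this example, not something the log confirms), joinTypeNames throws a NullPointerException while findNullTypedColumns reports "cnt". A similar check logged around the real call might narrow down which column of the embedded-mode plan is missing its type.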
