How to run Hive SQL queries containing "select count(*)" and "group by" clauses in Hive embedded mode?
How can I run this query (1) in Hive embedded mode?
select product,count(*) as cnt from hive_bigpetstore_etl group by product
In the Maven console I get an InvocationTargetException.
In the Hive log file I find:
java.lang.Exception: java.lang.NullPointerException
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.exec.Utilities.setColumnTypeList(Utilities.java:2033)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushFilters(HiveInputFormat.java:351)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:432)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.pushProjectionsAndFilters(HiveInputFormat.java:374)
at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:540)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:191)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
This is a typical row in the input data:
BigPetStore,storeCode_AK,1 russell,baird,Sun Dec 21 11:57:31 PST 1969,20.1,antelope-caller
The data loads into the table successfully: if I change (1) to (2)
`select * from hive_bigpetstore_etl`
it returns a correct ResultSet with all the data.
I have checked that everything is on the classpath: there are no exceptions such as class-not-found, and the Hive and Hadoop home environment variables are set and verified with printenv.
If I run (1) against a standalone Hive/Thrift server, it runs with no exceptions. I get the exceptions only in embedded mode.
How can I run the `select count(*)` and `group by` clauses in Hive embedded mode?
Looking at the Hive code from branch-0.11:
public static void setColumnTypeList(JobConf jobConf, Operator op) {
  RowSchema rowSchema = op.getSchema();
  if (rowSchema == null) {
    return;
  }
  StringBuilder columnTypes = new StringBuilder();
  for (ColumnInfo colInfo : rowSchema.getSignature()) {
    if (columnTypes.length() > 0) {
      columnTypes.append(",");
    }
    // Utilities.java:2033, the line named in the stack trace
    columnTypes.append(colInfo.getType().getTypeName());
  }
  String columnTypesString = columnTypes.toString();
  jobConf.set(serdeConstants.LIST_COLUMN_TYPES, columnTypesString);
}
Probably the answer is that, for some reason, colInfo.getType() is returning null. The question then becomes "why".
Adding some more color to the question (i.e., can you reproduce this error with any count(*) query?) might shed some light on that.
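To make the suspected failure concrete, here is a minimal sketch that mimics the loop in setColumnTypeList without depending on Hive. `TypeInfo` and `ColumnInfo` below are simplified, hypothetical stand-ins for the Hive classes, and the choice of `cnt` as the column with a null type is purely an assumption for illustration; the stack trace does not say which column is affected. The helper `columnsWithNullType` is also hypothetical and only shows the kind of check that would identify the offending column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnTypeCheck {

    // Hypothetical stand-in for Hive's TypeInfo; only getTypeName() is modeled.
    static final class TypeInfo {
        private final String typeName;
        TypeInfo(String typeName) { this.typeName = typeName; }
        String getTypeName() { return typeName; }
    }

    // Hypothetical stand-in for Hive's ColumnInfo; the type is allowed to be null.
    static final class ColumnInfo {
        private final String name;
        private final TypeInfo type;
        ColumnInfo(String name, TypeInfo type) { this.name = name; this.type = type; }
        String getInternalName() { return name; }
        TypeInfo getType() { return type; }
    }

    // Same loop shape as setColumnTypeList: throws NullPointerException
    // as soon as one column in the signature has a null type.
    static String joinColumnTypes(List<ColumnInfo> signature) {
        StringBuilder columnTypes = new StringBuilder();
        for (ColumnInfo colInfo : signature) {
            if (columnTypes.length() > 0) {
                columnTypes.append(",");
            }
            columnTypes.append(colInfo.getType().getTypeName());
        }
        return columnTypes.toString();
    }

    // Diagnostic helper: collects the names of columns whose type is null.
    static List<String> columnsWithNullType(List<ColumnInfo> signature) {
        List<String> names = new ArrayList<>();
        for (ColumnInfo colInfo : signature) {
            if (colInfo.getType() == null) {
                names.add(colInfo.getInternalName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumed signature for query (1); which column lacks a type is a guess.
        List<ColumnInfo> signature = Arrays.asList(
                new ColumnInfo("product", new TypeInfo("string")),
                new ColumnInfo("cnt", null));

        System.out.println("Columns with null type: " + columnsWithNullType(signature));
        try {
            joinColumnTypes(signature);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException, as in the Hive log");
        }
    }
}
```

If the same check could be run against the real row schema in embedded mode (for example from a debugger breakpoint at Utilities.java:2033), it would show whether the null type belongs to the aggregate column or to one of the table columns, which narrows down the "why".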