Spark: java.lang.RuntimeException: [1.226] failure: identifier expected

UPDATED 3/11

I am getting an error from Spark that is triggered by my SparkSQL query. I am running Spark 1.2.1. I checked my query against several answers I found on Stack Overflow, but I am unable to diagnose exactly what the issue is.

The error:

Application Failed...java.lang.RuntimeException: [1.226] failure: identifier expected

SELECT A_HOSTNAME, A_MODEL FROM (SELECT A_HOSTNAME, A_MODEL, COUNT(*) FROM (SELECT A_HOSTNAME, A_IF_DESC, A_MODEL FROM TOPOLOGY WHERE IS_PROD > 0 AND A_TYPE = 'GWR' AND Z_TYPE = 'LCR' GROUP BY A_HOSTNAME, A_IF_DESC, A_MODEL) GROUP BY A_HOSTNAME, A_MODEL HAVING COUNT(*) > 1) GROUP BY A_HOSTNAME, A_MODEL
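The [1.226] prefix in the message looks like a line.column position reported by the SQL parser. As a minimal sketch, assuming the column is 1-based, the plain-Java snippet below prints what sits at that position in the query. The class and method names are hypothetical and not part of Spark.

```java
// Hypothetical helper, not part of Spark: inspects the text at a parser error position.
final class ErrorPosition {

    // The query exactly as passed to sqlContext.sql(...) in the question.
    static final String QUERY =
        "SELECT A_HOSTNAME, A_MODEL FROM (SELECT A_HOSTNAME, A_MODEL, COUNT(*) FROM "
        + "(SELECT A_HOSTNAME, A_IF_DESC, A_MODEL FROM TOPOLOGY WHERE IS_PROD > 0 "
        + "AND A_TYPE = 'GWR' AND Z_TYPE = 'LCR' GROUP BY A_HOSTNAME, A_IF_DESC, A_MODEL) "
        + "GROUP BY A_HOSTNAME, A_MODEL HAVING COUNT(*) > 1) GROUP BY A_HOSTNAME, A_MODEL";

    // Returns everything before the given 1-based column (assumption: columns start at 1).
    static String before(String sql, int column) {
        return sql.substring(0, column - 1);
    }

    // Returns the whitespace-delimited token that starts at the given 1-based column.
    static String tokenAt(String sql, int column) {
        int start = column - 1;
        int end = start;
        while (end < sql.length() && !Character.isWhitespace(sql.charAt(end))) {
            end++;
        }
        return sql.substring(start, end);
    }

    public static void main(String[] args) {
        String consumed = before(QUERY, 226);
        System.out.println("text before column 226 ends with: "
            + consumed.substring(consumed.length() - 20));
        System.out.println("token at column 226: " + tokenAt(QUERY, 226));
    }
}
```

Under that assumption, column 226 is the GROUP keyword that follows the closing parenthesis of the innermost subquery, which is the place where a subquery alias would normally appear.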

Code: modelTopology.java

public class modelTopology implements Serializable {
//Variables are here + getters and setters
}

Code: Create JavaSchemaRDD from RDD as modelTOpologySchema

 JavaRDD<modelTopology> MODEL_TOPOLOGYRDD = TopologyRDD.map(
                new Function<Object[], modelTopology>() {
                    public modelTopology call(Object[] line) throws Exception {
                        modelTopology toporow = new modelTopology();
                        toporow.setA_TYPE(line[0].toString().trim());
                        toporow.setZ_TYPE(line[1].toString().trim());
                        toporow.setA_CLLI(line[2].toString().trim());
                        toporow.setZ_CLLI(line[3].toString().trim());
                        toporow.setA_HOSTNAME(line[4].toString().trim());
                        toporow.setZ_HOSTNAME(line[5].toString().trim());
                        toporow.setA_LOCATION(line[6].toString().trim());
                        toporow.setA_LOC_TYPE(line[7].toString().trim());
                        toporow.setZ_LOCATION(line[8].toString().trim());
                        toporow.setZ_LOC_TYPE(line[9].toString().trim());
                        toporow.setA_SHELF(line[10].toString().trim());
                        toporow.setA_SLOT(line[11].toString().trim());
                        toporow.setA_CARD(line[12].toString().trim());
                        toporow.setA_PORT(line[13].toString().trim());
                        toporow.setA_INTERFACE(line[14].toString().trim());
                        toporow.setA_IF_DESC(line[15].toString().trim());
                        toporow.setZ_SHELF(line[16].toString().trim());
                        toporow.setZ_SLOT(line[17].toString().trim());
                        toporow.setZ_CARD(line[18].toString().trim());
                        toporow.setZ_PORT(line[19].toString().trim());
                        toporow.setZ_INTERFACE(line[20].toString().trim());
                        toporow.setZ_IF_DESC(line[21].toString().trim());
                        toporow.setA_CARD_NAME(line[22].toString().trim());
                        toporow.setZ_CARD_NAME(line[23].toString().trim());
                        toporow.setPHY_CIRCUIT_ID(line[24].toString().trim());
                        toporow.setLAG_CIRCUIT_ID(line[25].toString().trim());
                        toporow.setPHY_CIRCUIT_ALIAS(line[26].toString().trim());
                        toporow.setA_VENDOR(line[27].toString().trim());
                        toporow.setA_MODEL(line[28].toString().trim());
                        toporow.setA_TECHNOLOGY(line[29].toString().trim());
                        toporow.setZ_VENDOR(line[30].toString().trim());
                        toporow.setZ_MODEL(line[31].toString().trim());
                        toporow.setZ_TECHNOLOGY(line[32].toString().trim());
                        toporow.setA_EH_ELEMENT_ID(line[33].toString().trim());
                        toporow.setA_EH_MACHINE_ID(line[34].toString().trim());
                        toporow.setZ_EH_ELEMENT_ID(line[35].toString().trim());
                        toporow.setZ_EH_MACHINE_ID(line[36].toString().trim());
                        toporow.setA_EH_SPEED(line[37].toString().trim());
                        toporow.setZ_EH_SPEED(line[38].toString().trim());
                        toporow.setA_EH_SPEED1(line[39].toString().trim());
                        toporow.setZ_EH_SPEED1(line[40].toString().trim());
                        toporow.setA_EH_EHEALTH_DOMAIN(line[41].toString().trim());
                        toporow.setZ_EH_EHEALTH_DOMAIN(line[42].toString().trim());
                        toporow.setA_MRTG_HOSTID(line[43].toString().trim());
                        toporow.setA_MRTG_GRPID(line[44].toString().trim());
                        toporow.setA_MRTG_IFID(line[45].toString().trim());
                        toporow.setZ_MRTG_HOSTID(line[46].toString().trim());
                        toporow.setZ_MRTG_GRPID(line[47].toString().trim());
                        toporow.setZ_MRTG_IFID(line[48].toString().trim());
                        toporow.setA_MGMT_IP(line[49].toString().trim());
                        toporow.setZ_MGMT_IP(line[50].toString().trim());
                        toporow.setA_IF_INDEX(line[51].toString().trim());
                        toporow.setZ_IF_INDEX(line[52].toString().trim());
                        toporow.setIS_PROD(line[53].toString().trim());
                        toporow.setTOPOLOGY_KEY(line[54].toString().trim());
                        toporow.setCOMMIT_TS(line[55].toString().trim());

                        return toporow;
                    }
                });

        JavaSchemaRDD schemaTopology = sqlContext.applySchema(MODEL_TOPOLOGYRDD, modelTopology.class);
        schemaTopology.registerAsTable("TOPOLOGY");

        JavaSchemaRDD FILTERED_TOPOLOGY = sqlContext.sql("SELECT A_HOSTNAME, A_MODEL FROM (SELECT A_HOSTNAME, A_MODEL, COUNT(*) FROM (SELECT A_HOSTNAME, A_IF_DESC, A_MODEL FROM TOPOLOGY WHERE IS_PROD > 0 AND A_TYPE = 'GWR' AND Z_TYPE = 'LCR' GROUP BY A_HOSTNAME, A_IF_DESC, A_MODEL) GROUP BY A_HOSTNAME, A_MODEL HAVING COUNT(*) > 1) GROUP BY A_HOSTNAME, A_MODEL").cache();

JavaSchemaRDD Layout

root
 |-- COMMIT_TS: string (nullable = true)
 |-- IS_PROD: string (nullable = true)
 |-- LAG_CIRCUIT_ID: string (nullable = true)
 |-- PHY_CIRCUIT_ALIAS: string (nullable = true)
 |-- PHY_CIRCUIT_ID: string (nullable = true)
 |-- TOPOLOGY_KEY: string (nullable = true)
 |-- a_CARD: string (nullable = true)
 |-- a_CARD_NAME: string (nullable = true)
 |-- a_CLLI: string (nullable = true)
 |-- a_EH_EHEALTH_DOMAIN: string (nullable = true)
 |-- a_EH_ELEMENT_ID: string (nullable = true)
 |-- a_EH_MACHINE_ID: string (nullable = true)
 |-- a_EH_SPEED: string (nullable = true)
 |-- a_EH_SPEED1: string (nullable = true)
 |-- a_HOSTNAME: string (nullable = true)
 |-- a_IF_DESC: string (nullable = true)
 |-- a_IF_INDEX: string (nullable = true)
 |-- a_INTERFACE: string (nullable = true)
 |-- a_LOCATION: string (nullable = true)
 |-- a_LOC_TYPE: string (nullable = true)
 |-- a_MGMT_IP: string (nullable = true)
 |-- a_MODEL: string (nullable = true)
 |-- a_MRTG_GRPID: string (nullable = true)
 |-- a_MRTG_HOSTID: string (nullable = true)
 |-- a_MRTG_IFID: string (nullable = true)
 |-- a_PORT: string (nullable = true)
 |-- a_SHELF: string (nullable = true)
 |-- a_SLOT: string (nullable = true)
 |-- a_TECHNOLOGY: string (nullable = true)
 |-- a_TYPE: string (nullable = true)
 |-- a_VENDOR: string (nullable = true)
 |-- z_CARD: string (nullable = true)
 |-- z_CARD_NAME: string (nullable = true)
 |-- z_CLLI: string (nullable = true)
 |-- z_EH_EHEALTH_DOMAIN: string (nullable = true)
 |-- z_EH_ELEMENT_ID: string (nullable = true)
 |-- z_EH_MACHINE_ID: string (nullable = true)
 |-- z_EH_SPEED: string (nullable = true)
 |-- z_EH_SPEED1: string (nullable = true)
 |-- z_HOSTNAME: string (nullable = true)
 |-- z_IF_DESC: string (nullable = true)
 |-- z_IF_INDEX: string (nullable = true)
 |-- z_INTERFACE: string (nullable = true)
 |-- z_LOCATION: string (nullable = true)
 |-- z_LOC_TYPE: string (nullable = true)
 |-- z_MGMT_IP: string (nullable = true)
 |-- z_MODEL: string (nullable = true)
 |-- z_MRTG_GRPID: string (nullable = true)
 |-- z_MRTG_HOSTID: string (nullable = true)
 |-- z_MRTG_IFID: string (nullable = true)
 |-- z_PORT: string (nullable = true)
 |-- z_SHELF: string (nullable = true)
 |-- z_SLOT: string (nullable = true)
 |-- z_TECHNOLOGY: string (nullable = true)
 |-- z_TYPE: string (nullable = true)
 |-- z_VENDOR: string (nullable = true)
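
The mixed casing in this layout (a_HOSTNAME next to IS_PROD) matches the JavaBeans property-naming rule. The sketch below assumes Spark derives the column names from the bean's getters through java.beans introspection. It uses the standard java.beans.Introspector.decapitalize method. The class name BeanColumnNames is hypothetical.

```java
import java.beans.Introspector;

// Hypothetical demo class: shows how JavaBeans turns getter names into property names.
final class BeanColumnNames {

    // Maps a getter name such as "getA_HOSTNAME" to its JavaBeans property name.
    static String propertyName(String getter) {
        return Introspector.decapitalize(getter.substring("get".length()));
    }

    public static void main(String[] args) {
        String[] getters = {"getA_HOSTNAME", "getZ_TYPE", "getIS_PROD", "getTOPOLOGY_KEY"};
        for (String getter : getters) {
            System.out.println(getter + " -> " + propertyName(getter));
        }
    }
}
```

decapitalize leaves a name unchanged when its first two characters are both upper case, so IS_PROD keeps its casing while A_HOSTNAME becomes a_HOSTNAME. If column resolution is case-sensitive in your Spark version, a query that refers to A_HOSTNAME may not resolve against a_HOSTNAME. That would be consistent with the plan in the later error, where IS_PROD is resolved but 'A_TYPE and 'A_HOSTNAME are not.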

EDIT 3/11/2016: I edited my SQL statement based on the subquery feedback received below.

   JavaSchemaRDD FILTERED_TOPOLOGY = sqlContext.sql("SELECT t2.A_HOSTNAME, t2.A_MODEL FROM " +
                    "(SELECT t1.A_HOSTNAME, t1.A_MODEL, COUNT(*) FROM " +
                    "(SELECT A_HOSTNAME, A_IF_DESC, A_MODEL " +
                    "FROM TOPOLOGY WHERE IS_PROD > 0 AND A_TYPE = 'GWR' AND Z_TYPE = 'LCR' " +
                    "GROUP BY A_HOSTNAME, A_IF_DESC, A_MODEL) t1 " +
                    "GROUP BY A_HOSTNAME, A_MODEL HAVING COUNT(*) > 1) t2 " +
                    "GROUP BY t2.A_HOSTNAME, t2.A_MODEL").cache();

Error:

Application Failed...org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: 't2.A_HOSTNAME.,'t2.A_MODEL.,'t2.A_HOSTNAME.,'t2.A_MODEL., tree:
'Aggregate ['t2.A_HOSTNAME.,'t2.A_MODEL.], ['t2.A_HOSTNAME.,'t2.A_MODEL.]
 'Subquery t2
  'Filter (COUNT(1) > CAST(1, LongType))
   'Aggregate ['A_HOSTNAME,'A_MODEL], ['t1.A_HOSTNAME.,'t1.A_MODEL.,COUNT(1) AS c2#59L]
    'Subquery t1
     'Aggregate ['A_HOSTNAME,'A_IF_DESC,'A_MODEL], ['A_HOSTNAME,'A_IF_DESC,'A_MODEL]
      'Filter (((CAST(IS_PROD#1, DoubleType) > CAST(0, DoubleType)) && ('A_TYPE = GWR)) && ('Z_TYPE = LCR))
       Subquery TOPOLOGY
        LogicalRDD [COMMIT_TS#0,IS_PROD#1,LAG_CIRCUIT_ID#2,PHY_CIRCUIT_ALIAS#3,PHY_CIRCUIT_ID#4,TOPOLOGY_KEY#5,a_CARD#6,a_CARD_NAME#7,a_CLLI#8,a_EH_EHEALTH_DOMAIN#9,a_EH_ELEMENT_ID#10,a_EH_MACHINE_ID#11,a_EH_SPEED#12,a_EH_SPEED1#13,a_HOSTNAME#14,a_IF_DESC#15,a_IF_INDEX#16,a_INTERFACE#17,a_LOCATION#18,a_LOC_TYPE#19,a_MGMT_IP#20,a_MODEL#21,a_MRTG_GRPID#22,a_MRTG_HOSTID#23,a_MRTG_IFID#24,a_PORT#25,a_SHELF#26,a_SLOT#27,a_TECHNOLOGY#28,a_TYPE#29,a_VENDOR#30,z_CARD#31,z_CARD_NAME#32,z_CLLI#33,z_EH_EHEALTH_DOMAIN#34,z_EH_ELEMENT_ID#35,z_EH_MACHINE_ID#36,z_EH_SPEED#37,z_EH_SPEED1#38,z_HOSTNAME#39,z_IF_DESC#40,z_IF_INDEX#41,z_INTERFACE#42,z_LOCATION#43,z_LOC_TYPE#44,z_MGMT_IP#45,z_MODEL#46,z_MRTG_GRPID#47,z_MRTG_HOSTID#48,z_MRTG_IFID#49,z_PORT#50,z_SHELF#51,z_SLOT#52,z_TECHNOLOGY#53,z_TYPE#54,z_VENDOR#55], MapPartitionsRDD[2] at mapPartitions at JavaSQLContext.scala:102

Your SQL is unnecessarily complicated and is missing aliases for the subqueries.

Here is a simplified query (on multiple lines for readability):

SELECT A_HOSTNAME, A_MODEL FROM
  (SELECT A_HOSTNAME, A_IF_DESC, A_MODEL
  FROM TOPOLOGY
  WHERE IS_PROD > 0 AND A_TYPE = 'GWR' AND Z_TYPE = 'LCR') D
GROUP BY A_HOSTNAME, A_MODEL
HAVING COUNT(*) > 1;

This returns the hostname and model for devices that have more than one interface, which is what I understand you want to do.
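
Note that COUNT(*) counts rows per group. If TOPOLOGY can hold several rows for the same interface, that is not the same as counting distinct A_IF_DESC values, which is what the nested GROUP BY in the question does. Here is a plain-Java sketch of the two counts over made-up rows. All data and names in it are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical demo class: compares row counts with distinct-interface counts.
final class InterfaceCounts {

    // Made-up rows of {A_HOSTNAME, A_IF_DESC, A_MODEL}; gwr1 lists the same interface twice.
    static final List<String[]> SAMPLE = Arrays.asList(
        new String[] {"gwr1", "if-a", "modelX"},
        new String[] {"gwr1", "if-a", "modelX"},
        new String[] {"gwr2", "if-a", "modelX"},
        new String[] {"gwr2", "if-b", "modelX"},
        new String[] {"gwr3", "if-a", "modelY"});

    // Mirrors GROUP BY A_HOSTNAME, A_MODEL HAVING COUNT(*) > 1.
    static Set<String> byRowCount(List<String[]> rows) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] r : rows) {
            counts.merge(r[0] + "|" + r[2], 1, Integer::sum);
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Mirrors the nested form: collect distinct A_IF_DESC values per host and model, then count.
    static Set<String> byDistinctInterfaces(List<String[]> rows) {
        Map<String, Set<String>> interfaces = new TreeMap<>();
        for (String[] r : rows) {
            interfaces.computeIfAbsent(r[0] + "|" + r[2], k -> new HashSet<>()).add(r[1]);
        }
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : interfaces.entrySet()) {
            if (e.getValue().size() > 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("COUNT(*) > 1:            " + byRowCount(SAMPLE));
        System.out.println("distinct interfaces > 1: " + byDistinctInterfaces(SAMPLE));
    }
}
```

With this sample, gwr1 passes the row-count filter only because its single interface appears twice. If each interface appears in exactly one TOPOLOGY row, the two approaches agree.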

I don't understand your query, as it seems far too complicated. Is the following one equivalent?

SELECT A_HOSTNAME, A_MODEL FROM TOPOLOGY 
WHERE IS_PROD > 0 AND A_TYPE = 'GWR' AND Z_TYPE = 'LCR' 
GROUP BY A_HOSTNAME, A_MODEL
HAVING COUNT(*) > 1

And as zero323 asked, why are you running such an old version of Spark? Can you try with at least a 1.5.x version (or the new 1.6.1 version)?

Regards,

Loïc
