
Unable to load Spark SQL data source for HBase

I want to use Spark SQL to fetch data from an HBase table, but I get a ClassNotFoundException while creating the DataFrame. Here is the exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/types/NativeType
    at org.apache.hadoop.hbase.spark.DefaultSource$$anonfun$generateSchemaMappingMap$1.apply(DefaultSource.scala:127)
    at org.apache.hadoop.hbase.spark.DefaultSource$$anonfun$generateSchemaMappingMap$1.apply(DefaultSource.scala:116)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
    at org.apache.hadoop.hbase.spark.DefaultSource.generateSchemaMappingMap(DefaultSource.scala:116)
    at org.apache.hadoop.hbase.spark.DefaultSource.createRelation(DefaultSource.scala:97)
    at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
    at com.apache.spark.gettingStarted.SparkSQLOnHBaseTable.createTableAndPutData(SparkSQLOnHBaseTable.java:146)
    at com.apache.spark.gettingStarted.SparkSQLOnHBaseTable.main(SparkSQLOnHBaseTable.java:154)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.types.NativeType
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 14 more

Have any of you faced this issue? How did you solve it?

Here is my code:

// initialize the Spark context
SparkConf sconf = new SparkConf().setMaster("local").setAppName("Test");
JavaSparkContext jsc = new JavaSparkContext(sconf);

// check that HBase is reachable before trying to read from it
Configuration conf = HBaseConfiguration.create();
try {
    HBaseAdmin.checkHBaseAvailable(conf);
    System.out.println("HBase is running");
} catch (ServiceException e) {
    System.out.println("HBase is not running");
    e.printStackTrace();
}

SQLContext sqlContext = new SQLContext(jsc);

// column mapping, one comma-separated entry per column:
// "<sqlColumn> <type> <columnFamily>:<qualifier>", with ":key" for the row key
String sqlMapping = "KEY_FIELD STRING :key,"
        + "sql_city STRING personal:city,"
        + "sql_name STRING personal:name,"
        + "sql_designation STRING professional:designation,"
        + "sql_salary STRING professional:salary";

HashMap<String, String> colMap = new HashMap<String, String>();
colMap.put("hbase.columns.mapping", sqlMapping);
colMap.put("hbase.table", "emp");

// create a DataFrame backed by the hbase-spark data source; this load()
// call is where the NoClassDefFoundError above is thrown
DataFrame df = sqlContext.read().format("org.apache.hadoop.hbase.spark").options(colMap).load();

// register the DataFrame so SQL text queries can be issued
// directly against the sqlContext object
df.registerTempTable("temp_emp");

DataFrame result = sqlContext.sql("SELECT count(*) FROM temp_emp");
System.out.println("df  " + df);
System.out.println("result " + result);

Here are my pom.xml dependencies:

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.6.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.6.1</version>
    </dependency>

    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>1.1.3</version>
    </dependency>

    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-spark</artifactId>
        <version>2.0.0-SNAPSHOT</version>
    </dependency>
</dependencies>
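Note that spark-core and spark-sql are on different versions above. A first step is to keep the two Spark artifacts on one release; a sketch of an aligned block follows (the hbase-spark version is left as a placeholder, since it must be a build compiled against the Spark release you actually run):

<!-- sketch: keep spark-core and spark-sql on the same Spark release -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.1</version>
</dependency>
<!-- placeholder: use an hbase-spark build compiled against the same Spark 1.6.x -->
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-spark</artifactId>
    <version><!-- build matching your Spark version --></version>
</dependency>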

NativeType doesn't exist anymore (nor does dataTypes.scala)

The class is no longer available in the org.apache.spark.sql.types package. It used to exist in Spark 1.3.1, inside dataTypes.scala, and a later commit in the Spark repository made NativeType protected; in current releases it is gone entirely.

The hbase-spark DefaultSource in your stack trace still references that class, which means the connector build you are using was compiled against an older Spark than the 1.6.x spark-sql declared in your pom. You are probably following an old example; use an hbase-spark build compiled against the same Spark version you run.
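A quick way to confirm this on your own classpath is to probe for the class directly. A minimal sketch; run with the same dependencies as the failing job, and on Spark 1.6.x it should report the class as missing, matching the stack trace above:

public class NativeTypeProbe {
    public static void main(String[] args) {
        // probe for the class that hbase-spark's DefaultSource references;
        // if it is absent, any call into the connector fails the same way
        try {
            Class.forName("org.apache.spark.sql.types.NativeType");
            System.out.println("NativeType is on the classpath");
        } catch (ClassNotFoundException e) {
            System.out.println("NativeType is missing from this Spark build");
        }
    }
}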
