I have a Spark cluster on AWS EMR and am trying to start the following code with the Thrift server:
...
JavaSparkContext jsc = new JavaSparkContext(SparkContext.getOrCreate());
HiveContext hiveContext = new HiveContext(jsc);

JavaRDD<Person> people = jsc.textFile("people.txt").map(
    new Function<String, Person>() {
        public Person call(String line) throws Exception {
            ...
        }
    });

// Register a temporary table and also save a permanent Hive table.
DataFrame schemaPeople = hiveContext.createDataFrame(people, Person.class);
schemaPeople.registerTempTable("people_temp");
schemaPeople.saveAsTable("people");

// Start the Thrift server against the same HiveContext.
HiveThriftServer2.startWithContext(hiveContext);
...
I run this code with the command: sudo ./sbin/start-thriftserver.sh --jars /home/ec2-user/some.jar --class spark.jobs.thrift.ThriftServerInit
After the Thrift server has started, I connect to it with Beeline (!connect jdbc:hive2://localhost:10001), run show tables;, and get this result:
+--------------+--------------+--+
| tableName | isTemporary |
+--------------+--------------+--+
| people | false |
+--------------+--------------+--+
I expect to see the temporary table people_temp as well. Why is people_temp absent?
On the latest Spark 1.6.x I found that you need to explicitly enable single-session mode for the Thrift server to share temporary tables: spark.sql.hive.thriftServer.singleSession=true. Take a look at the migration guide: http://spark.apache.org/docs/latest/sql-programming-guide.html#upgrading-from-spark-sql-15-to-16. Hope this helps.
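One way to apply the flag, sketched against the same start-thriftserver.sh invocation the question uses (the --jars path and --class name are taken from the question, not verified here), is to pass it with --conf so it is set before the HiveContext is created:

```shell
# Launch the Thrift server in single-session mode so temporary tables
# registered in the driver's HiveContext are visible to JDBC clients.
sudo ./sbin/start-thriftserver.sh \
  --conf spark.sql.hive.thriftServer.singleSession=true \
  --jars /home/ec2-user/some.jar \
  --class spark.jobs.thrift.ThriftServerInit
```

Setting the same property in conf/spark-defaults.conf should work as well; either way it has to be in effect before the HiveContext is constructed, because the setting is read at session setup.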
Rod