
Spark2 session for Cassandra, SQL queries

In Spark 2.0, what is the best way to create a Spark session? Because the APIs were reworked in both Spark 2.0 and the Cassandra connector, the SqlContext (and also CassandraSQLContext) is essentially deprecated. So to execute SQL, either I create a Cassandra Session (com.datastax.driver.core.Session) and use its execute(" ") method, or I create a SparkSession (org.apache.spark.sql.SparkSession) and call its sql(String sqlText) method.

I don't know the SQL limitations of either approach; can someone explain?

Also, if I do have to create the SparkSession, how do I do it? I couldn't find any suitable example; with the APIs reworked, the old examples no longer work. I was going through this code sample (DataFrames), and it's not clear what SQL context is being used there, or whether that is the right approach going forward. (For some reason the deprecated APIs aren't even compiling; I need to check my Eclipse settings.)

Thanks

You need a Cassandra Session to create or drop keyspaces and tables in the Cassandra DB. In a Spark application, to create a Cassandra Session you pass the SparkConf to CassandraConnector. In Spark 2.0 you can do it as below.

 SparkSession spark = SparkSession
              .builder()
              .appName("SparkCassandraApp")
              .config("spark.cassandra.connection.host", "localhost")
              .config("spark.cassandra.connection.port", "9042")
              .master("local[2]")
              .getOrCreate();

// Obtain a native driver Session through the connector so DDL can be executed.
CassandraConnector connector = CassandraConnector.apply(spark.sparkContext().conf());
try (Session session = connector.openSession()) {
    // The keyspace must exist before the table can be created.
    session.execute("CREATE KEYSPACE IF NOT EXISTS mykeyspace "
            + "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}");
    session.execute("CREATE TABLE IF NOT EXISTS mykeyspace.mytable "
            + "(id UUID PRIMARY KEY, username TEXT, email TEXT)");
}

If you have an existing DataFrame, you can also create the table in Cassandra using DataFrameFunctions.createCassandraTable(Df). See the API details here.
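Once the table exists, the rows of a DataFrame can also be persisted through the connector's DataFrame writer. A minimal sketch, assuming `spark` and `dataset` are the SparkSession and Dataset&lt;Row&gt; from the surrounding snippets, and that `mykeyspace.mytable` is the placeholder target table:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class CassandraWriteSketch {
    // Append the DataFrame's rows to an existing Cassandra table.
    // The keyspace/table names here are illustrative placeholders.
    static void writeToCassandra(Dataset<Row> dataset) {
        dataset.write()
               .format("org.apache.spark.sql.cassandra")
               .option("keyspace", "mykeyspace")
               .option("table", "mytable")
               .mode(SaveMode.Append)   // table must already exist
               .save();
    }
}
```

SaveMode.Append is the usual choice here, since the connector writes into a pre-created table rather than replacing it.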

You can read data from the Cassandra DB using the API provided by spark-cassandra-connector, as below.

Dataset<Row> dataset = spark.read()
            .format("org.apache.spark.sql.cassandra")
            .option("keyspace", "mykeyspace")   // keyspace to read from
            .option("table", "mytable")         // table to read from
            .load();

dataset.show();

You can use the SparkSession.sql() method to run queries against a temporary view created on the DataFrame returned by the spark-cassandra-connector, as below.

dataset.createOrReplaceTempView("usertable");
Dataset<Row> dataset1 = spark.sql("select * from usertable where username = 'Mat'");
dataset1.show();

