Query cassandra table from spark without using case classes

I am using DataStax's connector to connect to Cassandra.

Below is the code that I used:

import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import com.datastax.spark.connector._

// Build the configuration first, then create the contexts that depend on it
val conf = new SparkConf(true)
  .set("spark.cassandra.connection.host", "127.0.0.1")
  .set("spark.cassandra.auth.username", "the_username")
  .set("spark.cassandra.auth.password", "the_password")

val sc = new SparkContext("local", "the_app_name", conf) // args: master, application name, config
val sqlContext = new SQLContext(sc) // create the SQLContext after the SparkContext it wraps

val table_1 = sc.cassandraTable("the_keyspace", "table_1")
val table_2 = sc.cassandraTable("the_keyspace", "table_2")

Now, the way to expose such a table as an RDD is to use a case class as a placeholder, as below:

case class Person(name: String, age: Int)
sc.cassandraTable[Person]("test", "persons").registerAsTable("persons")

This works fine, but I have around 50+ columns in each table, and it is a real pain to type them all out in a case class and identify their types.

Is there a way to overcome this? I am used to loading a CSV file as a table with databricks-csv: I can register the files as tables and run queries on them without a case-class placeholder. Is there something similar for my use case here?
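
For reference, the spark-csv pattern I mean looks roughly like this (the file path and table name are placeholders):

val csvDf = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")      // first line holds the column names
  .option("inferSchema", "true") // let the reader guess the column types
  .load("path/to/file.csv")
csvDf.registerTempTable("my_csv_table")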

If there are none, it would be helpful to know of any generators I can use to auto-generate these case classes.
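
As an aside, here is a minimal sketch of such a generator, assuming the table has already been loaded as a DataFrame (as in the answer below). caseClassFor is a hypothetical helper, and its type mapping covers only a few common cases:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types._

// Hypothetical helper: builds a case class definition string from a DataFrame's schema
def caseClassFor(className: String, df: DataFrame): String = {
  val fields = df.schema.fields.map { f =>
    val scalaType = f.dataType match {
      case IntegerType => "Int"
      case LongType    => "Long"
      case DoubleType  => "Double"
      case BooleanType => "Boolean"
      case _           => "String" // fall back to String for anything else
    }
    s"${f.name}: $scalaType"
  }
  s"case class $className(${fields.mkString(", ")})"
}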

You can create a DataFrame directly:

// The Cassandra data source infers the schema from the table, so no case class is needed
val df = sqlContext
  .read.format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "test", "table" -> "persons"))
  .load()
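
From there, the DataFrame can be registered as a temporary table and queried with plain SQL, again without a case class. A minimal sketch, assuming the persons table has name and age columns:

df.registerTempTable("persons")
val adults = sqlContext.sql("SELECT name, age FROM persons WHERE age > 30")
adults.show()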
